To understand technologies for working with multilingual and multi-script text data, we need to start with character encoding. Systems for working with text involve a collection of processes that work together: processes for creating and editing text, for presenting it, for sorting it, for laying out paragraphs and wrapping lines, and so on. Character encoding is what ties all of these processes together.
"Character set encoding basics" is a chapter from the Non-Roman Script Initiative's book Implementing Writing Systems. It discusses some basic concepts that apply to both Unicode and legacy character encodings.
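As an illustration (ours, not taken from the chapter), a minimal Python sketch shows why every process handling text must agree on the encoding. The same string produces different bytes under UTF-8, a Unicode encoding, and under ISO 8859-1, a legacy single-byte encoding (called "latin-1" in Python). Decoding bytes with the wrong encoding garbles the text. The choice of "café" and of ISO 8859-1 is only an assumption for the example.

```python
# Illustrative sketch: the same text in a Unicode encoding (UTF-8)
# and in a legacy single-byte encoding (ISO 8859-1, "latin-1" in Python).
text = "café"

utf8_bytes = text.encode("utf-8")      # é is stored as two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # é is stored as one byte: 0xE9

print(utf8_bytes, len(utf8_bytes))      # b'caf\xc3\xa9' 5
print(latin1_bytes, len(latin1_bytes))  # b'caf\xe9' 4

# Reading UTF-8 bytes as if they were ISO 8859-1 treats each byte of the
# two-byte sequence as a separate character, producing garbled text.
misread = utf8_bytes.decode("latin-1")
print(misread)  # cafÃ©
```

The string and the characters it contains stay the same in both cases. Only the byte representation differs, which is why a sorting routine, a renderer, and an editor all need the same encoding assumption to agree on what the text says.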