Earlier I wrote about Everyday Unicode. ScriptSource pulls in a lot of Unicode information for Scripts and Characters and displays it in a useful way.

Scripts — ScriptSource gives a good overview of each script. This doesn't necessarily include just Unicode information; it gives a much broader view. However, for each script there is an entry called "Unicode Status". If you click on that link, you will see a lot of information on how the script is supported in Unicode. A good example to look at is the Arabic script Unicode Status page.

Unicode Character Blocks — In ScriptSource you can see a list of all the Unicode Character Blocks here. Unicode Blocks are merely a way of organizing a group of characters. In a simplistic world you might expect to find all the characters you need in one particular script block — in fact, for some scripts, that is true. Take a look at the Vai block here.

Let's go back to the Unicode Character Blocks page. If you click on the Name heading, the list of blocks is sorted by name. That is useful if you know which script you are looking for. You can see that several blocks start with "Cyrillic". If you click on USV Range, the list is sorted by Unicode Scalar Value (the code point number). That is useful if you know which range you are looking for.
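The two sort orders can be sketched in a few lines of Python. This is only an illustration: the BLOCKS list is a small hand-copied subset of the ranges in the Unicode Blocks.txt data file, and the helper names (sort_by_name, sort_by_usv, block_of) are made up for this example rather than anything ScriptSource provides.

```python
# A few Unicode blocks as (name, first code point, last code point).
# Hand-copied subset for illustration; the full list is in Blocks.txt.
BLOCKS = [
    ("Vai", 0xA500, 0xA63F),
    ("Cyrillic Extended-B", 0xA640, 0xA69F),
    ("IPA Extensions", 0x0250, 0x02AF),
    ("Cyrillic", 0x0400, 0x04FF),
    ("Cyrillic Extended-A", 0x2DE0, 0x2DFF),
    ("Cyrillic Supplement", 0x0500, 0x052F),
]


def sort_by_name(blocks):
    """Order blocks alphabetically, like clicking the Name heading."""
    return sorted(blocks, key=lambda block: block[0])


def sort_by_usv(blocks):
    """Order blocks by starting code point, like clicking USV Range."""
    return sorted(blocks, key=lambda block: block[1])


def block_of(char, blocks=BLOCKS):
    """Return the name of the listed block containing char, or None."""
    code_point = ord(char)
    for name, first, last in blocks:
        if first <= code_point <= last:
            return name
    return None


if __name__ == "__main__":
    for name, first, last in sort_by_usv(BLOCKS):
        print(f"U+{first:04X}..U+{last:04X}  {name}")
```

Sorting by name groups the Cyrillic blocks together, while sorting by USV shows how far apart they sit in the code space.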

The problem is that when a script is first encoded in Unicode, we may not know everything there is to know about that script. Latin, Cyrillic, Arabic and other scripts were encoded in stages as information became available. Thus, for Latin you will see that there are at least six blocks related to Latin. Unfortunately, even those blocks do not give you everything you need for Latin; other blocks, such as IPA Extensions, also hold characters associated with Latin.

Unicode Characters — ScriptSource pulls in all the Unicode Character Database property information. If you want to know about a particular character, you can go to its character page and find valuable property information. Here is a screenshot of the character information for U+014A LATIN CAPITAL LETTER ENG:



You will see what the character looks like, the Unicode block it is in, when the character was accepted into Unicode, what the lower-case character is, etc. You can even see what Script the character is associated with.
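Some of the same properties can be looked up locally. The sketch below uses Python's standard unicodedata module; the describe helper is a name invented for this example. Note that the standard module does not expose the block, the Unicode version in which a character was added, or the Script property, so ScriptSource shows more than this snippet can.

```python
import unicodedata


def describe(char):
    """Return a few Unicode Character Database properties for one character."""
    return {
        "usv": f"U+{ord(char):04X}",
        "name": unicodedata.name(char, "<unnamed>"),
        "general_category": unicodedata.category(char),
        "lowercase": char.lower(),
        "uppercase": char.upper(),
    }


if __name__ == "__main__":
    # U+014A LATIN CAPITAL LETTER ENG, the character discussed above.
    for key, value in describe("\u014A").items():
        print(f"{key}: {value}")
```

For U+014A the general category is Lu (uppercase letter) and the lowercase mapping is U+014B.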

If you go to that character page (http://scriptsource.org/char/U00014A) you will also see other bits of information. Click on the "Forms & Behavior" tab and you'll see examples of variant glyphs for the uppercase eng. Click on the "Use & History" tab and you will see which languages use the U+014A character. Each tab offers another level of information on the character.

When a language group is attempting to settle on an orthography, it is sometimes important to look at the orthography of a related language group. ScriptSource includes lists of the main characters used for a large number of languages (this information comes from individual contributions as well as from the Unicode Common Locale Data Repository). For example, a Bantu group may want to see the main characters used in Swahili. They could go here (Swahili / Symbols & Characters tab) to view that information. Each entry lists the Unicode code point and shows what the character looks like.

Scripts...again — Since the Unicode blocks do not always provide the script association information, we need to look elsewhere. As we mentioned above, that information is available and can be seen in the Character properties. We can also go to a particular script, such as Cyrillic, and click on the "Symbols & Characters" tab (this page can take a while to load: there are a lot of characters!). There we will see a list of Characters associated with this script. Once you are familiar with Unicode Ranges you can clearly see that Cyrillic is in different ranges, U+0400..U+0527, U+1D2B, U+1D78, U+2DE0..U+2DFF and U+A640..U+A69F.
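To get a feel for how a script ends up scattered across ranges, here is a rough Python sketch that collapses a list of code points into ranges in the U+XXXX..U+XXXX notation used above. It rests on a loud assumption: Python's standard unicodedata module does not expose the Script property, so the sketch picks characters whose name contains "CYRILLIC" instead. That is only an approximation of the script association ScriptSource shows, and the result depends on the Unicode version bundled with your Python. The function names are invented for this example.

```python
import sys
import unicodedata


def to_ranges(code_points):
    """Collapse code points into sorted, inclusive (first, last) ranges."""
    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            # Extends the current run of consecutive code points.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


def format_range(code_range):
    """Format a range as U+XXXX..U+XXXX, or a single U+XXXX."""
    first, last = code_range
    if first == last:
        return f"U+{first:04X}"
    return f"U+{first:04X}..U+{last:04X}"


def cyrillic_by_name():
    """Approximation only: match on the character name, not the Script property."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if "CYRILLIC" in unicodedata.name(chr(cp), "")
    ]


if __name__ == "__main__":
    print(", ".join(format_range(r) for r in to_ranges(cyrillic_by_name())))
```

Because the match is on names, the output can differ from the ScriptSource list, for example for combining marks whose Script value is not Cyrillic even though "CYRILLIC" appears in their names.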

This information is extremely helpful when you want to see what characters are needed for a particular script.

Unfortunately, the picture is not yet complete! Many scripts, such as Latin or Cyrillic, share a common set of characters such as punctuation and digits. In Unicode these characters have the Script property value "Common". In ScriptSource we do not yet have an easy way to view characters set to Common. This link lets you view the first 1,000 characters in Unicode which have the "Common" value. When designing a font for a particular script it is also important to include the characters set to "Common" that the script uses. Fortunately, we do offer a Recommended set of characters for Non-Roman fonts to help font designers.

Fonts & Keyboards — On a very practical level, ScriptSource offers links to fonts and keyboards which support many of the scripts encoded in Unicode. This should make it possible to create documents in the standard Unicode encoding, which will remain readable long after custom-encoded documents can no longer be read.