Posted by Lorna Priest on 2012-04-12 04:20:00
Someone came into my office the other day wanting to know when a set of characters was added to Unicode. I turned to ScriptSource to find what he needed, and I thought others might be interested in this too.
One way to find this kind of information is the DerivedAge.txt file, which is part of the Unicode Character Database (UCD), but that file is kind of a pain to track down when you need it.
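For anyone who does want to go to the data file, here is a minimal Python sketch of looking up a character's age in text that follows the DerivedAge.txt layout (a code point or range, a semicolon, the version, then a comment). The three data lines are an abbreviated excerpt written for this example; the real file groups code points into much larger ranges, so in practice you would read the whole file. The function names are hypothetical.

```python
# Minimal sketch: look up the Unicode version ("age") of a code point
# from text in the DerivedAge.txt format. SAMPLE is an abbreviated,
# illustrative excerpt; the real file uses larger ranges.
SAMPLE = """\
# code point(s) ; version # comment
0289          ; 1.1 #       LATIN SMALL LETTER U BAR
0244          ; 5.0 #       LATIN CAPITAL LETTER U BAR
04F6..04F7    ; 4.1 #   [2] CYRILLIC CAPITAL LETTER GHE WITH DESCENDER..CYRILLIC SMALL LETTER GHE WITH DESCENDER
"""


def parse_derived_age(text):
    """Return a list of (first, last, version) tuples, one per data line."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cps, version = (field.strip() for field in data.split(";"))
        first, _, last = cps.partition("..")  # "04F6..04F7" or a single "0289"
        ranges.append((int(first, 16), int(last or first, 16), version))
    return ranges


def age_of(cp, ranges):
    """Return the version string for code point cp, or None if not listed."""
    for first, last, version in ranges:
        if first <= cp <= last:
            return version
    return None


ranges = parse_derived_age(SAMPLE)
print(age_of(0x0289, ranges))  # 1.1
print(age_of(0x0244, ranges))  # 5.0
print(age_of(0x04F7, ranges))  # 4.1
```

A code point that isn't covered by the data you loaded comes back as None, which with the full file would mean the code point is unassigned.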
ScriptSource to the rescue!
In ScriptSource you can search for a character. Let's say I want to know about bar u/U, which is U+0289/U+0244.
I can search for 0289. When I click on the character, I see which version of Unicode it was accepted into: 1.1. There's quite a bit of other information on the character page, too. There is a link to the Unicode block where the character is found, and I can see that it's a lowercase letter. If the character is used in IPA, there is an entry for that where you can learn more about it from an IPA standpoint (just click on the Barred U link). There's also a link to the Phonetic Symbol Guide page.
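If you work in Python, its standard unicodedata module can confirm some of what the character page shows, such as the formal name and the General Category. It does not report the version a character was added in, though, so for that question you still need ScriptSource or DerivedAge.txt. A quick check on bar u/U:

```python
import unicodedata

# Bar u/U: U+0289 and U+0244.
for ch in ("\u0289", "\u0244"):
    # name() gives the formal character name; category() gives the
    # General Category ("Ll" = lowercase letter, "Lu" = uppercase letter).
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# The two characters are a case pair.
print("\u0244".lower() == "\u0289")
```

This prints the names LATIN SMALL LETTER U BAR (Ll) and LATIN CAPITAL LETTER U BAR (Lu), followed by True.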
Another place to find useful information is the script pages:
1. Click on "SCRIPTS" at the top of the site.
2. Click on "Cyrillic" (or whatever script you are interested in).
3. Search for "Unicode Status" (every script should have a "Unicode Status" page).
4. Click on "Unicode Status".
At the top of the resulting page you'll first see a link to the Unicode documentation for the script. In this case, Cyrillic is discussed in Chapter 7 of the Unicode Standard.
Next you'll see the blocks for that script; Cyrillic has four. For each block you can see when it was added to Unicode. (That doesn't mean all the characters in the block were added at that time; perhaps only a few characters were encoded initially.) The table also links to the Unicode code charts for each block (under "Documentation").
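Block ranges like these are listed in the UCD's Blocks.txt file. As a rough Python sketch (the helper names are hypothetical), here is how you could map a code point to its block, using the four Cyrillic blocks written in that file's "start..end; Name" layout as they stood in Unicode 6.1. Later versions of Unicode have added more Cyrillic blocks.

```python
# The four Cyrillic blocks (as of Unicode 6.1), in Blocks.txt's
# "start..end; Name" layout.
CYRILLIC_BLOCKS = """\
0400..04FF; Cyrillic
0500..052F; Cyrillic Supplement
2DE0..2DFF; Cyrillic Extended-A
A640..A69F; Cyrillic Extended-B
"""


def parse_blocks(text):
    """Return a list of (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = line.split(";", 1)
        start, end = span.split("..")
        blocks.append((int(start, 16), int(end, 16), name.strip()))
    return blocks


def block_of(cp, blocks):
    """Return the name of the block containing cp, or None."""
    for start, end, name in blocks:
        if start <= cp <= end:
            return name
    return None


blocks = parse_blocks(CYRILLIC_BLOCKS)
print(block_of(0x04F6, blocks))  # Cyrillic
print(block_of(0xA640, blocks))  # Cyrillic Extended-B
```

Note that a block only tells you where a character lives, not when it was encoded; the per-character age still has to come from DerivedAge.txt or a page like ScriptSource's.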
The second table lists all characters that were encoded after the script was first encoded, sorted by USV (Unicode Scalar Value), and shows the version in which each character was added. Also, where the Unicode proposal for a character is available online, we've linked to it, in case you want to know why the character was added to Unicode. For example, U+04F6 CYRILLIC CAPITAL LETTER GHE WITH DESCENDER was added in Unicode 4.1, and its proposal can be found here: n2560.pdf. From the proposal, I learn that the character was added for the Siberian Yupik language.
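The idea behind a table like that can be sketched in Python: given per-character ages in the DerivedAge.txt layout, keep the characters in a range whose version is newer than the earliest version seen in that range, and sort them by USV. This is only an illustration, not how ScriptSource actually builds its table. The two data lines are a partial excerpt covering the basic Russian letters and the GHE WITH DESCENDER pair, and the function names are hypothetical.

```python
# Partial, illustrative excerpt in the DerivedAge.txt layout.
AGES = """\
0410..044F    ; 1.1 #  [64] CYRILLIC CAPITAL LETTER A..CYRILLIC SMALL LETTER YA
04F6..04F7    ; 4.1 #   [2] CYRILLIC CAPITAL LETTER GHE WITH DESCENDER..CYRILLIC SMALL LETTER GHE WITH DESCENDER
"""


def version_key(v):
    # "4.1" -> (4, 1), so versions compare numerically rather than as strings
    return tuple(int(part) for part in v.split("."))


def later_additions(text, lo, hi):
    """Return (USV, version) pairs in lo..hi that were added after the
    earliest version seen in that range, sorted by USV."""
    ages = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cps, version = (field.strip() for field in data.split(";"))
        first, _, last = cps.partition("..")
        for cp in range(int(first, 16), int(last or first, 16) + 1):
            if lo <= cp <= hi:
                ages[cp] = version
    if not ages:
        return []
    earliest = min(ages.values(), key=version_key)
    return sorted((cp, v) for cp, v in ages.items()
                  if version_key(v) > version_key(earliest))


for cp, v in later_additions(AGES, 0x0400, 0x04FF):
    print(f"U+{cp:04X} {v}")
```

With this excerpt, only U+04F6 and U+04F7 (both 4.1) are reported for the Cyrillic block. Comparing versions as integer tuples matters: as plain strings, "10.0" would sort before "4.1".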
I find myself going to ScriptSource more and more for this kind of information.