Question: I need a diacritic on an “i”. Should I use the dotless “i” that I found in Unicode or what should I do? I also need to have a diacritic that will go on the upper case “i” and I can't find different heights for the diacritics.

Answer: This is where Unicode is really, really useful. You no longer need to encode two different versions of an “i” and two different versions of a diacritic. In fact, you should not! If you look up the character you have suggested (U+0131 LATIN SMALL LETTER DOTLESS I), you will see that it is intended only for Turkish and Azerbaijani, where the dotted and dotless “i” are two distinct letters.

So you should simply use the ordinary base character followed by the combining diacritic. (This makes data analysis much simpler as well, because every “i” is encoded the same way.) Unicode, together with smart fonts, automatically handles removing the dot from the “i” and raising the diacritic over the upper case “I”. For example, “i” with a caron is encoded as U+0069 + U+030C, and “I” with a caron as U+0049 + U+030C.
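As an illustration, here is a minimal Python sketch (Python is an assumption; the answer itself names no language) that builds both sequences from a base letter plus the one combining caron and lists their code points. The helper `describe` is hypothetical. It also shows why the dotless “i” causes trouble for data analysis: a string built on U+0131 is simply different data from one built on U+0069. The code only deals with encoding; the dot removal and height adjustment happen later, in the font.

```python
import unicodedata

# One combining caron (U+030C) serves both the lowercase and the uppercase i.
CARON = "\u030C"

def describe(text: str) -> list[str]:
    """Hypothetical helper: return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

lower = "i" + CARON  # a smart font drops the dot of "i" when rendering this
upper = "I" + CARON  # a smart font raises the caron above the capital

print(describe(lower))  # ['U+0069 LATIN SMALL LETTER I', 'U+030C COMBINING CARON']
print(describe(upper))  # ['U+0049 LATIN CAPITAL LETTER I', 'U+030C COMBINING CARON']

# Encoding the dotless i (U+0131) instead produces different data,
# so a search for "i" + caron will not find it.
dotless = "\u0131" + CARON
print(dotless == lower)  # False
```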



The same applies to diacritics below a character: a smart font shifts the mark down when the base character has a descender, but the code point of the combining mark does not change. For example, “e” with a circumflex below is encoded as U+0065 + U+032D, and “g” with a circumflex below as U+0067 + U+032D.
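A similar minimal Python sketch (again assuming Python; `code_points` is a hypothetical helper) confirms that the mark is the same code point under “e” and under “g”, and that Unicode classes it as a below mark (canonical combining class 220). Moving it clear of the descender is left entirely to the font.

```python
import unicodedata

# U+032D COMBINING CIRCUMFLEX ACCENT BELOW: the same code point is used
# whether or not the base letter has a descender.
CIRCUMFLEX_BELOW = "\u032D"

def code_points(text: str) -> list[str]:
    """Hypothetical helper: return 'U+XXXX' for each code point in text."""
    return [f"U+{ord(ch):04X}" for ch in text]

for base in ("e", "g"):
    print(" + ".join(code_points(base + CIRCUMFLEX_BELOW)))
# U+0065 + U+032D
# U+0067 + U+032D

print(unicodedata.name(CIRCUMFLEX_BELOW))       # COMBINING CIRCUMFLEX ACCENT BELOW
print(unicodedata.combining(CIRCUMFLEX_BELOW))  # 220 (combining class "Below")
```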