ScriptSource

Script: Latin [Latn]

Subject areas for this script

Blog posts in this subject area

These are posts from the blogs on this site; the full blogs can be accessed under the Topics link.

Accent collision lists

Discussions in this subject area

Discussions include ideas, opinions or questions that invite comments from other ScriptSource users.

There are no discussions for this subject.

  • Some languages (especially in the Americas) require the bar to go through the bowl of a character (or the top bowl in the case of "g") rather than through the stem. The Unicode Consortium does not consider these eligible for separate encoding; it treats them as glyph variants.

    Contributor: Lorna Evans
  • The Czech and Slovak languages use an apostrophe-shaped mark rather than a caron for U+010F LATIN SMALL LETTER D WITH CARON, U+013D LATIN CAPITAL LETTER L WITH CARON, U+013E LATIN SMALL LETTER L WITH CARON and U+0165 LATIN SMALL LETTER T WITH CARON (see top row). This may be the preferred form for other languages as well, and it is the form used in the Unicode Standard. However, it has been quite confusing for the IPA and for other languages that use the caron above "d", "L", "l" and "t". For those languages it is best to place the caron above the letter (as in the bottom row).

    Contributor: Lorna Evans
  • Chinantec uses tone marks from the Modifier Tone Letters block as well as from the Spacing Modifier Letters block. The characters taken from the Spacing Modifier Letters block require glyph variants.

    In the sample below, the first four characters are from the Spacing Modifier Letters block. The top row shows how they look in standard fonts; the second row shows the variants Chinantec requires.

    The proposal for these characters is "Revised Proposal to Encode Chinantec Tone Marks".

    Contributor: Lorna Evans
  • When "i" or "j" (and similar characters) carries a combining mark above it, the dot is removed. However, Unicode specifies that, unlike "i" and "j", the following characters do not lose their dots:

    • U+0133 LATIN SMALL LIGATURE IJ
    • U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J
    • U+01C9 LATIN SMALL LETTER LJ
    • U+01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J
    • U+01CC LATIN SMALL LETTER NJ
    Contributor: Lorna Evans
  • The "open o" is used in the International Phonetic Alphabet (IPA) as well as by many languages. The glyph the Unicode Consortium (and the IPA) use is on the left, and the glyph used by many African orthographies is on the right:

    U+2183 ROMAN NUMERAL REVERSED ONE HUNDRED was added as a Roman numeral, and U+2184 LATIN SMALL LETTER REVERSED C was added for use as a Claudian letter. We do not recommend using either for anything other than its intended purpose. If you need an open o, please use U+0186 LATIN CAPITAL LETTER OPEN O and U+0254 LATIN SMALL LETTER OPEN O, and then find a font that has the serif where you want it.

    Contributor: Lorna Evans
  • U+0181 LATIN CAPITAL LETTER B WITH HOOK is used by many languages. The glyph the Unicode Consortium and most languages use is on the left, and the glyph used by a few Liberian orthographies is on the right:

    Contributor: Lorna Priest
  • U+019D LATIN CAPITAL LETTER N WITH LEFT HOOK is used by many languages. The glyph the Unicode Consortium uses is on the left, and the glyph used by many African orthographies is on the right:

    Contributor: Lorna Priest
  • U+01B3 LATIN CAPITAL LETTER Y WITH HOOK is widely used in Africa and also in the Americas. The glyph on the left (with a left hook) is the one the Unicode Consortium originally used in the Unicode code charts; since Unicode 5.0, however, the charts have used the glyph on the right (with a right hook). See the proposal for this change: "Correcting reference glyph for U+01B3 LATIN CAPITAL LETTER Y WITH HOOK". The right-hook form is also the most common in actual usage.

    Contributor: Lorna Priest
  • The standard glyph for U+0306 COMBINING BREVE, as used in Latin, is shown on the left. In Cyrillic, however, this character has a different appearance, as seen in the second glyph and in the characters in parentheses, which also use the breve.

    Contributor: Lorna Evans
  • The glyph on the left is similar to the glyph the Unicode Consortium uses in the Unicode code charts. It is the style used in Greek fonts. The glyph on the right is the style generally preferred for IPA usage. The significant difference is in the serif on the bottom of the stem.

    Contributor: Lorna Evans
  • U+2C64 LATIN CAPITAL LETTER R WITH TAIL is (or was) used orthographically in both the Heiban and Moro languages. Early usage had the glyph on the right, but more recent usage has the glyph on the left, which is also the glyph used in the Unicode code charts.

    Contributor: Lorna Evans
  • The "v with hook" is used in the International Phonetic Alphabet (IPA) as well as by many languages. The glyph the Unicode Consortium, the IPA and many other languages use is on the left, and the glyph used by only a few African orthographies is on the right:

    Contributor: Lorna Priest
  • The characters in the range U+1EA0..U+1EFF are traditionally thought of as "Vietnamese". Fonts are usually designed with Vietnamese-style diacritics for these characters, in which the upper diacritic is offset to make the character more compact. However, a language that uses the same diacritic combinations (such as a circumflex with an acute) needs the same characters but generally does not want the Vietnamese-style glyphs. For African languages especially, the blue style of glyphs is preferred over the black (Vietnamese) style.

    Contributor: Lorna Evans
  • Romanian formerly relied on a glyph variant with a comma below (rather than a cedilla) for the following characters: U+015E LATIN CAPITAL LETTER S WITH CEDILLA, U+015F LATIN SMALL LETTER S WITH CEDILLA, U+0162 LATIN CAPITAL LETTER T WITH CEDILLA and U+0163 LATIN SMALL LETTER T WITH CEDILLA.

    It is now recommended that Romanian data use U+0218 LATIN CAPITAL LETTER S WITH COMMA BELOW, U+0219 LATIN SMALL LETTER S WITH COMMA BELOW, U+021A LATIN CAPITAL LETTER T WITH COMMA BELOW and U+021B LATIN SMALL LETTER T WITH COMMA BELOW, which do not require glyph variants.

    Contributor: Lorna Evans
  • In IPA usage, and whenever a language uses both U+0061 LATIN SMALL LETTER A and U+0251 LATIN SMALL LETTER ALPHA, or both U+0069 LATIN SMALL LETTER I and U+0269 LATIN SMALL LETTER IOTA, it is important to design the italic so that the characters cannot be confused. One solution is to slant the "a" and "i" characters rather than design them in a true italic style. In the example below, the USV (Unicode scalar value) is shown in the first row, the regular style in the second row, the standard italic in the third row (where the characters are easily confused), and "a" and "i" in a slanted italic style in the fourth row (which helps distinguish them).

    Although not all of the "a" and "i" characters are shown here, all of them should be slanted (for example U+00E0 LATIN SMALL LETTER A WITH GRAVE, U+00E1 LATIN SMALL LETTER A WITH ACUTE, U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX, U+00E3 LATIN SMALL LETTER A WITH TILDE, U+00E4 LATIN SMALL LETTER A WITH DIAERESIS, etc.).

    Contributor: Lorna Evans
  • U+019D/U+0272 are used in a few orthographies around the world. There are several styles for the uppercase which could lead to confusion as to whether the different styles are actually different characters. The Unicode Consortium considers them to be glyph variants. This entry does not include information on which variant a language uses.

    Two variants of uppercase N with left-stem hook

    Contributor: Lorna Priest
  • A few languages use a capital eng based on the shape of the lowercase eng, but without a descender: the hook sits on the baseline. With the possible exceptions of Lamnso' [lns] and Tedaga [tuq], this appears to be an older style, with most languages moving toward the shape with a descender.

    Large form of small Eng (no descender)

    Contributor: Lorna Priest
  • Most languages in Africa and Papua New Guinea whose orthographies use eng use a capital eng based on the shape of the lowercase eng. The most common style has a descender; earlier usage tended to sit on the baseline.

    Large form of small Eng (with descender)

    Contributor: Lorna Priest
  • Some languages use the capital eng which is based on the capital N with a hook descender on the right leg.

    N-style eng

    Contributor: Lorna Priest
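
For the Romanian entry above, existing data that still uses the cedilla letters can be converted mechanically to the recommended comma-below letters. The following is a minimal sketch using Python's standard unicodedata module; the function name romanian_comma_below is hypothetical, and it assumes the input is already known to be Romanian (Turkish, for example, correctly uses s with cedilla and must not be converted). As a side effect, the whole string is normalized to NFC.

```python
import unicodedata

# Cedilla letters found in legacy Romanian data, mapped to the comma-below
# letters now recommended for Romanian.
CEDILLA_TO_COMMA_BELOW = {
    "\u015E": "\u0218",  # CAPITAL S WITH CEDILLA -> CAPITAL S WITH COMMA BELOW
    "\u015F": "\u0219",  # SMALL S WITH CEDILLA   -> SMALL S WITH COMMA BELOW
    "\u0162": "\u021A",  # CAPITAL T WITH CEDILLA -> CAPITAL T WITH COMMA BELOW
    "\u0163": "\u021B",  # SMALL T WITH CEDILLA   -> SMALL T WITH COMMA BELOW
}
_TABLE = str.maketrans(CEDILLA_TO_COMMA_BELOW)


def romanian_comma_below(text: str) -> str:
    """Replace s/t with cedilla by s/t with comma below (Romanian text only)."""
    # Compose first, so decomposed sequences such as "s" + U+0327 COMBINING
    # CEDILLA become U+015F and are then caught by the translation table.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_TABLE)


if __name__ == "__main__":
    sample = "\u015Ftiin\u0163\u0103"  # "stiinta" typed with cedilla letters
    fixed = romanian_comma_below(sample)
    print(" ".join(f"U+{ord(c):04X}" for c in fixed))
```

Using a translation table keeps the mapping explicit and limited to the four characters named in the entry; everything else in the text passes through unchanged apart from NFC normalization.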

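For the italic entry above, which recommends slanting every "a" and "i" character, a font designer needs the full list of precomposed letters built on those bases. The sketch below is one way to collect them with Python's standard unicodedata module: it lists each code point whose canonical decomposition (NFD) begins with the given base letter. The function name letters_based_on and the choice of ranges (U+00C0..U+024F and Latin Extended Additional, U+1E00..U+1EFF) are assumptions for illustration; extend the ranges to match the font's coverage.

```python
import unicodedata

# Assumed scan ranges: Latin-1 Supplement letters through Latin Extended-B,
# plus Latin Extended Additional. Adjust for the font in question.
LATIN_RANGES = [(0x00C0, 0x024F), (0x1E00, 0x1EFF)]


def letters_based_on(base: str) -> list[str]:
    """Return precomposed letters whose canonical decomposition starts with `base`."""
    found = []
    for start, end in LATIN_RANGES:
        for cp in range(start, end + 1):
            ch = chr(cp)
            # Unassigned or non-decomposing code points normalize to themselves,
            # so they never match a different base letter.
            decomposed = unicodedata.normalize("NFD", ch)
            if ch != base and decomposed[0] == base:
                found.append(ch)
    return found


if __name__ == "__main__":
    for base in ("a", "i"):
        for ch in letters_based_on(base):
            print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Letters that have no canonical decomposition, such as U+0131 LATIN SMALL LETTER DOTLESS I or U+0268 LATIN SMALL LETTER I WITH STROKE, are not found this way and would need to be added to the list by hand.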
  • Posted by Bob Hallissy on 2012-09-13 05:27:00

    A while back Ray Larabie started an interesting thread on Typophile about accent collision lists, i.e. situations where an accent might collide with a neighboring character and is therefore a candidate for kerning in a font. This problem is, of course, language- and script-specific, but this particular thread has a lot of useful information on problem cases in Latin script.

    If anyone knows of similar references or discussions for other scripts, feel free to add a comment.

Copyright © 2017 SIL International and released under the Creative Commons Attribution-ShareAlike 3.0 license (CC-BY-SA) unless noted otherwise. Language data includes information from the Ethnologue. Script information partially from the ISO 15924 Registration Authority. Some character data from The Unicode Standard Character Database and locale data from the Common Locale Data Repository. Used by permission.