Recently my coworkers and I have been working on a font intended for use in West Africa. We wanted to support the Warsh (Warš) orthography (or reading tradition) for the Arabic script. Figuring out exactly what that would mean was something of a challenge, because not much has been written on Warsh. Is it a matter of character selection, alternate rendering, or perhaps style of characters? It seems that all of these are involved in a Warsh orthography.

Different or Same Characters?

In the chart below we see how Warsh behavior contrasts with two other orthographic traditions: Hafs and Al-Duri. Hafs is the most common orthographic tradition for Modern Standard Arabic. Al-Duri is used in Sudan, Central Africa and some regions of Nigeria.

Warsh, Hafs, and Al-Duri letters compared

For the Hafs and Al-Duri orthographies it is a simple matter of choosing a different character. For example, the Hafs "feh" requires a nukta dot above and the Al-Duri "feh" requires a nukta dot below, but within each tradition the four contextual forms (isolate, initial, medial and final) are consistent. The issue becomes more complex with Warsh. For some characters (kaf, imala e and "ii" final) it is a simple matter of using another codepoint. However, in the Warsh orthography the isolate and final forms of "feh", "qaf", and "noon" must be dotless, while the initial and medial forms keep their dots. Unicode does not currently support this behavior with a single codepoint per letter. The question is, should it? Should someone using a Warsh orthography be able to use the same codepoint for isolate, initial, medial and final, or should they be required to change the codepoint depending on where in the word the character appears? In the chart above, we have used different codepoints for the isolate and final positions of "feh", "qaf" and "noon" than for the initial and medial positions.
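To make the "change the codepoint depending on position" approach concrete, here is a minimal Python sketch of what a user (or a keyboard or conversion tool) would effectively have to do. Everything in it is an assumption for illustration: the names `WARSH_DOTLESS`, `joins_to_next` and `to_positional` are hypothetical, the pairing of U+06A7 with U+066F ARABIC LETTER DOTLESS QAF is my guess at a suitable dotless codepoint, and the joining test is a rough simplification rather than the full Unicode joining algorithm.

```python
import unicodedata

# Assumed pairing (not an official mapping):
# initial/medial codepoint -> dotless codepoint for isolate/final position.
WARSH_DOTLESS = {
    "\u06A2": "\u06A1",  # FEH WITH DOT MOVED BELOW -> DOTLESS FEH
    "\u06A7": "\u066F",  # QAF WITH DOT ABOVE -> DOTLESS QAF
    "\u0646": "\u06BA",  # NOON -> NOON GHUNNA
}

HAMZA = "\u0621"    # a letter, but it does not join to its neighbours
TATWEEL = "\u0640"  # kashida; joins on both sides


def joins_to_next(text, i):
    """Rough test: is text[i] followed by something it would join to?

    Combining marks (category Mn) are skipped, since vowel marks do not
    interrupt joining. This is a simplification of the real joining rules.
    """
    for ch in text[i + 1:]:
        if unicodedata.category(ch) == "Mn":
            continue
        if ch == TATWEEL:
            return True
        return (unicodedata.category(ch) == "Lo"
                and "\u0620" <= ch <= "\u06FF"
                and ch != HAMZA)
    return False


def to_positional(text):
    """Swap feh/qaf/noon for dotless codepoints in isolate/final position."""
    out = []
    for i, ch in enumerate(text):
        if ch in WARSH_DOTLESS and not joins_to_next(text, i):
            out.append(WARSH_DOTLESS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

For example, `to_positional("\u0645\u0646")` (meem + noon) replaces the final noon with U+06BA, while `to_positional("\u0646\u0627")` (noon + alef) leaves the initial noon alone. The point of the sketch is that the stored text ends up with two codepoints for what a reader considers one letter.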

The current Unicode status of U+06BA ARABIC LETTER NOON GHUNNA is very confusing. It is discussed in Unicode document L2/12-381. Currently, many of the Windows fonts support the behavior we are looking for with the Warsh noon (dotted in initial and medial positions, dotless in isolate and final).

However, in response to L2/12-381, the Unicode Technical Committee decided to "Document in the standard and the nameslist that Noon Ghunna is dotless in all its contextual forms." (See the UTC minutes.) This apparently means that the fonts supporting the Warsh noon behavior are no longer Unicode-compliant. There do not appear to be any fonts which support the Warsh (dotless) forms for feh and qaf.

There seem to be three options for supporting Warsh orthographies:

  • Use different codepoints depending on where in the word the character appears
    • Pro: Already able to use in Unicode
    • Con: Searching becomes difficult. A search for any "feh" would require searching for both U+06A2 and U+06A1.
    • Con: This seems contrary to the whole spirit of Unicode. However, it is the current solution many are compelled to use due to a lack of other options.
  • Use one codepoint for "feh" (U+06A2 ARABIC LETTER FEH WITH DOT MOVED BELOW), another for "qaf" (U+06A7 ARABIC LETTER QAF WITH DOT ABOVE) and another for "noon" (U+0646 ARABIC LETTER NOON), and use character variants to support the isolate and final dotless forms
    • Pro: No need to wait for adding characters to Unicode
    • Pro: Searching for one character is much easier
    • Con: Very few fonts and applications support character variants
  • Add three new characters to Unicode to support the Warsh character requirements
    • Pro: Searching for one character is much easier
    • Con: It would be years before the characters are in Unicode and supported by fonts and applications
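
The search problem under the first option can be worked around in software, at the cost of every search tool having to know about it. The following Python sketch shows one way this might look: each letter in the query is expanded to a character class covering both its dotted and dotless codepoints. The names `GROUPS`, `CLASS_FOR` and `search_pattern` are hypothetical, and the qaf pairing with U+066F ARABIC LETTER DOTLESS QAF is an assumption on my part.

```python
import re

# Assumed equivalence groups: dotted initial/medial codepoint plus the
# dotless codepoint used in isolate/final position.
GROUPS = [
    "\u06A2\u06A1",  # FEH WITH DOT MOVED BELOW, DOTLESS FEH
    "\u06A7\u066F",  # QAF WITH DOT ABOVE, DOTLESS QAF
    "\u0646\u06BA",  # NOON, NOON GHUNNA
]

# Map every member of a group to a regex character class for the whole group.
CLASS_FOR = {ch: "[" + group + "]" for group in GROUPS for ch in group}


def search_pattern(query):
    """Compile a regex that matches the query under either codepoint choice."""
    parts = [CLASS_FOR.get(ch, re.escape(ch)) for ch in query]
    return re.compile("".join(parts))
```

With this, `search_pattern("\u06A2").search(text)` finds a "feh" whether the text stores U+06A2 or U+06A1. It works, but it only helps in tools that have been taught the equivalence; an ordinary find dialog would still treat the two codepoints as unrelated letters.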

It is interesting to note that the "imala e" (U+065C) is sometimes referred to as a "warsh dot".


Another aspect of Warsh is that the hamza on U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE and U+0625 ARABIC LETTER ALEF WITH HAMZA BELOW actually touches the alef.

final alef+hamza above touching

final alef+hamza below touching

isolate alef+hamza touching (above and below examples)

lam+hamza above touching+fatha+alef+kasra

These samples come from a Tijaniyya Qur'an (Warsh-style)

In isolate form (but apparently not final), U+064E ARABIC FATHA and U+0650 ARABIC KASRA on an alef also touch the alef. It would be useful to get verification of this behavior.

isolate alef+kasra below touching

isolate alef+fatha (but not alef+fathatan!) touching

These samples come from a Tijaniyya Qur'an (Warsh-style)

This kind of behavior can be programmed into the font.

Shadda+kasra positioning

U+0650 ARABIC KASRA normally appears below the consonant. However, when there is a U+0651 ARABIC SHADDA plus a kasra, the kasra normally moves above the consonant (and below the shadda).

sample word using shadda+kasra in Scheherazade font

A number of languages do not use this behavior: the kasra remains below the consonant even in the context of a shadda. The Warsh orthography follows this alternate behavior.

kasra remains below the consonant even in the context of a shadda

This kind of behavior can be programmed into the font.
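
The placement rule itself is small enough to state as a decision function. The Python sketch below is only a model of the logic, not font code: a real font would express this in its own shaping and positioning rules. The function name `kasra_placement` and the `kasra_stays_below` flag are hypothetical names chosen for illustration.

```python
SHADDA = "\u0651"
KASRA = "\u0650"


def kasra_placement(marks, kasra_stays_below):
    """Say where a kasra would be drawn for one consonant's marks.

    marks: string of combining marks attached to a single consonant.
    kasra_stays_below: True for orthographies (such as Warsh) where the
    kasra stays under the consonant even when a shadda is present.
    Returns "below-shadda", "below-base", or None if there is no kasra.
    """
    if KASRA not in marks:
        return None
    if SHADDA in marks and not kasra_stays_below:
        return "below-shadda"  # kasra sits above the consonant, under the shadda
    return "below-base"        # kasra sits under the consonant


# Default behavior versus the Warsh-style alternate for shadda + kasra.
default_result = kasra_placement(SHADDA + KASRA, kasra_stays_below=False)
warsh_result = kasra_placement(SHADDA + KASRA, kasra_stays_below=True)
```

Here `default_result` is `"below-shadda"` and `warsh_result` is `"below-base"`; a kasra without a shadda is `"below-base"` in both cases.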

Are there other aspects of a Warsh orthography?

I'm sure there are other aspects of what it means to support a Warsh orthography in a font. I would be interested in hearing what they are and whether my analysis is correct. SIL is currently working on a font which we hope supports Warsh orthographies. Feel free to download the Harmattan alpha font and give us feedback!