Posted by Sharon Correll on 2014-04-03 10:29:40
One of the responses I received to my previous post in this series asked why parentheses are assigned a "neutral" directionality code. The answer lies in one of the special complications of the bidirectional algorithm: the way that it interacts with other processes that happen during the course of rendering a piece of text.
Consider the following sentence:
THE QUICK (BROWN) FOX JUMPS OVER THE (LAZY) DOG.
Here we are using the convention, common in discussions of bidirectional text, that upper-case represents right-to-left text. So a simple right-to-left rendering of this text would be:
.GOD )YZAL( EHT REVO SPMUJ XOF )NWORB( KCIUQ EHT
Immediately you'll see something quite odd! The appearance of the parentheses is appropriate for left-to-right text but the opposite of what you would expect for right-to-left text.
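To see where the odd-looking parentheses come from, here is a minimal sketch that treats "right-to-left rendering" as nothing more than reversing the character order. That is a simplification for an all-upper-case string, not the real bidi algorithm, and the variable names are just for illustration.

```python
# Upper case stands in for right-to-left letters, as in the example above.
logical = "THE QUICK (BROWN) FOX JUMPS OVER THE (LAZY) DOG."

# Naive right-to-left display: reverse the character order and do nothing else.
naive_visual = logical[::-1]
print(naive_visual)
```

The code points are untouched, so every U+0028 still draws with its left-to-right shape even though it now sits on the "wrong" side of the word.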
The first thing you might be tempted to do is to just use the opposite character in your data - in other words use U+0029 for the opening parenthesis and U+0028 for the closing one. But that is not recommended, and it might not even be possible. For instance, if your text string is being constructed from various sources (e.g., fields in a database) you wouldn't necessarily know which kind of parenthesis is needed until the very last minute.
For example, if a title and author are taken from a database, there is no way for the larger page context to know which parenthesis shape should be used.
Most importantly, though, a key principle of Unicode is that characters should be used consistently according to the "meanings" assigned to them by the Unicode Standard.
(Now if you are savvy about Unicode, you might be aware that the name of the U+0028 character is in fact LEFT PARENTHESIS, which would seem to justify using it as a closing parenthesis in right-to-left text! The name is really incorrect, however: the semantics of the character indicate that it should only be used as an opening parenthesis.)
So okay, we'll be well-behaved citizens of Unicode Land and make sure to use U+0028 consistently for the opening parenthesis and U+0029 for the closing parenthesis. But this means that something needs to be responsible for changing the shape of the parenthesis where appropriate. And that "something" can't do its work until after the bidi algorithm has run, because only then is the directionality of all the characters known for sure.
This "something" is a process called mirroring, which replaces the shape of the character with the one appropriate for the directional flow of the text.
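A rough sketch of the ordering described above, reorder first and mirror second, might look like the following. It makes several assumptions: the run is entirely right-to-left, so plain reversal stands in for the bidi algorithm; swapping characters stands in for swapping glyph shapes; and `MIRROR_PAIRS`, `mirror` and `render_rtl` are hypothetical names. The only real library call is Python's `unicodedata.mirrored`, which reports whether a character has the Unicode mirrored property.

```python
import unicodedata

# Hypothetical, deliberately tiny pair table. The full data lives in the
# Unicode Character Database file BidiMirroring.txt.
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def mirror(ch: str) -> str:
    """Return the mirrored counterpart of ch, or ch itself if none is known."""
    if unicodedata.mirrored(ch):
        return MIRROR_PAIRS.get(ch, ch)
    return ch


def render_rtl(logical: str) -> str:
    """Toy stand-in for 'bidi, then mirroring' on an all-right-to-left run."""
    visual = logical[::-1]  # stand-in for the bidi reordering step
    return "".join(mirror(ch) for ch in visual)


print(render_rtl("THE (LAZY) DOG."))
```

Note that the stored text still uses U+0028 to open and U+0029 to close; only the displayed form changes.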
A tricky thing about mirroring is that applications are not consistent with regard to when the bidi algorithm is run. Sometimes the application itself does very low-level rendering, including running the bidi algorithm. An application like XeTeX, for instance, which aims to achieve a very high quality of typography, breaks the text to be rendered into very small units and runs the bidi algorithm over them. This sort of application might want to handle the mirroring itself, or even give the user control over it.
But most applications expect the rendering to take care of both the bidi algorithm and mirroring. So both processes are most often handled by the smart-font rendering software. Notice, though, that if the application performs mirroring itself, the font rendering must not do mirroring, or it will have the effect of turning the shapes back to what they were originally!
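The "turning the shapes back" problem is easy to demonstrate, because mirroring a pair twice is the same as not mirroring at all. In this sketch `font_render` and its `already_mirrored` flag are hypothetical, not the API of any real rendering engine.

```python
PAIRS = {"(": ")", ")": "("}


def mirror_run(text: str) -> str:
    """Swap each paired character for its counterpart."""
    return "".join(PAIRS.get(ch, ch) for ch in text)


def font_render(visual: str, already_mirrored: bool = False) -> str:
    """Hypothetical renderer entry point that mirrors unless told not to."""
    return visual if already_mirrored else mirror_run(visual)


reordered = ")YZAL("                # "(LAZY)" after right-to-left reordering
app_output = mirror_run(reordered)  # the application mirrors it itself

double_mirrored = font_render(app_output)                    # mirrored twice
single_mirrored = font_render(app_output, already_mirrored=True)
print(double_mirrored, single_mirrored)
```

Without the flag the renderer undoes the application's work; with it, the application's result passes through unchanged.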
Another complexity is that there are characters, particularly mathematical symbols, that need to be mirrored but do not have a matching mirrored character. An example of this is the square root sign, which changes its orientation in right-to-left text.
Square root signs in English and Arabic
However, there is no such thing as a "closing square root sign" to borrow the shape from! For characters like these, the only option is for the replacement glyph to be provided by the smart-font rendering module once it has figured out what the directionality of the symbol is.
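You can check this with Python's standard `unicodedata` module, which exposes the Unicode mirrored property but has no function for looking up a partner character. The `flags` name below is just for illustration.

```python
import unicodedata

# Map each character's Unicode name to its mirrored property (1 = mirrored).
flags = {unicodedata.name(ch): unicodedata.mirrored(ch) for ch in "(\u221a"}

for name, is_mirrored in flags.items():
    print(f"{name}: mirrored={is_mirrored}")
```

Both the parenthesis and the square root sign are flagged as mirrored, but only the parenthesis has another character whose shape can be borrowed.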
OpenType and mirroring
Most applications use OpenType to perform smart rendering. Fortunately, in recent years OpenType has standardized its approach to mirroring. It performs mirroring for pairs of characters in a canonical list (the OpenType Mirroring Pairs List), based on its knowledge of the pairs. For other characters that need mirroring but are not in this list, it marks them with the 'rtlm' feature, which will cause the font - if it is implemented correctly - to substitute the mirrored form.
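The two-way decision described above could be sketched roughly like this. It is only an illustration of the logic, not how any particular OpenType engine is written: `OMPL_EXCERPT`, `NEEDS_MIRRORING`, `Glyph` and `prepare_rtl` are all hypothetical, and the table is a tiny stand-in for the real list.

```python
from dataclasses import dataclass, field

# Hypothetical excerpt standing in for the OpenType Mirroring Pairs List.
OMPL_EXCERPT = {"(": ")", ")": "(", "[": "]", "]": "["}
# Hypothetical set of characters that need mirroring in right-to-left runs.
NEEDS_MIRRORING = set(OMPL_EXCERPT) | {"\u221a"}


@dataclass
class Glyph:
    char: str
    features: set = field(default_factory=set)


def prepare_rtl(ch: str) -> Glyph:
    """Decide how one character in a right-to-left run gets mirrored."""
    if ch in OMPL_EXCERPT:
        # A listed pair: the engine swaps in the partner character itself.
        return Glyph(OMPL_EXCERPT[ch])
    if ch in NEEDS_MIRRORING:
        # No partner in the list: tag it so the font can substitute a glyph.
        return Glyph(ch, {"rtlm"})
    return Glyph(ch)


for ch in "(\u221aA":
    print(prepare_rtl(ch))
```

In this sketch the parenthesis is handled by the engine alone, while the square root sign only comes out mirrored if the font actually implements 'rtlm'.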
Graphite and mirroring
Graphite is an alternate smart-font rendering technology specifically designed to meet the needs of lesser-known (and less standardized) languages. The current Graphite engine, Graphite2, will perform both the bidi algorithm and mirroring, according to the specifications of the font (the GDL language provides mechanisms to override the default directionality and mirroring properties of characters). However, it is possible for applications (like possibly XeTeX) to send a flag to the Graphite engine indicating that they have already performed mirroring, so that Graphite will not do it again - which would cancel out the effect of the first mirroring!
The original Graphite engine does not perform any mirroring.
And in case you're wondering...
...what in the world the bidi algorithm is, here are the previous posts in this series: