This is the second in a series of posts on the Unicode Bidirectional Algorithm. The first gave a brief description of the situations in which the algorithm is needed.

When text in languages with different writing directions is mixed together, the concept of the top-level direction becomes important: this is the direction of the overall flow of text in the paragraph. So, for instance, a paragraph of Hebrew text that includes some English is rendered differently from an English paragraph that includes Hebrew words, even if the two contain exactly the same characters. To begin with, you would expect the Hebrew paragraph to be right-aligned and the English one to be left-aligned, but the differences go beyond that.
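The algorithm needs to know this paragraph direction before it can do anything else. When nothing else (such as markup or an application setting) supplies it, the Unicode Bidirectional Algorithm falls back on rules P2 and P3: the paragraph takes the direction of its first strongly directional character. Here is a minimal Python sketch of that default, assuming the standard-library `unicodedata` module; `paragraph_direction` is a hypothetical helper name, and the sketch skips the rules' special treatment of directional isolates.

```python
import unicodedata


def paragraph_direction(text: str, default: str = "ltr") -> str:
    """Return "ltr" or "rtl" from the first strong character (rules P2/P3).

    Simplified: characters inside directional isolates are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew letters are R, Arabic letters AL
            return "rtl"
    # No strong character (e.g. only digits and punctuation): use the default.
    return default


print(paragraph_direction("shalom \u05e9\u05dc\u05d5\u05dd"))  # ltr
print(paragraph_direction("\u05e9\u05dc\u05d5\u05dd shalom"))  # rtl
print(paragraph_direction("42!"))  # ltr (falls back to the default)
```

Because this default depends only on which script happens to come first, an application that knows better is allowed to set the paragraph direction explicitly instead.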

Language segment ordering

For the next example, let's pretend that the upper-case characters are Hebrew and the lower-case characters are English, since most of us (including me!) don't know Hebrew or Arabic. This is a common convention you'll see in documentation on bidi issues.

HERE IS A SENTENCE CONTAINING HEBREW TEXT. it is followed by some english. AND THEN THERE IS MORE HEBREW. the final sentence is more english.
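Under this convention each letter's direction can be read straight off its case, so the example can be split into directional segments mechanically. The Python sketch below uses hypothetical helpers and is not part of the algorithm itself: it starts a new segment whenever a letter of the other direction appears, and it simply leaves spaces and punctuation attached to the segment they follow. Where those in-between characters really end up is a separate question, taken up under Punctuation.

```python
from typing import List, Optional


def letter_direction(ch: str) -> Optional[str]:
    # Convention from this post: upper case stands in for Hebrew (right-to-left),
    # lower case for English (left-to-right). Anything else has no direction here.
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return None


def split_segments(text: str) -> List[str]:
    """Split text wherever the letters switch direction.

    Spaces and punctuation stay with the segment in progress.
    """
    segments: List[str] = []
    current = ""
    current_dir: Optional[str] = None
    for ch in text:
        d = letter_direction(ch)
        if d is not None and current_dir is not None and d != current_dir:
            segments.append(current)  # direction changed: close the segment
            current = ""
        if d is not None:
            current_dir = d
        current += ch
    if current:
        segments.append(current)
    return segments


example = ("HERE IS A SENTENCE CONTAINING HEBREW TEXT. it is followed by some english. "
           "AND THEN THERE IS MORE HEBREW. the final sentence is more english.")
for segment in split_segments(example):
    print(repr(segment))  # four segments, alternating upper and lower case
```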

Whichever direction the paragraph has, it contains four segments that alternate between Hebrew and English. When the paragraph direction is left-to-right (i.e., English is the primary language of the document), the segments also flow from left to right, which places the initial Hebrew segment at the left side of the paragraph.

English paragraph containing some Hebrew



But when the paragraph flows from right to left (because Hebrew is the primary language), the segments flow from right to left as well, and the initial Hebrew segment appears at the right side of the paragraph.
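Both orderings can be reproduced with a toy version of the algorithm. The Python sketch below assumes the upper/lower-case convention from earlier and handles only letters, spaces and punctuation (no digits, explicit embeddings or mirrored brackets), and its function names are hypothetical. It resolves each run of spaces and punctuation from its neighbours, gives each character an embedding level (in an English paragraph, 0 for English and 1 for Hebrew; in a Hebrew paragraph, 1 for Hebrew and 2 for English), then applies the reordering rule L2: working down from the highest level to 1, reverse every contiguous stretch of characters at that level or higher.

```python
from typing import List


def letter_class(ch: str) -> str:
    # Post's convention: upper case = Hebrew (R), lower case = English (L).
    # Spaces and punctuation are treated as neutral (N).
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def resolve_neutrals(classes: List[str], paragraph: str) -> List[str]:
    """Simplified rules N1/N2 for each run of neutrals.

    Same direction on both sides -> take it; otherwise take the paragraph
    direction. The start and end of the paragraph count as the paragraph direction.
    """
    resolved = list(classes)
    i = 0
    while i < len(resolved):
        if resolved[i] != "N":
            i += 1
            continue
        j = i
        while j < len(resolved) and resolved[j] == "N":
            j += 1  # find the end of this run of neutrals
        before = resolved[i - 1] if i > 0 else paragraph
        after = resolved[j] if j < len(resolved) else paragraph
        resolved[i:j] = [before if before == after else paragraph] * (j - i)
        i = j
    return resolved


def visual_order(text: str, paragraph: str) -> str:
    """Return text in left-to-right display order; paragraph is "L" or "R"."""
    classes = resolve_neutrals([letter_class(ch) for ch in text], paragraph)
    # Embedding levels: English paragraph -> L=0, R=1; Hebrew paragraph -> R=1, L=2.
    if paragraph == "L":
        levels = [1 if c == "R" else 0 for c in classes]
    else:
        levels = [2 if c == "L" else 1 for c in classes]
    chars = list(text)
    # Rule L2: from the highest level down to 1 (the lowest odd level here),
    # reverse each contiguous stretch of characters at that level or higher.
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


text = "ABC. def. GHI. jkl."
print(visual_order(text, "L"))  # CBA. def. IHG. jkl.
print(visual_order(text, "R"))  # .jkl .IHG .def .CBA
```

In the English paragraph each Hebrew segment is reversed in place while the segments keep their left-to-right order; in the Hebrew paragraph the whole line is reversed as well, so the first Hebrew segment lands at the right edge. Note where the periods end up in each case.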

Hebrew paragraph containing some English



Punctuation

You'll also notice that the punctuation at the end of each sentence is not necessarily where you might expect it. Where the primary language is English, the period at the end of the Hebrew sentence is written to the right of the Hebrew; in other words, its direction is left-to-right. Similarly, where the primary language is Hebrew, the period at the end of the English sentence is written to the left of the English; its direction is right-to-left.

Punctuation that occurs at the "edges" of directional segments takes its direction from the overall paragraph direction.



Punctuation has "weak" directionality (strictly speaking, the algorithm classes most punctuation as neutral), meaning that its direction is determined by the surrounding text. In these cases, however, the punctuation sits between segments that run in different directions, so to break the tie, its direction is taken from the direction of the paragraph as a whole.
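The same tie-break can be checked against real Unicode data. In the algorithm's own terms, a period has the weak bidi class CS (common separator), which is treated as neutral unless it separates two numbers, while a space is WS and an exclamation mark is ON (other neutral); Python's standard `unicodedata.bidirectional` reports these classes. The sketch below, with hypothetical `strong_direction` and `resolved_direction` helpers, finds the nearest strong character on each side of a punctuation mark and falls back to the paragraph direction when they disagree. It assumes the text contains only letters, spaces and punctuation, since digits follow extra rules that are ignored here.

```python
import unicodedata
from typing import Optional


def strong_direction(ch: str) -> Optional[str]:
    # Only classes L, R and AL count as strong here; everything else is skipped.
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None


def resolved_direction(text: str, index: int, paragraph: str) -> str:
    """Direction that the neutral character at text[index] resolves to.

    Simplified rules N1/N2: if the nearest strong characters on both sides agree,
    use their direction; otherwise (or at a paragraph edge) use the paragraph's.
    """
    before = next((d for d in map(strong_direction, reversed(text[:index])) if d), paragraph)
    after = next((d for d in map(strong_direction, text[index + 1:]) if d), paragraph)
    return before if before == after else paragraph


hebrew = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word "shalom"
text = hebrew + ". hello."
print(unicodedata.bidirectional("."))  # CS
first_period = text.index(".")
print(resolved_direction(text, first_period, "L"))  # L: the English paragraph breaks the tie
print(resolved_direction(text, first_period, "R"))  # R: the Hebrew paragraph breaks the tie
```

A comma between two Hebrew words, by contrast, has right-to-left neighbours on both sides, so it resolves to right-to-left whatever the paragraph direction; the paragraph only decides the outcome at the edges.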

If this isn't what you want, there are ways to override the direction for any character you choose. This will be the subject of a future post.