On compounds as ligatures

As a programmer, I spend a lot of time reading text in monospace fonts. And when I say a lot, I mean a lot1. Consoles and terminals use monospace fonts. Integrated development environments [IDEs, the programs programmers use to write programs] use monospace fonts. And as a programmer who has recently taken to writing a lot of programs involving Shavian, I feel quite strongly that the Shavian I write in my programs should, in fact, be legible. As you might correctly surmise from this post so far: it is not.

Before I go on, let’s make sure we are on the same page. What’s a monospace font, exactly, anyway? Well, it’s a typeface designed in such a way that all letters take up exactly the same amount of space, so they fit neatly into a nice regular grid if you write lines of letters above each other. This is exactly the way the letters worked on mechanical typewriters: the carriage advances by the same distance for each letter typed, so the designers of the typeface had to ensure that each letter was exactly as wide as every other one. The monospace fonts of the mechanical typewriters lived on in the guise of teletype systems and printers, and on the front end in the form of monochrome displays with columns of equal-width characters. And they still live on to this day, in the shape of terminal emulators and the aforementioned IDEs.

With that out of the way, let me proceed with airing my grievances. What’s so bad about Shavian in monospace fonts, I hear you ask? Well… at least on macOS, the compound letters are too wide to fit into the space allotted to them, and the result looks terrible. Here, judge for yourself:

Monospace rendering in Terminal.app on macOS.

Now, I’m sure that there are monospace fonts out there that don’t suffer from this crowding quite as badly, but, with all due respect to their designers, the results are still pretty dreadful. Trying to squeeze Shavian’s compound letters into the same amount of space as the ado vowel is simply the wrong solution to the problem.

The problem has been solved, though, in a very elegant way, by Kingsley Read himself. Instead of trying to cram ·𐑸, ·𐑹, ·𐑻, ·𐑺, ·𐑽 or even ·𐑿 into the width of a single letter, he split them each into their two constituents. The user of the Imperial Good Companion typewriter would not have any keys for the compound letters, and would just type them out separately: 𐑩+𐑮, 𐑭+𐑮, 𐑻︀+𐑮, 𐑺︀+𐑮, 𐑾+𐑮 and 𐑘+𐑵.

Being able to separate the letters this way was designed into the alphabet from the ground up. The compound letters are ligatures, composed from the existing simple letters.2 Read knew very well that writing them separately would not actually compromise legibility in any way. As any moderately experienced Shavian reader will know, it is not hard to get used to reading them. And as I and others have discovered, composing the compounds in that fashion is in fact a very natural way to type them out in the first place.
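For the programmers in the audience, here is a minimal Python sketch of that typewriter-style decomposition. The table `DECOMPOSE` and the function `split_compounds` are names made up for illustration, and the table only lists the compounds whose constituents are both encoded in Unicode, so the err and air compounds are deliberately left out.

```python
# Compound letter -> its two constituents, typed as separate letters.
# Written with escapes so the code points are unambiguous in any font.
DECOMPOSE = {
    "\U00010478": "\U0001046D\U0001046E",  # are   -> ah + roar
    "\U00010479": "\U00010477\U0001046E",  # or    -> awe + roar
    "\U0001047C": "\U00010469\U0001046E",  # array -> ado + roar
    "\U0001047D": "\U0001047E\U0001046E",  # ear   -> ian + roar
    "\U0001047F": "\U00010458\U00010475",  # yew   -> yea + ooze
}


def split_compounds(text: str) -> str:
    """Replace each listed compound letter with its two constituents."""
    return text.translate(str.maketrans(DECOMPOSE))


word = "\U00010453\U00010478"  # fee + are
print(split_compounds(word))   # fee + ah + roar, three letters instead of two
```

Every letter in the result then occupies one monospace cell of its own, which is the whole point of the exercise.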

The Imperial Good Companion typewriter

With all this in mind, I did a fun little experiment yesterday. I cloned the Inter Alia repository and changed the font: I added ligature rules for forming the compounds from their parts. I am happy to report that this was very easy to do and that it worked beautifully, in Inter Alia at least.

My master plan was to find a way of encoding the ligatures such that:

  • Forming the ligatures would be opt-in: I didn’t want to inadvertently form them across syllable boundaries3, and thus break or misinterpret the existing corpus of Shavian writing
  • Viewing the ligatures written this way would degrade gracefully in fonts not [yet] supporting them, particularly in monospace fonts.

Basically, I wanted to find a way to revise modern Shavian typography such that I could have two-character ligatures in plain text without breaking everything in the process. Sadly, this didn’t work as well as I would have liked: it seems the two requirements above are at odds with each other.

Here’s what I tried. Satisfying the opt-in bit is easy, as Unicode has a code point designed especially for this purpose, namely U+200D, better known as ZWJ, or zero width joiner. I changed Inter Alia to form ligatures if patterns like ·𐑭 + <ZWJ> + ·𐑮 were encountered. I adapted one of my keyboards to emit this, and it worked as intended on the first attempt. Nice.
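To make the opt-in idea concrete, here is a small Python simulation of what such a rule does to a string. It is only a sketch of the behaviour, not the feature code in the font; `LIGATURES` and `shape` are hypothetical names, and the table is limited to pairs whose parts exist in Unicode.

```python
import re

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# (first, second) -> compound letter formed when a ZWJ sits between them.
LIGATURES = {
    ("\U0001046D", "\U0001046E"): "\U00010478",  # ah + roar   -> are
    ("\U00010477", "\U0001046E"): "\U00010479",  # awe + roar  -> or
    ("\U00010469", "\U0001046E"): "\U0001047C",  # ado + roar  -> array
    ("\U0001047E", "\U0001046E"): "\U0001047D",  # ian + roar  -> ear
    ("\U00010458", "\U00010475"): "\U0001047F",  # yea + ooze  -> yew
}

# Matches only the explicit three-code-point sequences: first, ZWJ, second.
_PATTERN = re.compile("|".join(re.escape(a + ZWJ + b) for a, b in LIGATURES))


def shape(text: str) -> str:
    """Mimic an opt-in ligature rule: substitute only where a ZWJ asks for it."""
    return _PATTERN.sub(lambda m: LIGATURES[(m.group()[0], m.group()[2])], text)


joined = "\U0001046D" + ZWJ + "\U0001046E"
plain = "\U0001046D\U0001046E"
print(shape(joined) == "\U00010478")  # the ZWJ sequence becomes the compound
print(shape(plain) == plain)          # a bare pair is left untouched
```

Because a bare pair never matches, text that happens to have the two letters side by side across an affix boundary is left alone.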

The second requirement is where my experiment fell flat. Sure, in rich text environments, the degradation was as planned: “𐑩‌𐑮 𐑷‌𐑮 𐑭‌𐑮” etc. But I was not so lucky in plain text contexts. The editors [IDEs and other plain-text-based tools such as Vim] helpfully displayed my zero width character as “<200D>” – or worse4. Okay, okay, that’s the editors. What about other plain text use? Also sucky, I’m sad to report. macOS’s Terminal rather uselessly renders the ZWJ as a regular space, which arguably made the text even less legible than it was using the squished compounds.
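The editors’ behaviour is easy to imitate. The sketch below is a guess at the general idea rather than the logic of any particular editor: it spells out every character in Unicode’s “format” category (Cf, which includes ZWJ) as its code point in angle brackets. The function name `show_invisibles` is made up for illustration.

```python
import unicodedata


def show_invisibles(text: str) -> str:
    """Spell out format characters (general category Cf) as <XXXX>."""
    return "".join(
        f"<{ord(ch):04X}>" if unicodedata.category(ch) == "Cf" else ch
        for ch in text
    )


# ah + ZWJ + roar: the joiner is shown as a visible code point
print(show_invisibles("\U0001046D\u200D\U0001046E"))
```

The same check is a cheap way to scan source files for stray invisible characters before they bite.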

I wouldn’t call this a failed experiment, however. I learnt a hell of a lot, and I have other ideas…5

What if I dropped the first requirement, and just always rendered the ligatures unless explicitly told not to? As “affix rule” exceptions are rather rare, this mightn’t be the worst approach‽ Opting out could be any one of:

  • Don’t use a font that supports ligatures, or
  • Turn off ligature support at the style-sheet level [CSS], or
  • Insert ZWJ’s counterpart, ZWNJ [zero width non-joiner, U+200C], or
  • Use a new typographic convention to distinguish affixes
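The last two opt-outs can be simulated in a few lines of Python. This is a sketch of the always-on behaviour under the assumption that a ligature forms only when the two letters are directly adjacent; `LIGATURES` and `shape_always` are hypothetical names, and the table again covers only pairs whose parts exist in Unicode.

```python
import re

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# Adjacent pair -> compound letter, formed by default.
LIGATURES = {
    "\U0001046D\U0001046E": "\U00010478",  # ah + roar   -> are
    "\U00010477\U0001046E": "\U00010479",  # awe + roar  -> or
    "\U00010469\U0001046E": "\U0001047C",  # ado + roar  -> array
    "\U0001047E\U0001046E": "\U0001047D",  # ian + roar  -> ear
    "\U00010458\U00010475": "\U0001047F",  # yea + ooze  -> yew
}

_PAIR = re.compile("|".join(map(re.escape, LIGATURES)))


def shape_always(text: str) -> str:
    """Form ligatures wherever the two parts are directly adjacent."""
    return _PAIR.sub(lambda m: LIGATURES[m.group()], text)


ado, roar = "\U00010469", "\U0001046E"
print(shape_always(ado + roar) == "\U0001047C")              # joined by default
print(shape_always(ado + ZWNJ + roar) == ado + ZWNJ + roar)  # ZWNJ opts out
print(shape_always(ado + "-" + roar) == ado + "-" + roar)    # so does a hyphen
```

Anything placed between the two letters breaks the adjacency, which is why both the invisible ZWNJ and a visible hyphen work as opt-outs here.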

That last one might well be my favourite: we could write infra-red instead of infrared6. Think about it: in the new world order I am proposing, we’d be quite used to seeing ou-er ligatures written sep-e-rately. In that world, the problem solved by the affix rule is re-introduced into the orthography: there is no way of telling whether the author meant “infrared” or “infra-red”. In my mind, introducing the hyphen solves this rather elegantly.7

As ever, I am very curious to hear people’s thoughts on these matters! I intend to carry on experimenting with this; let me know if you’d like to give my forked version of Inter Alia a go.

Extract from Shaw-script volume one

Footnotes

  1. But not alot. ↩︎
  2. I suspect that they may have been tacked on in the late stages of development to satisfy some of the odder requirements of Shaw and the executor of his will, and in my mind, five of the compound letters only serve the purpose of appeasing rhotic speakers. The sixth, ·𐑿, was of course intended to be a compromise in the opposite direction, in the hope that yod-droppers would take to using it to help non-yod-droppers. Now, that one never really took off, did it? ↩︎
  3. See: the affix rule in Readlex’s spelling principles. ↩︎
  4. Actually a good thing: the last thing you want as a developer is some obscure invisible Unicode character breaking the build. And in hindsight I should have foreseen this one! ↩︎
  5. One of which I’ll elaborate on here so as not to distract from the main narrative: one other angle of attack is to tell the monospace contexts to render the compounds in two slots instead of one. There is a precedent [and a mechanism] for this in East Asian typography, for which the East Asian Width property was added to Unicode to solve this exact same problem when rendering certain East Asian characters. I’d just have to persuade the ·UTC to change the width for U+10478-U+1047F to two. How hard could that be? ↩︎
  6. Note that one common class for affix rules is a non-issue: 𐑾 is not under consideration to be split into constituents. So “𐑣𐑨­𐑐𐑦­𐑩𐑮” is in no danger of being misconstrued as “𐑣𐑨𐑐𐑽”. ↩︎
  7. Okay, okay, we’re not home and dry yet. The elephant in the room here is that oeuvre and air are not in Unicode. So we just need to get them added. How hard could that be? ↩︎
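
A postscript to footnote 5: the East Asian Width property can be inspected from Python’s standard library. The sketch below assumes a deliberately naive terminal that reserves two cells for Wide and Fullwidth characters and one for everything else; the helper name `cells` is made up for illustration.

```python
import unicodedata


def cells(ch: str) -> int:
    """Columns a naive terminal would reserve: 2 for Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


print(cells("\u6F22"))      # a CJK ideograph, which is Wide
print(cells("\U00010478"))  # Shavian letter are, which is not
```

Under the proposal in footnote 5, the second call would have to start reporting two cells for the compound letters.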
