On hyphenation

It wasn’t until the other week that I began to notice it. My Shavian e-books just weren’t looking as good as my Roman-alphabet books, and my transliterated newspaper articles didn’t look as smooth. At first I thought it had to be poor kerning. I tried refining the kerning further¹, and even doing little tricks like contextual substitutions, but that still didn’t fix it. Then it dawned on me what I, what we as a community, had been missing all along: hyphenation.

We are so used to it in printed matter: automatic insertion of hyphens so that the right-hand margin of a paragraph looks more regular. Like many other typographic conventions, you don’t really notice it unless it’s done wrong.

It will not surprise you at all to learn that I wrote a tool for hyphenating Shavian. This was a particularly quick and easy job with Claude Code, which we (well, mainly Claude) did in parallel with my tinkering on Bernie Sans. The resulting tool is called shyphenate, and it is available on GitHub.

It is a delightfully simple JavaScript program, which can be run either as a stand-alone command-line tool or as a hook in any webpage². It inserts soft hyphens at all syllable boundaries. The soft hyphen is the Unicode character U+00AD, which is normally invisible and takes up no width, except when the browser or word processor decides to break the line at that point, in which case it is shown as a hyphen. HTML also has a named entity for the soft hyphen, &shy;, which is in part where the name of the tool comes from.
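To make the soft hyphen a little more concrete, here is a minimal JavaScript sketch. It is not taken from shyphenate; the helper names `joinWithSoftHyphens` and `stripSoftHyphens` are made up for illustration, and the syllables are supplied by hand rather than computed.

```javascript
// U+00AD SOFT HYPHEN: the same character as the HTML entity &shy;
const SOFT_HYPHEN = "\u00AD";

// Hypothetical helper: join already-split syllables with soft hyphens.
// The result looks unchanged on screen until a renderer breaks a line there.
function joinWithSoftHyphens(syllables) {
  return syllables.join(SOFT_HYPHEN);
}

// Hypothetical helper: remove soft hyphens again, for example before
// comparing or searching text.
function stripSoftHyphens(text) {
  return text.replace(/\u00AD/g, "");
}

const marked = joinWithSoftHyphens(["hy", "phen", "ate"]);
console.log(marked.length);            // 11: nine letters plus two soft hyphens
console.log(stripSoftHyphens(marked)); // hyphenate
```

The soft hyphens still count as characters in the string, which is worth remembering when measuring length or matching text.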

So why don’t we see soft hyphens all over the place when editing plain text files and the like? Because browsers and word processors have built-in, dictionary-based hyphenation algorithms, so authors don’t have to worry about it. And of course that is why we don’t have Shavian hyphenation in Word or Google Docs: they do not ship with a Shavian hyphenation dictionary, for some reason…

The shyphenate algorithm is really straightforward. The tool syllabifies using the maximal onset principle: consonants between two vowels tend to belong to the start of the next syllable rather than the end of the previous one. For example, the word “agree” is syllabified as a-gree (not ag-ree), and “upon” breaks into u-pon (not up-on).
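The principle can be sketched in a few lines of JavaScript. This is an illustration under stated assumptions, not the actual shyphenate source: the function names are hypothetical, the list of permitted two-consonant onsets is a small, deliberately incomplete sample, and three-consonant onsets are not handled at all. Vowels are recognised by their position in the Unicode Shavian block (U+10466 to U+1046D and U+10470 to U+1047F).

```javascript
// Shavian vowel letters sit at U+10466..U+1046D and U+10470..U+1047F;
// the rest of the block (U+10450..U+1047F) is consonants.
function isVowel(ch) {
  const cp = ch.codePointAt(0);
  return (cp >= 0x10466 && cp <= 0x1046d) || (cp >= 0x10470 && cp <= 0x1047f);
}

// Consonant letters used below: p t k f th s b d g l r
const [P, T, K, F, TH, S, B, D, G, L, R] = [
  0x10450, 0x10451, 0x10452, 0x10453, 0x10454, 0x10455,
  0x1045a, 0x1045b, 0x1045c, 0x10464, 0x1046e,
].map((cp) => String.fromCodePoint(cp));
const HUNG = String.fromCodePoint(0x10459); // "ng" never starts a syllable

// ASSUMPTION: a small illustrative sample of legal two-consonant onsets.
const LEGAL_CLUSTERS = new Set([
  P + R, T + R, K + R, B + R, D + R, G + R, F + R, TH + R,
  P + L, K + L, B + L, G + L, F + L, S + L, S + P, S + T, S + K,
]);

function isLegalOnset(consonants) {
  if (consonants.length === 0) return true;
  if (consonants.length === 1) return consonants[0] !== HUNG;
  if (consonants.length === 2) return LEGAL_CLUSTERS.has(consonants.join(""));
  return false; // longer onsets are out of scope for this sketch
}

// Split a Shavian word into syllables using maximal onset.
function syllabify(word) {
  const letters = Array.from(word); // iterate by code point, not UTF-16 unit
  const vowels = [];
  letters.forEach((ch, i) => { if (isVowel(ch)) vowels.push(i); });

  const syllables = [];
  let prev = 0;
  for (let v = 0; v + 1 < vowels.length; v++) {
    const end = vowels[v + 1];  // index of the next vowel
    let cut = vowels[v] + 1;    // try to give every consonant to the next syllable
    // Move the break rightwards until what remains is a legal onset.
    while (cut < end && !isLegalOnset(letters.slice(cut, end))) cut++;
    syllables.push(letters.slice(prev, cut).join(""));
    prev = cut;
  }
  syllables.push(letters.slice(prev).join(""));
  return syllables;
}
```

Joining the returned syllables with U+00AD rather than a visible hyphen gives text that a browser can break at any syllable boundary. With the sample onset list, the Shavian spelling of “agree” keeps its two consonants together in the second syllable, while a cluster that cannot start a syllable, such as the “nd” in “under”, is split between the two.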

Applying this principle across the board leads to some syllable breaks that look unfamiliar to native English speakers. The lack of consistent spelling in traditional English orthography is largely to blame here: breaking before the consonant often suggests a different pronunciation. Take the word ‘therapist’, for instance. Following maximal onset, I can hyphenate it in Shavian as ‘𐑔𐑧-𐑮𐑩-𐑐𐑦𐑕𐑑’ with no problem. In the Roman alphabet, we need to be more conservative, because the same break (‘the-ra-pist’) invites a misreading.

I tried hyphenating at more traditional places, but it seemed wrong to my “inner ear”.³ Take a word like ‘putting’. In Roman, that indubitably becomes ‘put-ting’. But in Shavian, if I read ‘putt-ing’, I’m going to want to put a glottal stop between the two syllables. No, it turns out that ‘pu-tting’ just works better in Shavian. And it’s easier to implement too!

-Joro

P.S. I am curious to learn whether anyone else has ever thought about hyphenating Shavian. How did you end up doing it? Let me know in the comments below!

Footnotes

  1. Kerning Shavian turns out to be really hard! The letter shapes are much more irregular than the Roman letters. I still have plenty of work left to do in Bernie Sans, as you can see. ↩︎
  2. I immediately got Claude to create a WordPress plug-in too, and you can see the fruits of that labour right in front of you! ↩︎
  3. When you start learning Shavian, you sound words out loud as you figure out which phoneme belongs where. As you progress, that voice shifts inwards, getting called upon less and less often, usually only when a totally unfamiliar word shape is encountered. Reading what would amount to unnatural phonotactics really trips you up! ↩︎
