An introduction to Gregg Shorthand and an attempted English to shorthand converter

 

The idea of strange alternative shorthand writing systems has, for a while, held in me a certain special appeal: the idea of drawing a few short alien symbols to represent entire phrases and sentences.

The Gregg shorthand system, invented over a hundred years ago (in 1888, to be exact), is one of several such systems. Curiously, its original purpose was not to amaze one’s friends: it was intended to let news reporters and secretaries transcribe English speech at a speed comparable to that at which English is spoken.

English, or rather the conventional English writing system, is inherently inefficient for such purposes: it is just not physically possible to write English much faster than about 40 words per minute without it turning into a collection of meaningless lines.

Shorthand systems address this issue by replacing troublesome letters such as ‘m’ (which always ends up as a scribble when I write it) with simple, clear letters, in this case a straight horizontal line. Plenty of shortening conventions are used, making it possible to write at speeds of 120-160 words per minute. By comparison, I can only type at about 80 words per minute.

As audio recorders and camcorders achieved widespread usage, shorthand systems quickly became obsolete and fell into relative obscurity. Just imagine: who would need shorthand when they could just film the speaker and play it back, transcribing at leisure?

Personally, the reason I learned Gregg shorthand a few months ago has less to do with transcribing other people’s speeches in real time (which I definitely cannot do) and more to do with being able to write personal notes and diaries while being relatively confident that nobody (or at least nobody I know) will be able to read them.

Shorthand actually used to be taught in some places, though that was decades ago. On the other hand, if everybody knew Gregg shorthand, it wouldn’t be suitable for writing personal notes anymore.

Just as an example, here’s a notebook of Gregg shorthand (I don’t even know what it’s for):

Looks alien to you? Good.

Actually, shorthand is really simple. The Gregg alphabet is just this:

What’s really smart about this is that similar sounding letters are grouped together, and look similar.

But this is hardly complicated, just different.

The second, less obvious difference is that Gregg shorthand is phonetic rather than orthographic: words are written by their sounds, not by their spelling.

Let’s try an example:

London bridge is falling down

As shorthand is written the way it’s heard, it would transcribe to something like this:

lndn brej s flng dn

All that is left is the substitution of Gregg symbols for the Latin characters:

With a little (okay, a lot of) practice, the above symbols may be written in two or three seconds.

This is pretty much it. Quite a lot easier than learning French or Spanish or Chinese.

There’s a bit more to it. A large part of Gregg is its wide variety of brief forms, which are abbreviations of commonly used words to save time. Some of them are pretty obvious:

your = ur

Most are a little less obvious:

correspondence = kres

A few are just downright baffling:

world = uu

Yeah. That’s not even the worst. I’m sure they had their reasons, but someone a hundred years ago kept coming up with more and more contrived exceptions to save a few strokes on more and more obscure phrases.

For instance, who really needs a symbol for “I am of the opinion”, or another for “I should like to have”? I wouldn’t be too surprised if they had a brief form for “I slept with your mother”. Unfortunately there is none.

(/rant). I actually like the language. Just not most of the brief forms.

In case you’re wondering, here are the symbols for “I am of the opinion” (i-m-o-p-n) and “I should like to have” (i-sh-d-l-a-v):

An attempt at a text to shorthand generator

For some unknown reason, I decided I needed an automatic translator from English plain text to Gregg shorthand.

Since Gregg is such an old writing system, I wasn’t surprised to find that no such software exists (at least none that I know of). Even Unicode, whose extensive glyph tables extend from Latin to Chinese and Hebrew and even to ancient Egyptian hieroglyphs, does not offer support for the curves of Gregg shorthand.

Fortunately, a translator is still possible without Unicode support, although some imagination is required. Output is purely graphical, as shorthand cannot otherwise be represented textually.

In concept, an English to shorthand generator is not a very complicated piece of software. There are essentially two parts to it:

One, the English text has to be broken down into the sounds it is pronounced with. This problem has been faced many times before, since even the most basic text to speech (TTS) programs have to solve it. Thus, plenty of libraries exist for this task already. For this, I chose the FreeTTS library for Java.

For example, here is a sample code snippet for FreeTTS:

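import com.sun.speech.freetts.en.us.CMULexicon;
import com.sun.speech.freetts.lexicon.Lexicon;
// Load the CMU lexicon (binary form), then look up the word as a noun ("n").
Lexicon lexicon = CMULexicon.getInstance(true);
String[] phones = lexicon.getPhones("luckytoilet", "n");
for (String phone : phones) System.out.print(phone + " ");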

This generates the pronunciation for luckytoilet:

l ah1 k iy t oy1 l ax t

Next, we can map the FreeTTS phones to the Gregg letters. This is a many-to-one mapping: for instance, Gregg does not usually distinguish between long (cake) and short (cat) vowels, so both map to “a”. Additionally, FreeTTS phones carry stress markers (the digits in ah1 and oy1 above), which are irrelevant for our purposes.

The second step is to draw the glyphs from the plain text letters. This is the step I think I’ve done a rather poor job on.
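As a rough illustration of this many-to-one mapping, here is a minimal sketch. It is not the code from my project: the class name PhoneToGregg, the tiny lookup table, and the phone names ey and ae (assumed to be what the lexicon returns for the vowels in cake and cat) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhoneToGregg {
    // Hypothetical, deliberately incomplete table: several phones share one Gregg letter.
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("ey", "a"); // assumed phone for the long a in "cake"
        TABLE.put("ae", "a"); // assumed phone for the short a in "cat"
        TABLE.put("k", "k");
        TABLE.put("t", "t");
        TABLE.put("l", "l");
    }

    // Drop a trailing digit such as the 1 in "ah1"; it is not needed for shorthand.
    static String stripDigits(String phone) {
        return phone.replaceAll("[0-9]+$", "");
    }

    // Map each phone to its Gregg letter; phones missing from the table are skipped.
    static List<String> toGregg(String[] phones) {
        List<String> letters = new ArrayList<>();
        for (String phone : phones) {
            String letter = TABLE.get(stripDigits(phone));
            if (letter != null) {
                letters.add(letter);
            }
        }
        return letters;
    }

    public static void main(String[] args) {
        System.out.println(toGregg(new String[] {"k", "ey1", "k"})); // [k, a, k]
        System.out.println(toGregg(new String[] {"k", "ae1", "t"})); // [k, a, t]
    }
}
```

A real table would need an entry for every phone the lexicon can produce, and silently skipping unknown phones is only acceptable in a sketch.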

Each letter is contained in a 100px by 100px square PNG file. Additionally, the program has information on where the ‘stroke’ for each letter begins and ends, so that it can position the letters properly.

For example, the k letter:

If we wanted to draw an n after the k, we can do that: place the n such that the starting position of the n coincides with the ending position of the k. It’s with this idea that we are able to chain together elaborate combinations of characters.

This way letters can be drawn at the position where the previous letter ends, giving a connected, cursive look.
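The chaining idea can be sketched roughly as follows. This is an illustration rather than the project’s actual code: the GlyphChain and Glyph names and the stroke coordinates used for k and n are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class GlyphChain {
    // One letter tile: where its stroke starts and ends, in pixels inside the tile.
    static final class Glyph {
        final String name;
        final int startX, startY, endX, endY;

        Glyph(String name, int startX, int startY, int endX, int endY) {
            this.name = name;
            this.startX = startX;
            this.startY = startY;
            this.endX = endX;
            this.endY = endY;
        }
    }

    // Returns the top-left corner {x, y} at which each tile should be drawn,
    // so that every stroke starts where the previous stroke ended.
    static List<int[]> layout(List<Glyph> glyphs, int penX, int penY) {
        List<int[]> origins = new ArrayList<>();
        for (Glyph g : glyphs) {
            int originX = penX - g.startX; // shift tile so its start sits on the pen
            int originY = penY - g.startY;
            origins.add(new int[] {originX, originY});
            penX = originX + g.endX;       // the pen moves to the end of this stroke
            penY = originY + g.endY;
        }
        return origins;
    }

    public static void main(String[] args) {
        List<Glyph> word = new ArrayList<>();
        word.add(new Glyph("k", 10, 50, 90, 60)); // hypothetical coordinates
        word.add(new Glyph("n", 20, 50, 80, 50)); // hypothetical coordinates
        for (int[] origin : layout(word, 100, 100)) {
            System.out.println(origin[0] + "," + origin[1]);
        }
        // prints 90,50 then 160,60
    }
}
```

Each origin would then be passed to whatever call draws the PNG tile. Note that nothing here looks at neighbouring letters, which is exactly the limitation discussed in the afterthoughts.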

These two steps are pretty much the entire program. Additionally, there are certain brief forms in Gregg that are treated specially. Here the brief form list is stored in alphabet/2.dat; it is not really the brief forms of any one dialect of Gregg, but rather a combination of them. Also, vowels are generally omitted, so only longer vowels are displayed.
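To illustrate how brief forms can take priority over the phonetic route, here is a small sketch. It does not read alphabet/2.dat (its format is not described here); instead it hard-codes the three brief forms mentioned earlier, and the BriefForms class and the phonetic fallback parameter are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class BriefForms {
    // Only the brief forms quoted earlier in this post; a real list is far longer.
    private static final Map<String, String> BRIEF = new HashMap<>();
    static {
        BRIEF.put("your", "ur");
        BRIEF.put("correspondence", "kres");
        BRIEF.put("world", "uu");
    }

    // Use the brief form when one exists, otherwise fall back to the
    // phonetic transcription supplied by the caller.
    static String transcribe(String word, Function<String, String> phonetic) {
        String brief = BRIEF.get(word.toLowerCase(Locale.ROOT));
        return brief != null ? brief : phonetic.apply(word);
    }

    public static void main(String[] args) {
        // Stand-in for the FreeTTS-based step; it just returns a fixed string.
        Function<String, String> phonetic = w -> "brej";
        System.out.println(transcribe("World", phonetic));  // uu
        System.out.println(transcribe("bridge", phonetic)); // brej
    }
}
```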

Here is what I came up with (showing an excerpt of Shakespeare’s Hamlet):

When the user types in the text box at the bottom, the shorthand equivalent is computed and drawn in the top region. It doesn’t handle punctuation or any other non-alphanumeric symbols; these are simply stripped out.

The project is available on SVN, and can be checked out with this command:
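The stripping of non-alphanumeric symbols could look something like the sketch below. The InputCleaner name and the exact regular expression are assumptions for the example, not a copy of what the program does.

```java
import java.util.Locale;

public class InputCleaner {
    // Lowercase the text, drop everything except letters, digits and whitespace,
    // then split on runs of whitespace.
    static String[] words(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9\\s]", "")
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        for (String word : words("To be, or not to be: that is the question.")) {
            System.out.print(word + " ");
        }
        // prints: to be or not to be that is the question
    }
}
```

One side effect of this approach is that apostrophes vanish too, so a word like it’s is passed on as its.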

svn checkout http://bai-projects.googlecode.com/svn/trunk/gregg gregg

Afterthoughts

Admittedly, my program is more of a proof of concept, and is far from perfect. In fact, it’s quite crude.

Most words are botched and simply look wrong. In actual Gregg, letters are placed differently based on context: the th, for example, may be drawn under or over depending on what characters precede it. Vowels are connected in ways that are really tricky to handle in a program. My program simply draws the letters exactly the same no matter where they appear.

For instance, here’s the word cake as rendered by my program:

Indeed, the word cake is transcribed as k-a-k, which is exactly what’s generated by the program. Compare this with the correct version (as printed in the Gregg dictionary), which puts the a (circle) under the ks:

There are cases where the a is drawn over, under, to the left, to the right, curved downwards, curved upwards, and so on. In order to generate more correct Gregg, we would have to implement an elaborate set of special cases to cover the many placement rules of standard Gregg shorthand.