Using Waveform Plots to Improve your Accent, and a Dive into English Phonology

I was born in China and immigrated to Canada when I was 4 years old. After living in Canada for 18 years, I consider myself a native speaker for most purposes, but I still retain a noticeable non-native accent when speaking.

This post has a video that contains me speaking, if you want to hear what my accent sounds like.

It’s often considered very difficult or impossible to change your accent once you reach adulthood. I don’t know if this is true or not, but it sounds like a self-fulfilling prophecy — the more you think it’s impossible, the less you try, so of course your accent will not get any better. Impossible or not, it’s worth it to give it a try.

The first step is identifying what errors you’re making. This can be quite difficult if you’re not a trained linguist — native English speakers will detect that you have an accent, but they can’t really pinpoint exactly what’s wrong with your speech — it just sounds wrong to them.

One accent reduction strategy is the following: listen to a native speaker saying a sentence (for example, in a movie or on the radio), and repeat the same sentence, mimicking the intonation as closely as possible. Record both sentences, and play them side by side. This way, with all the other confounding factors gone, it’s much easier to identify the differences between your pronunciation and the native one.

When I tried doing this using Audacity, I noticed something interesting. Oftentimes, it was easier to spot differences in the waveform plot (that Audacity shows automatically) than to hear the differences between the audio samples. When you’re used to speaking a certain way all your life, your ears “tune out” the differences.

Here’s an example. The phrase is “figure out how to sell it for less” (Soundcloud):


The difference is clear in the waveform plot. In my audio sample, there are two spikes corresponding to the “t” sound that don’t appear in the native speaker’s sample.
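The kind of spike visible in the waveform can also be found programmatically. Below is a minimal sketch, not what Audacity does internally: it assumes the recording has already been decoded into samples in the range -1 to 1 (reading the audio file is left out), and the class name, frame size, and thresholds are all made up for illustration. It computes the energy of each short frame and flags frames where the energy jumps suddenly, which is roughly what a released “t” looks like.

```java
import java.util.ArrayList;
import java.util.List;

class BurstFinder {
    /** Root-mean-square energy of each non-overlapping frame of the signal. */
    static double[] frameEnergy(double[] samples, int frameSize) {
        int frames = samples.length / frameSize;
        double[] energy = new double[frames];
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int i = 0; i < frameSize; i++) {
                double s = samples[f * frameSize + i];
                sum += s * s;
            }
            energy[f] = Math.sqrt(sum / frameSize);
        }
        return energy;
    }

    /** Frames that are louder than `floor` and more than `ratio` times louder than the frame before. */
    static List<Integer> findBursts(double[] energy, double ratio, double floor) {
        List<Integer> bursts = new ArrayList<>();
        for (int f = 1; f < energy.length; f++) {
            if (energy[f] > floor && energy[f] > ratio * energy[f - 1]) {
                bursts.add(f);
            }
        }
        return bursts;
    }

    public static void main(String[] args) {
        // Synthetic signal: silence, with one loud frame (index 3) standing in for a burst
        double[] samples = new double[600];
        for (int i = 300; i < 400; i++) {
            samples[i] = (i % 2 == 0) ? 0.8 : -0.8;
        }
        double[] energy = frameEnergy(samples, 100);
        System.out.println(findBursts(energy, 4.0, 0.05)); // prints [3]
    }
}
```

Running the same function over your own recording and the native one, then comparing the two lists of burst positions, gives a rough numerical version of the side-by-side waveform comparison.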

For vowels, the spectrogram works better than the waveform plot. Here are the words “said” and “sad”, which differ in only the vowel:


Again, if you find it difficult to hear the difference, it helps to have a visual representation to look at.
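For the curious, a spectrogram is essentially a stack of frequency spectra, one for each short slice of the recording. Here is a rough sketch of computing a single slice, using a naive discrete Fourier transform with no windowing function (real tools use a fast Fourier transform and a window). The class and method names are my own, and the input is assumed to be one frame of already-decoded samples.

```java
class SpectrumPeek {
    /** Magnitude of DFT bins 0..n/2 for one frame (naive O(n^2) transform). */
    static double[] magnitudeSpectrum(double[] frame) {
        int n = frame.length;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(angle);
                im -= frame[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /** Frequency in Hz of the strongest bin, skipping bin 0 (the constant offset). */
    static double peakFrequency(double[] frame, double sampleRate) {
        double[] mag = magnitudeSpectrum(frame);
        int best = 1;
        for (int k = 2; k < mag.length; k++) {
            if (mag[k] > mag[best]) {
                best = k;
            }
        }
        return best * sampleRate / frame.length;
    }

    public static void main(String[] args) {
        // A pure 1000 Hz tone sampled at 8000 Hz, 256 samples long
        double[] frame = new double[256];
        for (int t = 0; t < frame.length; t++) {
            frame[t] = Math.sin(2 * Math.PI * 1000 * t / 8000.0);
        }
        System.out.println(peakFrequency(frame, 8000.0)); // prints 1000.0
    }
}
```

A vowel is not a pure tone, so its spectrum has several peaks rather than one, and it is the positions of those peaks that differ between “said” and “sad”.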

I was surprised to find out that I’d been pronouncing the “t” consonant incorrectly all my life. In English, the letter “t” represents an aspirated alveolar stop (IPA /tʰ/), which is what I’m doing, right? Well, no. The letter “t” does produce the sound /tʰ/ at the beginning of a word, but in American English, the “t” at the final position of a word can get de-aspirated so that there’s no audible release. It can also turn into a glottal stop (IPA /ʔ/) in some dialects, but native speakers rarely pronounce /tʰ/, except in careful speech.

This is one of many phonological rules of this kind. Here’s a simple experiment: put your hand in front of your mouth and say the word “pin”. You should feel a puff of air in your palm. Now say the word “spin”, and there is almost no puff of air. This is because in English, the /p/ sound loses its aspiration following the /s/ sound: it is pronounced [p] rather than [pʰ].
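To make the “pin” versus “spin” contrast concrete, here is a toy sketch of the rule as code. It is a deliberate oversimplification: it only knows that a voiceless stop (p, t, k) gets a puff of air at the very start of a word and none after “s”, and it ignores stress and every other context. The phone spellings and the class name are made up for illustration.

```java
class AspirationRule {
    static final String ASPIRATION = "\u02B0"; // superscript h, the IPA aspiration mark

    /**
     * Surface form of phones[i] within one word, under a simplified rule:
     * a voiceless stop (p, t, k) is aspirated word-initially.
     * In every other position, including after "s", it is returned unchanged.
     */
    static String surface(String[] phones, int i) {
        String phone = phones[i];
        boolean voicelessStop = phone.equals("p") || phone.equals("t") || phone.equals("k");
        if (voicelessStop && i == 0) {
            return phone + ASPIRATION;
        }
        return phone;
    }

    public static void main(String[] args) {
        String[] pin = {"p", "i", "n"};
        String[] spin = {"s", "p", "i", "n"};
        System.out.println(surface(pin, 0));  // aspirated p
        System.out.println(surface(spin, 1)); // plain p
    }
}
```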

Now this got me curious and I wondered: exactly what are the rules governing sound changes in English consonants? Can I learn them so I don’t make this mistake again? Native English speakers don’t know these rules (consciously at least), and even ESL materials don’t go into much detail about subtle aspects of pronunciation. The best resources for this would be linguistics textbooks on English phonology.

I consulted a textbook called “Gimson’s Pronunciation of English” [1]. For just the rules regarding sound changes of the /t/ sound at the word-final position, the book lists 6 rules. Here’s a summary of the first 3:

  • No audible release in syllable-final positions, especially before a pause. Examples: mat, map, robe, road. To distinguish /t/ from /d/, the preceding vowel is lengthened for /d/ and shortened for /t/.
  • In stop clusters like “white post” (t + p) or “good boy” (d + b), there is no audible release for the first consonant.
  • When a plosive consonant is followed by a nasal consonant that is homorganic (articulated in the same place), then the air is released out of the nose instead of the mouth (eg: topmost, submerge). However, this doesn’t happen if the nasal consonant is articulated in a different place (eg: big man, cheap nuts).

As you can see, the rules are quite complicated. The book is somewhat challenging for non-linguists — these are just the rules for /t/ at the word-final position; the book goes on to spend hundreds of pages to cover all kinds of vowel changes that occur in stressed and unstressed syllables, when combined with other words, and so on. For a summary, take a look at the Wikipedia article on English Phonology.

What’s really amazing is how native speakers learn all these patterns, perfectly, as babies. Native speakers may make orthographic mistakes like mixing up “their, they’re, there”, but they never make phonological mistakes like forgetting to de-aspirate the /p/ in “spin” — they simply get it right every time, without even realizing it!

Some of my friends immigrated to Canada at a similar or later age than me, and learned English with no noticeable accent. Therefore, people sometimes find it strange that I still have an accent. Even more interesting is the fact that although my pronunciation is non-native, I don’t make non-native grammatical mistakes. In other words, I can intuitively judge which sentences are grammatical or ungrammatical just as well as a native speaker. Does that make me a linguistic anomaly? Intrigued, I dug deeper into academic research.

In 1999, Flege et al. conducted a study of Korean-American immigrants who moved to the USA at an early age [2]. Each participant was given two tasks. In the first task, the participant was asked to speak a series of English sentences, and native speakers judged how much of a foreign accent was present on a scale from 1 to 9. In the second task, the participant was given a list of English sentences, some grammatical and some not, and picked which ones were grammatical.

Linguists hypothesize that during first language acquisition, babies learn the phonology of their language long before they start to speak; grammatical structure is acquired much later. The Korean-American study seems to support this hypothesis. For the phonological task, immigrants who arrived as young as age 3 sometimes retained a non-native accent into adulthood.

Above: Scores for phonological task decrease as age of arrival increases, but even very early arrivals retain a non-native accent.

Basically, arriving before age 6 or so increases the chance of the child developing a native-like accent, but by no means does it guarantee it.

On the other hand, the window for learning grammar is much longer:

Above: Scores for grammatical task only start to decrease after about age 7.

Age of arrival is a large factor, but does not explain everything. Some people are just naturally better at acquiring languages than others. The study also looked at the effect of other factors like musical ability and perceived importance of English on the phonological score, but the connection is a lot weaker.

Language is so easy that every baby picks it up, yet so complex that linguists write hundreds of pages to describe it. Even today, language acquisition is poorly understood, and there are many unresolved questions about how it works.


  1. Cruttenden, Alan. “Gimson’s Pronunciation of English, 8th Edition”. Routledge, 2014.
  2. Flege, James Emil et al. “Age Constraints on Second Language Acquisition”. Journal of Memory and Language, vol. 41, 1999.

An introduction to Gregg Shorthand and an attempted English to shorthand converter


The idea of strange alternative shorthand writing systems has, for a while, held in me a certain special appeal: the idea of drawing a few short alien symbols to represent entire phrases and sentences.

The Gregg shorthand system, invented over a hundred years ago (1888 to be exact), is one of several such systems. Curiously, its original purpose was not to amaze one’s friends. It was intended to enable news reporters and secretaries to transcribe English speech at a speed comparable to that at which English is spoken.

English, or rather the conventional English writing system, is inherently inefficient for such purposes: it is just not physically possible to write English much faster than about 40 words per minute without it looking like a collection of meaningless lines.

Shorthand systems address this issue by replacing troublesome letters such as ‘m’ (which always ends up as a scribble when I write it) with simple, clear letters, in this case a straight horizontal line. Plenty of shortening conventions are used, making it possible to write at speeds of 120-160 words per minute. By comparison, I can only type at about 80 words per minute.

As audio recording devices and video camcorders achieved widespread usage, shorthand systems quickly became obsolete and fell into relative obscurity. Just imagine: who would need shorthand when they could just film the speaker and play it back, transcribing at leisure?

Personally, the reason I learned Gregg shorthand a few months ago is less about transcribing other people’s speeches in real time (which I definitely cannot do) and more about the ability to write personal notes and diaries, and be relatively confident that nobody (or at least nobody I know) will be able to read them.

Shorthand actually used to be taught in some places. This was decades ago though. On the other hand, if everybody knew Gregg shorthand, it wouldn’t be suitable to use it for writing personal notes anymore.

Just as an example, here’s a notebook of Gregg shorthand (I don’t even know what it’s for):

Looks alien to you? Good.

Actually, shorthand is really simple. The Gregg alphabet is just this:

What’s really smart about this is that similar sounding letters are grouped together, and look similar.

But this is hardly complicated, just different.

The second, less obvious difference is that Gregg shorthand is phonetic, instead of spelling-based.

Let’s try an example:

London bridge is falling down

As shorthand is written the way it’s heard, it would transcribe to something like this:

lndn brej s flng dn

All that is left is the substitution of Gregg syllables for the Latin characters:

With a little (okay, a lot) of practice, the above symbols may be written in two or three seconds.

This is pretty much it. Quite a lot easier than learning French or Spanish or Chinese.

There’s a bit more to it. Much of Gregg is the wide variety of brief forms, which are abbreviations of commonly used words to save time. Some of them are pretty obvious:

your = ur

Most are a little less obvious:

correspondence = kres

A few are just downright baffling:

world = uu

Yeah. That’s not even the worst. I’m sure they had a reason to do so, but someone a hundred years ago came up with more and more contrived exceptions to save a few strokes on more and more obscure phrases.

For instance, who really needs a symbol for “I am of the opinion”, or another for “I should like to have”? I wouldn’t be too surprised if they had a brief form for “I slept with your mother”. Unfortunately there is none.

(/rant). I actually like the language. Just not most of the brief forms.

In case you’re wondering, here are the symbols for “I am of the opinion” (i-m-o-p-n) and “I should like to have” (i-sh-d-l-a-v):

An attempt at a text to shorthand generator

For some unknown reason, I decided I had the need for an automatic translator from English plaintext to Gregg shorthand.

Since Gregg is such an old writing system, I wasn’t surprised to find that no such software exists (at least none that I know of). Even Unicode, whose extensive glyph tables extend from Latin to Chinese and Hebrew and even to ancient Egyptian hieroglyphs, does not offer support for the curves of Gregg shorthand.

Fortunately, a translator is still possible without Unicode support, though some imagination is required. Output is purely graphical, as shorthand cannot otherwise be represented textually.

In concept, an English to shorthand generator is not a very complicated piece of software. There are essentially two parts to it:

One, the English text has to be broken down into its pronounceable syllables. This problem has been faced many times before, even by the most basic text-to-speech (TTS) programs, so plenty of libraries already exist for the task. I chose the FreeTTS library for Java.

For example, here is a sample code snippet for FreeTTS:

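import com.sun.speech.freetts.en.us.CMULexicon;
import com.sun.speech.freetts.lexicon.Lexicon;
// Look up the pronunciation of one word, with part of speech "n"
Lexicon lexicon = CMULexicon.getInstance(true);
String[] phones = lexicon.getPhones("luckytoilet", "n");
for (String phone : phones) System.out.print(phone + " ");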

This generates the pronunciation for luckytoilet:

l ah1 k iy t oy1 l ax t

Next, we can map the FreeTTS syllables to the Gregg syllables. This is a many-to-one mapping: for instance, Gregg does not usually distinguish between long (cake) and short (cat) vowels, so both map to “a”. Additionally, FreeTTS syllables carry stress information (the digits, as in ah1), which is irrelevant for our purposes.
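As a rough illustration of this mapping step (not the actual code from my project), the sketch below drops the digits from each FreeTTS symbol and looks the result up in a small table. The table is hypothetical and covers only a handful of sounds, and the symbols “ey” and “ae” for the vowels of cake and cat are assumed to be the ones FreeTTS produces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PhoneToGregg {
    // Hypothetical, partial table: several FreeTTS symbols may share one Gregg letter
    static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("ey", "a"); // long a, as in "cake" (assumed symbol)
        TABLE.put("ae", "a"); // short a, as in "cat" (assumed symbol)
        TABLE.put("k", "k");
        TABLE.put("t", "t");
        TABLE.put("l", "l");
    }

    /** Removes the digits attached to a symbol, e.g. "ah1" becomes "ah". */
    static String stripDigits(String phone) {
        return phone.replaceAll("[0-9]", "");
    }

    /** Maps each symbol to its Gregg letter; symbols missing from the table pass through unchanged. */
    static List<String> toGregg(String[] phones) {
        List<String> letters = new ArrayList<>();
        for (String phone : phones) {
            String key = stripDigits(phone);
            letters.add(TABLE.getOrDefault(key, key));
        }
        return letters;
    }

    public static void main(String[] args) {
        System.out.println(toGregg(new String[] {"k", "ey1", "k"})); // prints [k, a, k]
        System.out.println(toGregg(new String[] {"k", "ae1", "t"})); // prints [k, a, t]
    }
}
```

Letting unknown symbols pass through unchanged makes gaps in the table easy to spot in the output.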

The second step is to draw the glyphs from the plaintext syllables. I think I’ve done a rather poor job on this step.

Each letter is contained in a 100px by 100px square PNG file. Additionally, the program has information on where the ‘stroke’ for each letter begins and ends, so that it can position the letters properly.

For example, the k letter:

If we wanted to draw an n after the k, we are able to do that: place the n such that the starting position of the n coincides with the ending position of the k. It’s with this idea that we are able to chain together elaborate combinations of characters.

This way letters can be drawn at the position where the previous letter ends, giving a connected, cursive look.
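The positioning idea can be sketched as follows. This is a simplified reconstruction rather than the project’s actual code: the class name and the stroke coordinates for k and n are invented, and only the arithmetic is the point. Each glyph stores where its stroke starts and ends inside its 100px square, and a running “pen” position decides where the next square goes.

```java
import java.util.ArrayList;
import java.util.List;

class GlyphChain {
    /** Stroke endpoints inside a glyph's 100x100 image, in pixels from its top-left corner. */
    static class Glyph {
        final String name;
        final int startX, startY, endX, endY;

        Glyph(String name, int startX, int startY, int endX, int endY) {
            this.name = name;
            this.startX = startX;
            this.startY = startY;
            this.endX = endX;
            this.endY = endY;
        }
    }

    /** Top-left corner at which to draw each glyph so its stroke starts where the previous one ended. */
    static List<int[]> layout(List<Glyph> glyphs, int penX, int penY) {
        List<int[]> origins = new ArrayList<>();
        for (Glyph g : glyphs) {
            // Shift the image so that the stroke start lands on the pen
            int originX = penX - g.startX;
            int originY = penY - g.startY;
            origins.add(new int[] {originX, originY});
            // Move the pen to where this stroke ends
            penX = originX + g.endX;
            penY = originY + g.endY;
        }
        return origins;
    }

    public static void main(String[] args) {
        // Made-up stroke endpoints, for illustration only
        Glyph k = new Glyph("k", 10, 50, 90, 40);
        Glyph n = new Glyph("n", 20, 50, 80, 50);
        for (int[] origin : layout(List.of(k, n), 10, 50)) {
            System.out.println(origin[0] + "," + origin[1]);
        }
        // prints 0,0 then 70,-10
    }
}
```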

These two steps are pretty much the entire program. Additionally, there are certain brief forms in Gregg that are treated specially. Here the brief form list is stored in alphabet/2.dat; it is not really the brief forms of any one dialect of Gregg, but rather a combination of them. Also, vowels are generally omitted, so only longer vowels are displayed.
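Handling brief forms amounts to a dictionary lookup before falling back to the phonetic spelling. The sketch below is an assumption about how that could look, not a description of the file format of alphabet/2.dat: it keeps a tiny in-memory table with the three brief forms mentioned earlier, and takes the phonetic spelling as a parameter instead of computing it.

```java
import java.util.HashMap;
import java.util.Map;

class BriefForms {
    // Tiny in-memory table; a real list would be loaded from a data file
    static final Map<String, String> BRIEF = new HashMap<>();
    static {
        BRIEF.put("your", "ur");
        BRIEF.put("correspondence", "kres");
        BRIEF.put("world", "uu");
    }

    /** Returns the brief form if the word has one, otherwise the given phonetic spelling. */
    static String encode(String word, String phonetic) {
        return BRIEF.getOrDefault(word.toLowerCase(), phonetic);
    }

    public static void main(String[] args) {
        System.out.println(encode("World", "wrld"));  // prints uu
        System.out.println(encode("bridge", "brej")); // prints brej
    }
}
```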

Here is what I came up with (showing an excerpt of Shakespeare’s Hamlet):

When the user types in the text box at the bottom, the shorthand equivalent is computed and drawn in the top region. It doesn’t handle punctuation or any other non-alphanumeric symbols, which are simply stripped out.

The project is available on SVN and can be checked out with this command:
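Stripping the input can be done with a single regular expression. This is a sketch under my own assumptions (the class name is invented, and lowercasing the words is an extra step added here for convenience), showing how the text box contents might be turned into a list of words before lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InputCleaner {
    /** Drops everything except letters, digits and whitespace, then splits into lowercase words. */
    static List<String> words(String text) {
        String cleaned = text.replaceAll("[^A-Za-z0-9\\s]", "").trim().toLowerCase();
        if (cleaned.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(cleaned.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(words("To be, or not to be!")); // prints [to, be, or, not, to, be]
    }
}
```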

svn checkout gregg


Admittedly, my program is more of a proof of concept, and is far from perfect. In fact, it’s quite crude.

Most words are botched and simply look wrong. In actual Gregg, letters are placed differently based on context: for example, the th may be drawn under or over depending on what characters precede it. Vowels are connected in ways that are really tricky to handle in a program. My program simply draws the letters exactly the same no matter where they appear.

For instance, here’s the word cake as rendered by my program:

Indeed, the word cake is transcribed as k-a-k, which is exactly what’s generated by the program. Compare this with the correct version (as printed in the Gregg dictionary) which (correctly) puts the a (circle) under the ks:

There are cases where the a is drawn over, under, to the left, to the right, curved downwards, curved upwards, ad infinitum. In order to generate more correct Gregg, we would have to implement a very elaborate set of context rules covering the many conventions of standard Gregg shorthand.