# Learning the Teochew (Chaozhou) Dialect

Lately I’ve been learning my girlfriend’s dialect of Chinese, called the Teochew dialect. Teochew is spoken by about 15 million people in the eastern part of Guangdong province, including the cities of Chaozhou, Shantou, and Jieyang. It is part of the Min Nan (闽南) branch of Chinese languages.

Above: Map of major dialect groups of Chinese, with Teochew circled. Teochew is part of the Min branch of Chinese. Source: Wikipedia.

Although the different varieties of Chinese are usually referred to as “dialects”, linguists consider them different languages as they are not mutually intelligible. Teochew is not intelligible to either Mandarin or Cantonese speakers. Teochew and Mandarin diverged about 2000 years ago, so today they are about as similar as French is to Portuguese. Interestingly, linguists claim that Teochew is one of the most conservative Chinese dialects, preserving many archaic words and features from Old Chinese.

Above: Sample of Teochew speech from entrepreneur Li Ka-shing.

Since I like learning languages, naturally I started learning my girlfriend’s native tongue soon after we started dating. It helped that I spoke Mandarin, but Teochew is not close enough to simply pick up by osmosis; it still requires deliberate study. Compared to other languages I’ve learned, Teochew is challenging because very few people try to learn it as a foreign language, so there are few language-learning resources for it.

## Writing System

The first hurdle is that Teochew is primarily spoken, not written, and does not have a standard writing system. This is the case with most Chinese dialects. Almost all Teochews are bilingual in Standard Chinese, which they are taught in school to read and write.

Sometimes people try to write Teochew using Chinese characters by finding the equivalent Standard Chinese cognates, but there are many dialectal words which don’t have any Mandarin equivalent. In these cases, you can invent new characters or substitute similar sounding characters, but there’s no standard way of doing this.

Still, I needed a way to write Teochew, to take notes on new vocabulary and grammar. At first, I used IPA, but as I became more familiar with the language, I devised my own romanization system that captured the sound differences.

## Cognates with Mandarin

Knowing Mandarin was very helpful for learning Teochew, since there are lots of cognates. Some cognates are obviously recognizable:

• Teochew: kai shim, happy. Cognate to Mandarin: kai xin, 开心.
• Teochew: ing ui, because. Cognate to Mandarin: ying wei, 因为

Some words have cognates in Mandarin, but mean something slightly different, or aren’t commonly used:

• Teochew: ou, black. Cognate to Mandarin: wu, 乌 (dark). The usual Mandarin word is hei, 黑 (black).
• Teochew: dze, book. Cognate to Mandarin: ce, 册 (booklet). The usual Mandarin word is shu, 书 (book).

Sometimes, a word has a cognate in Mandarin, but sounds quite different due to centuries of sound change:

• Teochew: hak hau, school. Cognate to Mandarin: xue xiao, 学校.
• Teochew: de, pig. Cognate to Mandarin: zhu, 猪.
• Teochew: dung, center. Cognate to Mandarin: zhong, 中.

In the last two examples, we see a fairly common sound correspondence, where a dental stop initial (d- or t-) in Teochew corresponds to an affricate (zh- or ch-) in Mandarin. It’s not usually enough to guess the word, but it serves as a useful memory aid.

Finally, a lot of dialectal Teochew words (I’d estimate about 30%) don’t have any recognizable cognate in Mandarin. Examples:

• da bo: man
• no gya: child
• ge lai: home

## Grammatical Differences

Generally, I found Teochew grammar to be fairly similar to Mandarin, with only minor differences. Most grammatical constructions can transfer cognate by cognate and still make sense in the other language.

One significant difference in Teochew is the many fused negation markers. Here, a syllable starts with the initial b- or m- joined with a final to negate something. Some examples:

• bo: not have
• boi: will not
• bue: not yet
• mm: not
• mai: not want
• ming: not have to

## Phonology and Tone Sandhi

The sound structure of Teochew is not too different from Mandarin, and I didn’t find it difficult to pronounce. The biggest difference is that syllables may end with a stop (-t, -k, or -p) or with -m, whereas Mandarin syllables can only end with a vowel or the nasals -n and -ng. A characteristic feature of a Teochew accent in Mandarin is replacing /f/ with /h/, and indeed there is no /f/ sound in Teochew.

The hardest part of learning Teochew for me was the tones. Teochew has either six or eight tones depending on how you count them, which aren’t difficult to produce in isolation. However, Teochew has a complex system of tone sandhi rules, where the tone of each syllable changes depending on the tone of the following syllable. Mandarin has tone sandhi to some extent (for example, the third tone sandhi rule where nǐ + hǎo is pronounced níhǎo rather than nǐhǎo). But Teochew takes this to a whole new level, where nearly every syllable undergoes contextual tone change.

Some examples (the numbers are Chao tone numerals, with 1 meaning the lowest pitch and 5 the highest):

• gu5: cow
• gu1 nek5: beef

Another example, where a falling tone changes to a rising tone:

• seng52: to play
• seng35 iu3 hi1: to play a game

There are tables of tone sandhi rules describing in detail how each tone gets converted to what other tone, but this process is not entirely regular and there are exceptions. As a result, I frequently get the tone wrong.

## Resources for Learning Teochew

Teochew is seldom studied as a foreign language, so there aren’t many language learning resources for it. Even dictionaries are hard to find. One helpful dictionary is Wiktionary, which has the Teochew pronunciation for most Chinese characters.

Also helpful were formal linguistic grammars:

1. Xu, Huiling. “Aspects of Chaoshan grammar: A synchronic description of the Jieyang dialect.” Monograph Series Journal of Chinese Linguistics 22 (2007).
2. Yeo, Pamela Yu Hui. “A sketch grammar of Singapore Teochew.” (2011).

The first is a massively detailed, 300-page description of Teochew grammar, while the second is a shorter grammar sketch on a similar variety spoken in Singapore. They require some linguistics background to read. Of course, the best resource is my girlfriend, a native speaker of Teochew.

## Visiting the Chaoshan Region

After practicing my Teochew for a few months with my girlfriend, we paid a visit to her hometown and relatives in the Chaoshan region. More specifically, we visited Raoping County, located on the border between Guangdong and Fujian provinces.

Left: Chaoshan railway station, China. Right: Me learning the Gongfu tea ceremony, an essential aspect of Teochew culture.

Teochew people are traditional and family oriented, very much unlike the individualistic Western values that I’m used to. In Raoping and Guangzhou, we attended large family gatherings in the afternoon, chatting and gossiping while drinking tea. Although they are still Han Chinese, the Teochew consider themselves a distinct subgroup within Chinese, with their unique culture and language. The Teochew are especially proud of their language, which they consider to be extremely hard for outsiders to learn. Essentially, speaking Teochew is what separates “ga gi nang” (roughly translated as “our people”) from the countless other Chinese.

My Teochew is not great. Sometimes I struggle to get the tones right and make myself understood. But at a large family gathering, a relative asked me why I was learning Teochew, and I was able to reply, albeit with a Mandarin accent: “I want to learn Teochew so that I can be part of your family”.

Above: Me, Elaine, and her grandfather, on a quiet early morning excursion to visit the sea. Raoping County, Guangdong Province, China.

Thanks to my girlfriend Elaine Ye for helping me write this post. Elaine is fluent in Teochew, Mandarin, Cantonese, and English.

# Clustering Autoencoders: Comparing DEC and DCN

Deep autoencoders are a good way to learn representations and structure from unlabelled data. There are many variations, but the main idea is simple: the network consists of an encoder, which converts the input into a low-dimensional latent vector, and a decoder, which reconstructs the original input. Then, the latent vector captures the most essential information in the input.

Above: Diagram of a simple autoencoder (Source)

One of the uses of autoencoders is to discover clusters of similar instances in an unlabelled dataset. In this post, we examine some ways of clustering with autoencoders. That is, we are given a dataset and K, the number of clusters, and need to find a low-dimensional representation that contains K clusters.

## Problem with Naive Method

A naive and obvious solution is to take the autoencoder, and run K-means on the latent points generated by the encoder. The problem is that the autoencoder is only trained to reconstruct the input, with no constraints on the latent representation, and this may not produce a representation suitable for K-means clustering.

Above: Failure example with naive autoencoder clustering — K-means fails to find the appropriate clusters

Above is an example from one of my projects. The left diagram shows the hidden representation, and the four classes are generally well-separated. This representation is reasonable and the reconstruction error is low. However, when we run K-means (right), it fails spectacularly because the two latent dimensions are highly correlated.

Thus, our autoencoder can’t trivially be used for clustering. Fortunately, there’s been some research in clustering autoencoders; in this post, we study two main approaches: Deep Embedded Clustering (DEC), and Deep Clustering Network (DCN).

## DEC: Deep Embedded Clustering

DEC was proposed by Xie et al. (2016), perhaps the first model to use deep autoencoders for clustering. The training consists of two stages. In the first stage, we initialize the autoencoder by training it the usual way, without clustering. In the second stage, we throw away the decoder, and refine the encoder to produce better clusters with a “cluster hardening” procedure.

Above: Diagram of DEC model (Xie et al., 2016)

Let’s examine the second stage in more detail. After training the autoencoder, we run K-means on the hidden layer to get the initial centroids $\{\mu_j\}_{j=1}^K$. The assumption is the initial cluster assignments are mostly correct, but we can still refine them to be more distinct and separated.

First, we soft-assign each latent point $z_i$ to the cluster centroids $\{\mu_j\}_{j=1}^K$ using the Student’s t-distribution as a kernel:

$q_{ij} = \frac{(1 + ||z_i - \mu_j||^2 / \alpha)^{-\frac{\alpha+1}{2}}}{\sum_{j'} (1 + ||z_i - \mu_{j'}||^2 / \alpha)^{-\frac{\alpha+1}{2}}}$

In the paper, they fix $\alpha=1$ (the degrees of freedom), so the above can be simplified to:

$q_{ij} = \frac{(1 + ||z_i - \mu_j||^2)^{-1}}{\sum_{j'} (1 + ||z_i - \mu_{j'}||^2)^{-1}}$
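To make the soft assignment concrete, here is a minimal NumPy sketch of the formula above. The function name `soft_assign` and the toy points and centroids are my own, made up for illustration; they are not taken from the paper's code.

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Student's t soft assignment q_ij of latent points z (N, D) to centroids mu (K, D)."""
    # Squared Euclidean distance between every point and every centroid, shape (N, K)
    sq_dist = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Unnormalized Student's t kernel
    kernel = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize over centroids so that each row sums to 1
    return kernel / kernel.sum(axis=1, keepdims=True)

# Toy data (hypothetical): two centroids and three latent points
mu = np.array([[0.0, 0.0], [4.0, 0.0]])
z = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 0.0]])
q = soft_assign(z, mu)
print(q)
```

A point sitting exactly on a centroid gets most (but not all) of its mass assigned there, while the third point, halfway between the two centroids, is split evenly.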

Next, we define an auxiliary distribution P by:

$p_{ij} = \frac{q_{ij}^2/f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}$

where $f_j = \sum_i q_{ij}$ is the soft cluster frequency of cluster j. Intuitively, squaring $q_{ij}$ draws the probability distribution closer to the centroids.

Above: The auxiliary distribution P is derived from Q, but more concentrated around the centroids
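The sharpening effect of P can be checked on a small example. This is a sketch with a made-up 2×2 soft assignment matrix; `target_distribution` is a name I chose for illustration.

```python
import numpy as np

def target_distribution(q):
    """Auxiliary distribution p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j')."""
    # Soft cluster frequencies f_j = sum_i q_ij, shape (K,)
    f = q.sum(axis=0)
    # Square the assignments and divide by cluster frequency
    weight = q ** 2 / f
    # Renormalize each row to sum to 1
    return weight / weight.sum(axis=1, keepdims=True)

# Hypothetical soft assignments for two points and two clusters
q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
p = target_distribution(q)
print(p)
```

In this toy case the first point's 0.7 assignment is pushed up to about 0.82 and the second point's 0.6 to about 0.73, so each row of P is more confident than the corresponding row of Q.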

Finally, we define the objective to minimize as the KL divergence between the soft assignment distribution Q and the auxiliary distribution P:

$L = KL(P||Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$

Using standard backpropagation and stochastic gradient descent, we can train the encoder to produce latent points $z_i$ to minimize the KL divergence L. We repeat this until the cluster assignments are stable.
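As a rough sketch of what is being minimized, the loss can be computed in NumPy as follows. This only evaluates L for fixed latent points; in an actual implementation the points would come from the encoder and the gradient would be handled by a deep learning framework. The helper names and the toy data are assumptions for illustration.

```python
import numpy as np

def soft_assign(z, mu):
    """Soft assignment Q with alpha = 1."""
    sq_dist = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    kernel = 1.0 / (1.0 + sq_dist)
    return kernel / kernel.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Auxiliary distribution P derived from Q."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical latent points and centroids
mu = np.array([[0.0, 0.0], [4.0, 0.0]])
z = np.array([[1.0, 0.0], [3.0, 0.5], [1.5, -0.5]])

q = soft_assign(z, mu)
p = target_distribution(q)
loss = kl_divergence(p, q)
print(loss)
```

The loss is positive whenever P is sharper than Q, and it is zero when the two distributions coincide.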

## DCN: Deep Clustering Network

DCN was proposed by Yang et al. (2017) at around the same time as DEC. Similar to DEC, it initializes the network by training the autoencoder to only reconstruct the input, and initializes K-means on the hidden representations. But unlike DEC, it then alternates between training the network and improving the clusters, using a joint loss function.

Above: Diagram of DCN model (Yang et al., 2017)

We define the optimization objective as a combination of reconstruction error (first term below) and clustering error (second term below). There’s a hyperparameter $\lambda$ to balance the two terms:
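One way to write such an objective, following my reading of Yang et al. (2017): let $f$ be the encoder, $g$ the decoder, $M$ the matrix whose columns are the K centroids, and $s_i$ a one-hot assignment vector for the input $x_i$. These symbols, and the choice of squared error as the reconstruction loss, are assumptions for illustration, and the paper's exact notation may differ.

$L = \sum_{i=1}^N \left( ||g(f(x_i)) - x_i||^2 + \frac{\lambda}{2} ||f(x_i) - M s_i||^2 \right)$

Since $s_i$ has exactly one nonzero entry, $M s_i$ simply picks out the centroid that $x_i$ is currently assigned to, so the second term is the usual K-means cost measured in the latent space.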

This function is complicated and difficult to optimize directly. Instead, we alternate between fixing the clusters while updating the network parameters, and fixing the network while updating the clusters. When we fix the clusters (centroid locations and point assignments), then the gradient of L with respect to the network parameters can be computed with backpropagation.

Next, when we fix the network parameters, we can update the cluster assignments and centroid locations. The paper uses a rolling average trick to update the centroids in an online manner, but I won’t go into the details here. The algorithm as presented in the paper looks like this:
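The cluster-update half of the alternation can be sketched in NumPy as below. This is my own simplified sketch, not the paper's pseudocode: each latent point is assigned to its nearest centroid, and that centroid is then nudged towards the point with a step size of one over the number of points it has seen. The function name, the count bookkeeping, and the toy numbers are assumptions.

```python
import numpy as np

def update_clusters(z_batch, centroids, counts):
    """Assign each latent point to its nearest centroid, then update that
    centroid with a running average. `centroids` (K, D, float) and
    `counts` (K, int) are modified in place."""
    assignments = np.empty(len(z_batch), dtype=int)
    for i, z in enumerate(z_batch):
        # Hard assignment: index of the closest centroid
        k = int(np.argmin(((centroids - z) ** 2).sum(axis=1)))
        assignments[i] = k
        # Running average: the step size 1/count shrinks as the cluster grows
        counts[k] += 1
        centroids[k] -= (centroids[k] - z) / counts[k]
    return assignments

# Hypothetical state: two centroids that have each seen one point so far
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
counts = np.array([1, 1])
z_batch = np.array([[2.0, 0.0], [10.0, 12.0]])

assignments = update_clusters(z_batch, centroids, counts)
print(assignments)
print(centroids)
```

In the full algorithm this step would alternate with gradient updates of the encoder and decoder on the joint loss, with the latent points recomputed from the current encoder.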

To recap, DEC and DCN are both models to perform unsupervised clustering using deep autoencoders. When evaluated on MNIST clustering, their accuracy scores are comparable. For both models, the scores depend a lot on initialization and hyperparameters, so it’s hard to say which is better.

One theoretical disadvantage of DEC is that in the cluster refinement phase, there is no longer any reconstruction loss to force the representation to remain reasonable. So the theoretical global optimum can be achieved trivially by mapping every input to the zero vector, but this does not happen in practice when using SGD for optimization.
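The degenerate solution is easy to verify numerically. In the sketch below (toy centroids, helper names of my own choosing), every input is mapped to the zero vector, so every row of Q is identical. In that case P works out to be equal to Q, and the KL divergence is zero no matter where the centroids are.

```python
import numpy as np

def soft_assign(z, mu):
    """Soft assignment Q with alpha = 1."""
    sq_dist = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    kernel = 1.0 / (1.0 + sq_dist)
    return kernel / kernel.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Auxiliary distribution P derived from Q."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical centroids; five inputs all collapsed to the zero vector
mu = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
z_collapsed = np.zeros((5, 2))

q = soft_assign(z_collapsed, mu)
p = target_distribution(q)
collapsed_loss = kl_divergence(p, q)
print(collapsed_loss)  # zero up to floating-point error
```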

Recently, there have been lots of innovations in deep learning for clustering, which I won’t be covering in this post; the review papers by Min et al. (2018) and Aljalbout et al. (2018) provide a good overview of the topic. Still, DEC and DCN are strong baselines for the clustering task, which newer models are compared against.

## References

1. Xie, Junyuan, Ross Girshick, and Ali Farhadi. “Unsupervised deep embedding for clustering analysis.” International Conference on Machine Learning. 2016.
2. Yang, Bo, et al. “Towards k-means-friendly spaces: Simultaneous deep learning and clustering.” Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.
3. Min, Erxue, et al. “A survey of clustering with deep learning: From the perspective of network architecture.” IEEE Access 6 (2018): 39501-39514.
4. Aljalbout, Elie, et al. “Clustering with deep learning: Taxonomy and new methods.” arXiv preprint arXiv:1801.07648 (2018).

# NAACL 2019, my first conference talk, and general impressions

Last week, I attended my first NLP conference, NAACL, which was held in Minneapolis. My paper was selected for a short talk of 12 minutes in length, plus 3 minutes for questions. I presented my research on dementia detection in Mandarin Chinese, which I did during my master’s.

My talk was recorded, but the organizers have not uploaded it yet, so I can’t share it right now. I’ll update this page with a link to the video when it’s ready.

## Visiting Minneapolis

As a grad student, going to conferences is a good way to travel for free. Some of my friends balked at the idea of going to Minneapolis rather than somewhere more “interesting”. However, I had never been there before, and in the summer, Minneapolis was quite nice.

## Sky Burial by Xinran Xue

In this novel, a Chinese woman, Shu Wen from Suzhou, travels to Tibet to search for her missing husband. This was in 1958, when the Chinese Communist Party annexed Tibet. On the way there, she picks up a Tibetan woman, Zhuoma. They get into some trouble in the mountains and meet a Tibetan family, and gradually Wen integrates into the Tibetan culture and learns the language and customs. Time passes by quickly and before you realize it, 30 years have passed while they have practically no information from the outside world. In the end, Wen does find out what happened to her husband through his diaries, but it’s a bittersweet sort of ending as her world is changed unrecognizably and her husband is dead.

The author makes it ambiguous whether this is a work of fiction or it actually happened: all the facts seem believable, other than somehow not finding out about the Great Famine and Cultural Revolution for decades. A lot of interesting Tibetan customs are explained: their nomadic lifestyle, polyamorous family structure, Buddhist religious beliefs, and their practice of sky burial which lets vultures eat their dead. The relationship between the Chinese and Tibetans has always been a contentious one, and in this book the characters form a connection of understanding between the two ethnic groups.

Tibet seems like a really interesting place that I should visit someday. However, it’s unclear how much of their traditional culture is still accessible, due to the recent Han Chinese migrations. Also, it’s currently impossible to travel freely in Tibet without a tour group if you’re not a Chinese citizen.

## Getting to YES by Fisher, Ury, and Patton

This book tells you how to negotiate more effectively. A common negotiating mistake is to use positional negotiation, where each side picks an arbitrary position (e.g., buy the car for \$5000), and you go back and forth until you’re tired and agree, or you both walk out. Positional negotiation is highly arbitrary, and often leads to no agreement, which is bad for both parties.

Some ways to negotiate in a more principled way:

• Empathize with the other party: get to know them and their values, and treat it as both parties against a common problem rather than you trying to “win” the negotiation.
• Focus on interests, rather than positions. During the negotiation, figure out what each party really wants; sometimes, it’s possible to give them something that’s valuable for them but you don’t really care about. Negotiation is a nonzero sum game, so try to find creative solutions that fulfill everybody’s interests, rather than fight over a one-dimensional figure.
• When creative solutions are not possible (both sides just want money), defer to objective measures like industry standards. This gives you both an anchor to use, rather than negotiating in a vacuum.
• Be aware of your and the other party’s BATNA: best alternative to a negotiated agreement. This determines who holds more power in a negotiation, and improving it is a good way to get more leverage.

## Trump: A Graphic Biography by Ted Rall

A biography of Trump in graphic novel format. This book was written after Trump won the Republican primaries (May 2016) but before he won the presidency (Nov 2016).

First, the book describes the political and economic circumstances that led to Trump coming into power. After the 2008 financial crisis, many low-skilled Americans felt like there was little economic opportunity for them. Many politicians had come and gone, promising change, but nothing happened. For them, Trump represented a change from the political establishment. They didn’t necessarily agree with all of his policies; they just wanted something radical.

Trump was born after WW2 to a wealthy family in New York City. He studied economics and managed a real estate empire for a few decades, which made him a billionaire. Through his deals in real estate, he proved himself a cunning and ruthless negotiator who is willing to behave unethically and use deception to get what he wanted.

This was a good read because most of my friend group just thinks Trump is “stupid”, and everyone who voted for him is stupid. I never really understood why he was so popular among the other demographic. As a biography, the graphic novel format is good because it’s much shorter; most other biographies go into far more detail about a single person’s life than I care to know.

## 12 Rules for Life by Jordan Peterson

This is Jordan Peterson’s new book, which quickly hit #1 on the bestseller lists after being released this year. He’s famous around UofT for speaking out against social justice warriors, but I later found out that he has a lot of YouTube videos on the philosophy of how to live your life. This book condenses many of these ideas into 12 “rules” to live by, in order to live a good and meaningful life.

These ideas are the most interesting and novel to me:

• Dominance hierarchy: humans (especially men) instinctively place each other on a hierarchy, where the person at the top has all the power and status, and gets all the resources. Women want to date guys near the top of the hierarchy, and men near the top get many women easily while men at the bottom can’t even find one. Therefore, it’s essential to rise to the top of the dominance hierarchy.
• Order and chaos: order is the part of the world that we understand, that behaves according to rules; chaos is the unknown, risk, failure. To live a meaningful life is to straddle the boundary between order and chaos, and have a little bit of both.
• When raising children, it’s the parents’ responsibility to educate them how to behave properly to follow social norms, because otherwise, society will treat them harshly and this will snowball into social isolation later in life. Also, they should be encouraged to do risky things (within reason) to explore / develop their masculinity.

Some of the other rules are more obvious. Examples include: be truthful to yourself, choose your friends wisely, improve yourself incrementally rather than comparing yourself to others, confront issues quickly as they arise. I guess depending on your personality and prior experience, you might find a different subset of these rules to be obvious.

Initially, I found JP to be obnoxious because of the lack of scientific rigour in his arguments; he just seems convincing because he’s well-spoken. The book does a slightly better job than the videos in substantiating the arguments and citing various psychology research papers. JP also has a tendency to cite literature; when he goes into stuff like Bible archetypes of Christ, or Cain/Abel, then I have no idea what he’s talking about anymore. The book felt a bit long. Overall, it was still a good read: I learned a lot from this book and also by diving deeper into the psychology papers he cited.

## Analects by Confucius

The Analects (论语) is a book of philosophy by Confucius and lays down the groundwork for much of Chinese thinking for the next 2500 years. It’s the second book I’ve read in ancient Chinese literature after the Art of War. It’s written in a somewhat different style — it has 20 chapters of varying lengths, but the chapters aren’t really organized by topic and the writing jumps around a lot.

Confucius tells you how to live your life not by appeal to religion, but rather by showing characteristics that he considers “good”, and gives examples of what is and what isn’t considered good. A few recurring ideas:

• junzi 君子 – exemplary person. The ideal, wise person that we should strive to be. A junzi strives to be excellent (德) and honorable (信), and not be arrogant or greedy or materialistic. He seeks knowledge, respects elders, is not afraid to speak up, and conducts himself authoritatively.

• li 礼 – ritual propriety. The idea that there are certain “rituals” that society observes, and that if a leader respects them, then things will go smoothly. Kind of like the “meta” in games; modern examples would be the employer/employee relationship, or knowing in which situations to shake hands with someone.

• xiao 孝 – filial responsibility. A son must respect his parents and take care of them in old age, and mourn for them for three years after their death (since for three years after birth, a child is helpless without his parents).

• haoxue 好学 – love of learning for the sake of learning

• ren 仁 – authoritative conduct / benevolence / humanity. Basically a leader should conduct himself in a responsible manner, be fair yet firm.

• dao 道 – the way. One should forge one’s path through life.

An obvious question is why we should listen to Confucius if he neither appeals to a higher power (like the Bible does) nor axiomatizes everything. I don’t really know, but many Chinese have studied this book and lived their lives according to its principles, so by studying it, we can better understand how Chinese think.

I feel like the Analects tells us how an ideal Chinese is “supposed” to think, but modern Chinese people are very much the opposite. Modern Chinese people are generally very materialistic, competitive, and care about comparing themselves to people around them. A friend said much of what is written here is “obvious” to any Chinese person — but then why don’t they actually follow it? I guess modern Chinese society is very unequal, and one must be competitive to rise to the top to prosper. So the cynical answer is that recent economic forces override thousand-year philosophy, which is the ideal, but falls apart when push comes to shove.

The Analects is a very thought-provoking book. It’s surprising how much of what Confucius said 2500 years ago is still true today. I probably missed a lot of things in my first pass through it, but this is a good starting point for further reading on Chinese philosophy and literature.

## Pachinko by Min Jin Lee

Pachinko is the name of the Japanese pinball game, where you watch metal balls tumble through a machine. It’s also the name of this novel, which traces a Korean family in Japan through four generations (Yangjin/Hoonie/Hansu -> Sunja/Isak -> Noa/Mozasu -> Solomon/Phoebe). Sunja is the first generation to immigrate to Japan during the 1930s, after being tricked by a rich guy who got her pregnant. Afterwards, they make their livelihoods in Japan, but they are always considered outsiders, despite being in the country for many generations.

It’s surprising to see so much racism in Japan towards Koreans, since Canada is so multicultural and so accepting of people from other places. Japan is very different: even after four generations in Japan, a Korean boy is still considered a guest and must register with the government every few years or risk getting deported. The Koreans in Japan can’t work the same jobs as the Japanese, can’t legally rent property, and get bullied at school, so they end up working in pachinko parlors, which the Japanese consider “dirty”. All the Korean men (Mozasu, Noa, and Solomon) end up working in pachinko, hence the name of the book.

One thing that struck me was how so many of the characters valued idealism more than rationality. Yoseb doesn’t want his wife to go out to work because he considers it improper. Sunja and Noa don’t want to accept Hansu’s help because of shame, even though they could have benefitted a lot, materially. All the Christians have this sort of idealist irrationality, which I guess is part of being religious — only Hansu behaves in a way that makes sense to me. This book gets a bit slow in the end as there are too many minor characters, but is overall a thought provoking read about racism in Japanese society.

## Visual Intelligence by Amy Herman

This book uses art to teach you to notice your surroundings more, which is very interesting. The basic premise is that there are a lot of things we miss that can be quite important. The two biggest ideas in this book for me:

1. Train yourself to be more visually perceptive by looking at art, and trying to notice every detail. This seems trivial but often we miss things. Now in the real world, do the same thing and see things in a different way.

2. Our experiences shape how we perceive things, so it’s important to describe things objectively rather than subjectively. Do not make assumptions, rather, describe only the facts of what you see. From a picture you can’t infer a person is “homeless”, but rather that he’s “lying on a street next to a shopping cart”.

## Memoirs of a Geisha by Arthur Golden

This novel tells the story of the geisha Sayuri, from her childhood until her death. It pretends to be a real memoir, but it’s written by an American man. The facts are thoroughly researched, so we get a feel of what Kyoto was like before the war.

Essentially, society in Japan was very unequal — the women have to go through elaborate rituals and endure a lot of suffering to please the men, who just have a lot of money. However, even without formal power, the geishas like Mameha and Hatsumomo construct elaborate schemes of deceit and trickery.

The plot was exciting to read, but certain characters felt flat. Sayuri’s infatuation for the chairman for decades doesn’t seem believable — maybe I would’ve had a crush like that as a teenager, but certainly a woman in her late 20s should know better. Hatsumomo’s degree of evilness didn’t seem convincing either.

Lastly, having read some novels by actual Japanese authors, this book feels nothing like them. Japanese literature is a lot more mellow, and the characters more reserved: certainly nobody would act in such an obviously evil manner. Japanese novels also typically have themes of loneliness and isolation and end with people committing suicide, which doesn’t happen in this novel either.