Paper Review: Linguistic Features to Identify Alzheimer’s Disease

Today I’m going to be sharing a paper I’ve been looking at, related to my research: “Linguistic Features Identify Alzheimer’s Disease in Narrative Speech” by Katie Fraser, Jed Meltzer, and my adviser Frank Rudzicz. The paper was published in 2016 in the Journal of Alzheimer’s Disease. It uses NLP to automatically diagnose patients with Alzheimer’s disease, given a sample of their speech.

Alzheimer’s disease is a disease that you might have heard of, but it doesn’t get much attention in the media, unlike cancer and stroke. It is a neurodegenerative disease that mostly affects elderly people. 5 million Americans are living with Alzheimer’s, including 1 in 9 over the age of 65, and 1 in 3 over the age of 85.

Alzheimer’s is also the most expensive disease in America. After diagnosis, patients may continue to live for over 10 years, and during much of this time, they are unable to care for themselves and require a constant caregiver. In 2017, 68% of Medicare and Medicaid’s budget is spent on patients with Alzheimer’s, and this number is expected to increase as the elderly population grows.

Despite a lot of recent advances in our understanding of the disease, there is currently no cure for Alzheimer’s. Since the disease is so prevalent and harmful, research in this direction is highly impactful.

Previous tests to diagnose Alzheimer’s

One of the early signs of Alzheimer’s is having difficulty remembering things, including words, leading to a decrease in vocabulary. A reliable way to test for this is a retrieval question like the following (Monsch et al., 1992):

In the next 60 seconds, name as many items as possible that can be found in a supermarket.

A healthy person could rattle out about 20-30 items in a minute, whereas someone with Alzheimer’s could only produce about 10. By setting the threshold at 16 items, they could classify even mild cases of Alzheimer’s with about 92% accuracy.

This doesn’t quite capture the signs of Alzheimer’s disease though. Patients with Alzheimer’s tend to be rambly and incoherent. This can be tested with a picture description task, where the patient is given a picture and asked to describe it with as much detail as possible (Giles, Patterson, Hodges, 1994).

73c894ea4d2dc12ca69a6380e51f1d62Above: Boston Cookie Theft picture used for picture description task

There is no time limit, and the patients talked until they indicated they had nothing more to say, or if they didn’t say anything for 15 seconds.

Patients with Alzheimer’s disease produced descriptions with varying degrees of incoherence. Here’s an example transcript, from the above paper:

Experimenter: Tell me everything you see going on in this picture

Patient: oh yes there’s some washing up going on / (laughs) yes / …… oh and the other / ….. this little one is taking down the cookie jar / and this little girl is waiting for it to come down so she’ll have it / ………. er this girl has got a good old splash / she’s left the taps on (laughs) she’s gone splash all down there / um …… she’s got splash all down there

You can clearly tell that something’s off, but it’s hard to put a finger on exactly what the problem is. Well, time to apply some machine learning!

Results of Paper

Fraser’s 2016 paper uses data from the DementiaBank corpus, consisting of 240 narrative samples from patients with Alzheimer’s, and 233 from a healthy control group. The two groups were matched to have similar age, gender, and education levels. Each participant was asked to describe the Boston Cookie Theft picture above.

Fraser’s analysis used both the original audio data, as well as a detailed computer-readable transcript. She looked at 370 different features covering all sorts of linguistic metrics, like ratios of different parts of speech, syntactic structures, vocabulary richness, and repetition. Then, she performed a factor analysis and identified a set of 35 features that achieves about 81% accuracy in distinguishing between Alzheimer’s patients and controls.

According to the analysis, a few of the most important distinguishing features are:

  • Pronoun to noun ratio. Alzheimer’s patients produce vague statements and tend to substitute pronouns like “he” for nouns like “the boy”. This also applies to adverbial constructions like “the boy is reaching up there” rather than “the boy is reaching into the cupboard”.
  • Usage of high frequency words. Alzheimer’s patients have difficulty remembering specific words and replace them with more general, therefore higher frequency words.

Future directions

Shortly after this research was published, my adviser Frank Rudzicz co-founded WinterLight Labs, a company that’s working on turning this proof-of-concept into an actual usable product. It also diagnoses various other cognitive disorders like Primary Progressive Aphasia.

A few other grad students in my research group are working on Talk2Me, which is a large longitudinal study to collect more data from patients with various neurodegenerative disorders. More data is always helpful for future research.

So this is the starting point for my research. Stay tuned for updates!