This blog post is a summary of my work from earlier this year: Dropout during inference as a model for neurological degeneration in an image captioning network.
For a long time, deep learning has had an interesting connection to neuroscience. The artificial neuron in neural networks was inspired by early models of the biological neuron. Later, convolutional neural networks were inspired by the structure of neurons in the visual cortex. Many other models also drew inspiration from how the brain functions, like visual attention, which mimics how humans look at different regions of an image when interpreting it.
The connection has always been loose and superficial, however. Although neuroscience has since produced better models of neurons, these never really caught on among deep learning researchers. Real neurons obviously don’t learn by gradient back-propagation and stochastic gradient descent.
In this work, we study how human neurological degeneration can have a parallel in deep neural networks. In humans, neurodegeneration can occur by several mechanisms, such as Alzheimer’s disease (which affects connections between individual neurons) or stroke (in which large sections of brain tissue die). Alzheimer’s disease leads to dementia, in which language, motor, and other cognitive abilities gradually become impaired.
To simulate this effect, we give our neural network a sort of dementia by interfering with the connections between its neurons, using a method called dropout.
Yup, this probably puts me high up on the list of humans to exact revenge on in the event of an AI apocalypse.
The Model
We started with an encoder-decoder style image captioning neural network (described in this post), which looks at an image and outputs a sentence that describes it. This is inspired by a picture description task that we give to patients suspected of having dementia: given a picture, describe it in as much detail as possible. Patients with dementia typically exhibit patterns of language different from those of healthy patients, which we can detect using machine learning.
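As a rough illustration of the general shape of such a model (this is not the paper's actual code; all layer names and sizes here are hypothetical), an encoder-decoder captioner in PyTorch looks something like this: a CNN encodes the image into a feature vector, and an LSTM decodes that vector into a sentence, one word at a time.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CaptioningModel(nn.Module):
    """Sketch of an encoder-decoder captioner: CNN encoder, LSTM decoder."""

    def __init__(self, vocab_size, embed_size=256, hidden_size=512):
        super().__init__()
        cnn = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop classifier head
        self.project = nn.Linear(cnn.fc.in_features, embed_size)
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.decoder = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, images, captions):
        # Encode the image into a single feature vector and feed it to
        # the LSTM as a "zeroth word" before the caption tokens.
        feats = self.project(self.encoder(images).flatten(1)).unsqueeze(1)
        inputs = torch.cat([feats, self.embed(captions)], dim=1)
        hidden, _ = self.decoder(inputs)
        return self.out(hidden)  # next-word logits at each position
```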
To simulate neurological degeneration in the neural network, we apply dropout at inference time, which randomly selects a portion of the neurons in a layer and sets their outputs to zero. Dropout is a common technique during training to regularize neural networks and prevent overfitting, but it is usually turned off during evaluation for the best possible accuracy. To our knowledge, nobody had experimented with applying dropout at inference time in a language model before.
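In a PyTorch-style implementation, this amounts to putting the dropout layers back into training mode after switching the rest of the model to evaluation mode. Here is a minimal sketch (the helper name and the probability value are my own, not from the paper):

```python
import torch.nn as nn

def enable_inference_dropout(model: nn.Module, p: float) -> None:
    """Keep dropout active at inference time.

    Call after model.eval(): puts every nn.Dropout submodule back into
    training mode (dropout only fires when module.training is True)
    and raises its drop probability to p.
    """
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p      # e.g. a larger value than used in training
            module.train()    # re-enable stochastic masking

# Usage: evaluate with dropout still on.
# model.eval()
# enable_inference_dropout(model, p=0.5)
```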
We train the model with a small amount of dropout, then apply a larger amount of dropout during inference. We then evaluate the quality of the generated sentences using the BLEU-4 and METEOR metrics, as well as sentence length and the similarity of the vocabulary distribution to that of the training corpus.
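For the last of these, one way to measure the similarity of vocabulary distributions is the KL-divergence between smoothed unigram frequencies. A minimal sketch, assuming additive smoothing (the exact smoothing scheme and direction of the divergence in the paper may differ):

```python
from collections import Counter
import math

def vocab_kl_divergence(generated, reference, alpha=1e-6):
    """KL(P_gen || P_ref) between unigram word distributions.

    generated, reference: lists of tokenized sentences (lists of words).
    alpha: additive smoothing so unseen words don't produce log(0).
    """
    gen_counts = Counter(w for s in generated for w in s)
    ref_counts = Counter(w for s in reference for w in s)
    vocab = set(gen_counts) | set(ref_counts)

    gen_total = sum(gen_counts.values()) + alpha * len(vocab)
    ref_total = sum(ref_counts.values()) + alpha * len(vocab)

    kl = 0.0
    for w in vocab:
        p = (gen_counts[w] + alpha) / gen_total
        q = (ref_counts[w] + alpha) / ref_total
        kl += p * math.log(p / q)
    return kl
```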
Results
When we applied dropout during inference, the accuracy of the captions (measured by BLEU-4 and METEOR) decreased as the dropout rate increased. However, with a moderate amount of dropout, the generated vocabulary was more diverse, and its word frequency distribution was closer to that of the training set (measured by KL-divergence).
When the dropout was too high, the model degenerated into generating essentially random words. Here are some examples of captions generated at higher levels of dropout; qualitatively, dropout seemed to cause two types of errors:
- Caption starts out normally, then repeats the same word several times: “a small white kitten with red collar and yellow chihuahua chihuahua chihuahua”
- Caption starts out normally, then becomes nonsense: “a man in a baseball bat and wearing a uniform helmet and glove preparing their handles won while too frown”
This output was not that similar to speech produced by people with Alzheimer’s disease, but it somewhat resembled fluent aphasia (caused by damage to the part of the brain responsible for understanding language).
Challenges and Difficulties
Excited by our results, we submitted the paper to EMNLP 2018. Unfortunately, it was rejected. Despite the novelty of our approach, the reviewers pointed out that our work had some serious drawbacks:
- Unclear connection to neuroscience. Applying dropout at inference time has no connection to any biological model of what happens to the brain during atrophy.
- Only superficial resemblance to aphasic speech. A similar result could have been generated by sampling words randomly from a dictionary, without any complicated RNN models.
- Not really useful for anything. We couldn’t think of any situation where this model would actually be useful, such as detecting aphasia.
We decided that there was no way around these roadblocks, so we scrapped the idea, put the paper up on arXiv, and worked on something else.
For more technical details, refer to our paper:
- Li, Bai, Ran Zhang, and Frank Rudzicz. “Dropout during inference as a model for neurological degeneration in an image captioning network.” arXiv preprint arXiv:1808.03747 (2018).