Why Time Management in Grad School is Difficult

Graduate students are often stressed and overworked; a recent Nature report states that grad students are six times more likely to suffer from depression than the general population. Although there are many factors contributing to this, I suspect that a lot of it has to do with poor time management.

In this post, I will describe why time management in grad school is particularly difficult, and some strategies that I’ve found helpful as a grad student.


As a grad student, I’ve found time management to be far more difficult than during either my undergraduate years or my time working in industry. Here are a few reasons why:

  1. Loose supervision: as a grad student, you have a lot of freedom over how you spend your time. There are no set hours, and you can go a week or more without talking to your adviser. This can be both a blessing and a curse: some find the freedom liberating while others struggle to be productive. In contrast, in an industry job, you’re expected to report to daily standup and you get assigned tickets each sprint, so others essentially manage your time for you.
  2. Few deadlines: grad school is different from undergrad in that you have a handful of “big” deadlines a year (eg: conference submission dates, major project due dates), whereas in undergrad, the deadlines (eg: assignments, midterms) are smaller and more frequent.
  3. Sparse rewards: most of your experiments will fail. That’s the nature of research — if you know it’s going to work, then it’s no longer research. It’s hard not to get discouraged when you struggle for weeks without a positive result, and easy to start procrastinating on a multitude of distractions.

Basically, poor time management leads to procrastination, stress, burnout, and generally having a bad time in grad school 😦


Some time management strategies that I’ve found to be useful:

  1. Track your time. When I first started doing this, I was surprised at how much time I spent doing random, half-productive stuff not really related to my goals. It’s up to you how to do this — I keep a bunch of Excel spreadsheets, but some people use software like Asana.
  2. Know your plan. My adviser suggested a hierarchical format with a long-term research agenda, medium-term goals (eg: submit a paper to ICML), and short-term tasks (eg: run X baseline on dataset Y). Then you know whether you’re progressing towards your goals or merely doing stuff tangential to them.
  3. Focus on the process, not the reward. It’s tempting to celebrate when your paper gets accepted — but the flip side is you’re going to be depressed if it gets rejected. Your research will have many failures: paper rejections and experiments that somehow don’t work. Instead, celebrate when you finish the first draft of your paper; reward yourself when you finish implementing an algorithm, even if it fails to beat the baseline.

Here is my productive time allocation over the last 6 months:

time_allocation.png

Most interestingly, only a quarter of my time is spent coding or running experiments, which seems to be much less than for most grad students. I read a lot of papers to try to avoid reinventing things that others have already done.

On average, I spend about 6 hours a day doing productive work (including weekends) — a quite reasonable workload of about 40-45 hours a week. Contrary to some perceptions, grad students don’t have to be stressed and overworked to be successful; allowing time for leisure and social activities is crucial in the long run.

Books I’ve read in 2018

I read 28 books in 2018 (about one every 2 weeks). Recently, I’ve been getting into the habit of taking notes in the margins and writing down a summary of what I learned after finishing them.

This blog post is a more-or-less unedited dump of some of my notes on some of the books I read last year. They were originally notes for myself and weren’t meant to be published, so a lot of ideas aren’t very well fleshed out. Without further ado, let’s begin.


Understanding Thermodynamics by H. C. Van Ness


Pretty short, 100-page book that gives an intuitive introduction to various topics in thermodynamics and statistical mechanics. It’s meant to be a supplementary text, not a main text, so some really important things were omitted, which was confusing to me since I’d never studied this topic before. Some ideas I learned:

  • Energy can’t really be defined since it’s not a physical property. You can only write it as a sum of a bunch of terms and note that, within a closed system, the total always stays the same (the first law of thermodynamics).
  • A process is reversible if you can do it in reverse to get back the initial state. No physical process is perfectly reversible, but the closer it is to reversible, the more efficient it is.
  • Heat engines convert a heat differential into work. Two types are the Otto cycle (used in cars) and the Carnot cycle. Surprisingly, heat engines cannot be perfectly efficient, even under ideal conditions; the Carnot limit puts an upper bound on their efficiency (the standard formulas are sketched after this list). A heat engine that perfectly converts heat into work violates the second law of thermodynamics.
  • The second law of thermodynamics says that entropy never decreases: it increases for irreversible processes and remains the same for reversible processes. This is useful for determining when a “box of tricks” (taking in compressed air, outputting cold air at one end and hot air at the other end) is possible. The book doesn’t give much intuition about why the definition of entropy makes sense though; it literally tries random combinations of variables until one “works” (gives a constant value experimentally).
  • The second law of thermodynamics is merely an empirical observation, and can’t be proved. In fact, it can be challenged at the molecular level (eg: Maxwell’s demon), and such challenges aren’t easily refuted.
  • Statistical mechanics gives an alternative definition of entropy in terms of molecular states, and from it, you can derive various macroscopic properties like temperature and pressure. However, it only works well for ideal gases, and doesn’t quite explain or replace thermodynamics.
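
For reference, here are the standard textbook forms of a few of these quantities (my notation, not necessarily the book’s): the Carnot bound on heat-engine efficiency, the thermodynamic (Clausius) definition of entropy, and the statistical (Boltzmann) definition.

```latex
% Carnot limit: no heat engine operating between a hot reservoir at T_H and a
% cold reservoir at T_C can exceed this efficiency.
\[ \eta \le \eta_{\text{Carnot}} = 1 - \frac{T_C}{T_H} \]

% Thermodynamic (Clausius) entropy: defined through reversible heat transfer.
\[ dS = \frac{\delta Q_{\text{rev}}}{T}, \qquad \Delta S_{\text{universe}} \ge 0 \]

% Statistical (Boltzmann) entropy: W is the number of molecular microstates
% consistent with the macroscopic state.
\[ S = k_B \ln W \]
```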

Indian Horse by Richard Wagamese


This book is about the life of an Ojibway man living in northern Ontario and growing up in the 60s. When he was young, he was sent to a residential school where he was badly treated and not allowed to speak his own language. He found hockey and got really good at it, but he faced so much racism that he couldn’t really make it in the big leagues with white players. Later, he faced more racism in his job as a logger. Eventually, he developed an alcohol addiction out of this disillusionment, before finally coming to terms with his life.

Very interesting perspective on the indigenous people of Canada, a group that most of us don’t think about often. Despite numerous government subsidies, they’re still some of the poorest people in the country, with low education levels. Some people think it’s laziness, but they’ve had a history of mistreatment in residential schools and were subjected to racism until very recently, so it’s difficult for them to integrate into society. Their reserves are often a long distance from major population centers, which means very few opportunities. Furthermore, their culture doesn’t really value education. Overall, great read about a group currently marginalized in Canadian society.

The Power of Habit by Charles Duhigg


Book that discusses various aspects of how habits work. At a high level, habits have three components: cue, routine, and reward. The cue is a set of conditions that triggers you to automatically perform a routine in order to get a reward. After a while, you crave the reward when given the cue and perform the routine automatically (even if the reward is intermittent).

To change a habit, you can’t just force yourself not to do it, because you will constantly crave the reward. Instead, replace the routine with something else that gives a similar reward but is less harmful. Forcing yourself to act against a habit depletes your willpower, so it’s much better to change the habit itself: then you perform the new routine automatically and retain your willpower.

Large changes are often precipitated by a small “keystone” habit change that catalyzes a series of systemic changes. For example, Alcoa, an aluminum company, improved its overall efficiency when it decided to focus on safety. Sometimes a disaster is needed to bring about a systemic change in an organization, like the fire at King’s Cross station or a hospital operating on the wrong side of a patient. Peer pressure is also important: for example, it’s a key component of Alcoholics Anonymous, and it helped the civil rights movement take hold.

Overall, pretty interesting read, although there’s too much dramatic storytelling and anecdote for my taste; I would’ve preferred more scientific discussion.

Why We Sleep by Matthew Walker


This book gives a comprehensive scientific overview of sleep. Although there are still many unanswered questions, there’s been a lot of research lately and this book sums it up.

Sleep is a very necessary function of life. Every living organism requires it, although in different amounts, and total lack of sleep very quickly leads to death. However it’s still unclear exactly why sleep is so important.

There are two types of sleep: REM (rapid eye movement) and NREM sleep. REM sleep is a much lighter form of sleep where you’re closer to the awake state, and is also when you dream; NREM is a much deeper sleep. You can distinguish the type of sleep easily by measuring brain waves.

Sleep deprivation is really bad. You don’t even need total deprivation: even six hours of sleep a night for a few nights is as bad as pulling an all-nighter. When you’re sleep deprived, you’re a lot worse at learning things and controlling your emotions, and you’re also more likely to get sick and more susceptible to cancer.

Dreams aren’t that well understood, but they seem to consolidate memories, including moving them from short term to long term storage. REM sleep especially lets your brain find connections between different ideas, and you’re better at problem solving immediately after.

Insomnia is a really common problem in our society, in part because society is structured to encourage sleeping less. Sleeping pills are ineffective at best (prescription ones like Ambien and benzodiazepines are actually really harmful); the recommended treatment is behavioral: keep a regular sleep schedule, avoid caffeine, nicotine, and alcohol, don’t take naps, and avoid light in the bedroom.

My parents always told me it’s bad to stay up so late, but science doesn’t really support this. Different people have different chronotypes, which are determined by genetics (and change somewhat with age). It’s okay to sleep really late, as long as you maintain a consistent sleep schedule.

Overall I learned a lot from this book but it’s a fairly dense read, with lots of information about different topics, and it took me over a month to finish it.

Notes from the Underground by Fyodor Dostoyevsky


I read this Dostoyevsky book because it had an interesting premise of a man who tries to rescue a prostitute. It turns out that rescuing the prostitute is not really the central event of the book, but nevertheless I found it quite interesting. The novella is short (about 90 pages), unlike Dostoyevsky’s other books, which are super long. It explores a lot of philosophical and psychological ideas in an interesting setting.

The unnamed narrator is a man from the “underground” — he is some kind of civil servant, middle aged, with health problems. He rejects the idea that man must always do the rational thing, because then he would be like a machine. He rejoices in doing stupid things from time to time, just because he feels like it; that way he retains some of his humanity. In the second part of the book, the narrator feels that he is not seen as an equal by his peers, and goes to extreme lengths to remedy it. He forcefully invites himself to a dinner party with old friends, and is dismayed that his social status is so low that he’s simply ignored. He would much rather have a fight than be ignored, and tries to provoke one in an awkward, socially oblivious manner. Later he meets a prostitute, Liza, whom he offers to save. However, when she actually shows up at his place, he is stuck in his own world and lectures her about the virtues of morality, without actually helping her.

The narrator feels surreal in how extremely he values social acceptance. After all, he is physically well-off; he is at least rich enough to hire a servant. However, as long as he feels inferior to his peers, he is frustrated. And the more he tries to gain respect from his peers, the more his efforts backfire and lower his position in their eyes. Social recognition isn’t something you should pursue directly.

Factfulness by Hans Rosling


This book was written by Hans Rosling (the same guy that made The Joy of Stats documentary) just before he died in 2017. It uses stats to show that despite what the media portrays, and despite popular conception, the world is not such a bad place. Extreme poverty is on the decline, children are being vaccinated, women are going to school.

At the beginning of the book, he gives a quiz of 13 questions. Most people score terribly, worse than random chance, because they consistently guess that the world is worse than it actually is. Without looking at stats, it’s easy to be systematically misled and fall into a bunch of fallacies, like not considering the magnitude of effects, generalizing your experience to others, or acting based on fear. Maybe because of my stats background, a lot of what he says was quite obvious to me. Also, I scored 9 on the quiz, which is higher than pretty much everyone. It confirmed some stuff that I already knew, but it still had good insights on poverty and developing nations.

A big takeaway for me is to be thankful for what we have, seeing how different life is on levels 1-3. Canada is a level 4 country (where people spend more than $32 a day), yet people make fun of me for making 20k/year “poverty” grad school wages. Grad students in Canada should be thankful that we have electricity and running water and can eat out at restaurants, not sad that we can’t afford luxury cars and condos.

Sky Burial by Xinran Xue


In this novel, a Chinese woman, Shu Wen from Suzhou, travels to Tibet to search for her missing husband. This was in 1958, in the years after the Chinese Communist Party annexed Tibet. On the way there, she picks up a Tibetan woman, Zhuoma. They get into some trouble in the mountains and meet a Tibetan family, and gradually Wen integrates into Tibetan culture and learns the language and customs. Time goes by quickly, and before you realize it, 30 years have passed while they have had practically no information from the outside world. In the end, Wen does find out what happened to her husband through his diaries, but it’s a bittersweet sort of ending: her world has changed unrecognizably and her husband is dead.

The author makes it ambiguous whether this is a work of fiction or it actually happened — all the facts seem believable, other than somehow not finding out about the Great Famine and the Cultural Revolution for decades. A lot of interesting Tibetan customs are explained: the nomadic lifestyle, the polyandrous family structure, Buddhist religious beliefs, and the practice of sky burial, in which vultures eat the dead. The relationship between the Chinese and the Tibetans has always been a contentious one, and in this book the characters form a connection of understanding between the two ethnic groups.

Tibet seems like a really interesting place that I should visit someday. However, it’s unclear how much of their traditional culture is still accessible, due to the recent Han Chinese migrations. Also, it’s currently impossible to travel freely in Tibet without a tour group if you’re not a Chinese citizen.

Getting to YES by Fisher, Ury, and Patton


This book tells you how to negotiate more effectively. A common negotiating mistake is positional negotiation, where each side picks an arbitrary position (eg: buy the car for $5000) and you go back and forth until one of you tires and agrees, or you both walk out. Positional negotiation is highly arbitrary, and often leads to no agreement, which is bad for both parties.

Some ways to negotiate in a more principled way:

  • Empathize with the other party: get to know them and their values, and treat it as both parties working against a common problem rather than you trying to “win” the negotiation.
  • Focus on interests, rather than positions. During the negotiation, figure out what each party really wants; sometimes, it’s possible to give them something that’s valuable for them but you don’t really care about. Negotiation is a nonzero sum game, so try to find creative solutions that fulfill everybody’s interests, rather than fight over a one-dimensional figure.
  • When creative solutions are not possible (both sides just want money), defer to objective measures like industry standards. This gives you both an anchor to use, rather than negotiating in a vacuum.
  • Be aware of your and the other party’s BATNA: the best alternative to a negotiated agreement. This determines who holds more power in a negotiation, and improving yours is a good way to get more leverage.

Trump: A Graphic Biography by Ted Rall


A biography of Trump in graphical novel format. This book was written after Trump won the republican primaries (May 2016) but before he won the presidency (Nov 2016).

First, the book describes the political and economic circumstances that led to Trump coming into power. After the 2008 financial crisis, many low-skilled Americans felt like there was little economic opportunity for them. Many politicians had come and gone, promising change, but nothing happened. For them, Trump represented a change from the political establishment. They didn’t necessarily agree with all of his policies, they just wanted something radical.

Trump was born after WW2 to a wealthy family in New York City. He studied economics and managed a real estate empire for a few decades, which made him a billionaire. Through his real estate deals, he proved himself a cunning and ruthless negotiator who was willing to behave unethically and use deception to get what he wanted.

This was a good read because most of my friend group just thinks Trump is “stupid” and that everyone who voted for him is stupid; I never really understood why he was so popular with the other demographic. As a biography, the graphic novel format works well because it’s much shorter; most other biographies go into far more detail about a single person’s life than I care to know.

12 Rules for Life by Jordan Peterson

Jordan Peterson’s new book quickly hit #1 on the bestseller lists after being released this year. He’s famous around UofT for speaking out against social justice warriors, but I later found out that he has a lot of YouTube videos on the philosophy of how to live your life. This book distills many of these ideas into 12 “rules” to live by, in order to live a good and meaningful life.

These are the ideas I found most interesting and novel:

  • Dominance hierarchy: humans (especially men) instinctively place each other on a hierarchy, where the person at the top has all the power and status, and gets all the resources. Women want to date guys near the top of the hierarchy, and men near the top get many women easily while men at the bottom can’t even find one. Therefore, it’s essential to rise to the top of the dominance hierarchy.
  • Order and chaos: order is the part of the world that we understand, that behaves according to rules; chaos is the unknown, risk, failure. To live a meaningful life is to straddle the boundary between order and chaos, and have a little bit of both.
  • When raising children, it’s the parents’ responsibility to teach them how to behave properly and follow social norms, because otherwise society will treat them harshly, and this will snowball into social isolation later in life. Also, they should be encouraged to do risky things (within reason) to explore and develop their masculinity.

Some of the other rules are more obvious. Examples include: be truthful to yourself, choose your friends wisely, improve yourself incrementally rather than comparing yourself to others, confront issues quickly as they arise. I guess depending on your personality and prior experience, you might find a different subset of these rules to be obvious.

Initially, I found JP to be obnoxious because of the lack of scientific rigour in his arguments; he just seems convincing because he’s well-spoken. The book does a slightly better job than the videos at substantiating the arguments and citing various psychology research papers. JP also has a tendency to cite literature; when he goes into stuff like biblical archetypes of Christ or Cain and Abel, I have no idea what he’s talking about anymore. The book also felt a bit long. Overall it’s still a good read; I learned a lot from this book and from diving deeper into the psychology papers he cited.

Analects by Confucius


The Analects (论语) is a book of philosophy recording the teachings of Confucius, and it lays the groundwork for much of Chinese thinking over the next 2500 years. It’s the second book of ancient Chinese literature I’ve read, after the Art of War. It’s written in a somewhat different style — it has 20 chapters of varying lengths, but the chapters aren’t really organized by topic and the writing jumps around a lot.

Confucius tells you how to live your life not by appealing to religion, but rather by describing characteristics that he considers “good” and giving examples of what is and isn’t considered good. A few recurring ideas:

  • junzi 君子 – exemplary person. The ideal, wise person that we should strive to be. A junzi strives to be excellent (德) and honorable (信), and not be arrogant, greedy, or materialistic. He seeks knowledge, respects elders, is not afraid to speak up, and conducts himself authoritatively.

  • li 礼 – ritual propriety. The idea that there are certain “rituals” that society observes, and that if a leader respects them, then things will go smoothly. Kind of like the “meta” in games — modern examples would be the employer/employee relationship, or in what situations you shake someone’s hand.

  • xiao 孝 – filial responsibility. A son must respect his parents and take care of them in old age, and mourn for them for three years after their death (since for three years after birth, a child is helpless without his parents).

  • haoxue 好学 – love of learning for the sake of learning

  • ren 仁 – authoritative conduct / benevolence / humanity. Basically, a leader should conduct himself in a responsible manner, and be fair yet firm.

  • dao 道 – the way. One should forge one’s path through life.

An obvious question is why we should listen to Confucius if there’s no appeal to a higher power (like the Bible) and no attempt to axiomatize everything. I don’t really know, but many Chinese people have studied this book and lived their lives according to its principles, so by studying it, we can better understand how Chinese people think.

I feel like the Analects tells us how an ideal Chinese person is “supposed” to think, but modern Chinese people are very much the opposite: generally quite materialistic, competitive, and preoccupied with comparing themselves to the people around them. A friend said much of what is written here is “obvious” to any Chinese person — but then why don’t they actually follow it? I guess modern Chinese society is very unequal, and one must be competitive to rise to the top and prosper. So the cynical answer is that recent economic forces override millennia-old philosophy: the ideal holds until push comes to shove.

The Analects is a very thought-provoking book. It’s surprising how many things Confucius said 2500 years ago are still true today. I probably missed a lot of things in my first pass through it — but this is a good starting point for further reading on Chinese philosophy and literature.

Pachinko by Min Jin Lee


Pachinko is the name of a Japanese pinball-like game, where you watch metal balls tumble through a machine. It’s also the name of this novel, which traces a Korean family in Japan through four generations (Yangjin/Hoonie/Hansu -> Sunja/Isak -> Noa/Mozasu -> Solomon/Phoebe). Sunja is the first generation to immigrate to Japan, during the 1930s, after being tricked by a rich man who got her pregnant. Afterwards, the family makes its livelihood in Japan, but they are always considered outsiders, despite being in the country for generations.

It’s surprising to see so much racism in Japan towards Koreans, since Canada is so multicultural and accepting of people from other places. Japan is very different: even after four generations in Japan, a Korean boy is still considered a guest and must register with the government every few years or risk getting deported. The Koreans in Japan can’t work the same jobs as the Japanese, can’t legally rent property, and get bullied at school, so they end up working in pachinko parlors, which the Japanese consider “dirty”. All the Korean men (Mozasu, Noa, and Solomon) end up working in pachinko, hence the name of the book.

One thing that struck me was how many of the characters valued idealism more than rationality. Yoseb doesn’t want his wife to go out to work because he considers it improper. Sunja and Noa don’t want to accept Hansu’s help because of shame, even though they could have benefited a lot materially. All the Christians have this sort of idealist irrationality, which I guess is part of being religious — only Hansu behaves in a way that makes sense to me. The book gets a bit slow towards the end as there are too many minor characters, but overall it’s a thought-provoking read about racism in Japanese society.

Visual Intelligence by Amy Herman


This book uses art to teach you to notice your surroundings more, which is very interesting. The basic premise is that there are a lot of things we miss that can be quite important. The two biggest ideas in this book for me:

  1. Train yourself to be more visually perceptive by looking at art and trying to notice every detail. This seems trivial, but we often miss things. Then, in the real world, do the same thing and you’ll see things in a different way.

  2. Our experiences shape how we perceive things, so it’s important to describe things objectively rather than subjectively. Don’t make assumptions; rather, describe only the facts of what you see. From a picture you can’t infer that a person is “homeless”, only that he’s “lying on a street next to a shopping cart”.

Memoirs of a Geisha by Arthur Golden


This novel tells the story of the geisha Sayuri, from her childhood until her death. It pretends to be a real memoir, but it’s written by an American man. The facts are thoroughly researched, so we get a feel for what Kyoto was like before the war.

Essentially, society in Japan was very unequal — the women have to go through elaborate rituals and endure a lot of suffering to please the men, who simply have a lot of money. However, even without formal power, geishas like Mameha and Hatsumomo construct elaborate schemes of deceit and trickery.

The plot was exciting to read, but certain characters felt flat. Sayuri’s decades-long infatuation with the Chairman doesn’t seem believable — maybe I would’ve had a crush like that as a teenager, but surely a woman in her late 20s should know better. Hatsumomo’s degree of evilness didn’t seem convincing either.

Lastly, having read some novels by actual Japanese authors, I find this book feels nothing like them. Japanese literature is a lot more mellow, and the characters more reserved: certainly nobody would act in such an obviously evil manner. Japanese novels also typically have themes of loneliness and isolation and end with people committing suicide, which doesn’t happen in this novel either.

 

Deep Learning for NLP: SpaCy vs PyTorch vs AllenNLP

Deep neural networks have become really popular nowadays, producing state-of-the-art results in many areas of NLP, like sentiment analysis, text summarization, question answering, and more. In this blog post, we compare three popular deep learning frameworks for NLP (SpaCy, PyTorch, and AllenNLP): their advantages, disadvantages, and use cases.

SpaCy

Pros: easy to use, very fast, ready for production

Cons: not customizable, internals are opaque

spacy_logo.jpg

SpaCy is a mature, batteries-included framework that comes with prebuilt models for common NLP tasks like classification, named entity recognition, and part-of-speech tagging. It’s very easy to train a model with your data: all the gritty details like tokenization and word embeddings are handled for you. SpaCy is written in Cython, which makes it faster than a pure Python implementation, so it’s ideal for production.

The design philosophy is that the user should only worry about the task at hand, not the underlying details. If a newer and more accurate model comes along, SpaCy can update itself to use the improved model, and the user doesn’t need to change anything. This is good for getting a model up and running quickly, but leaves little room for an NLP practitioner to customize the model if the task doesn’t exactly match one of SpaCy’s prebuilt models. For example, you can’t build a classifier that takes text, numerical, and image data at the same time to produce a classification.
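
To give a sense of how little code is involved, here is a minimal sketch of training a SpaCy text classifier, assuming the spaCy 2.x textcat API; the labels and training examples below are toy placeholders, not a real dataset.

```python
# Minimal sketch of a spaCy (v2.x) text classifier; data and labels are toy placeholders.
import spacy

nlp = spacy.blank("en")                       # start from a blank English pipeline
textcat = nlp.create_pipe("textcat")          # built-in text classification component
nlp.add_pipe(textcat, last=True)
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

train_data = [
    ("I loved this paper", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("The results were disappointing", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]

optimizer = nlp.begin_training()
for epoch in range(10):
    losses = {}
    for text, annotations in train_data:
        # Tokenization and feature extraction happen inside update().
        nlp.update([text], [annotations], sgd=optimizer, losses=losses)

doc = nlp("An interesting read")
print(doc.cats)                               # e.g. {'POSITIVE': 0.7, 'NEGATIVE': 0.3}
```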

PyTorch

Pros: very customizable, widely used in deep learning research

Cons: fewer NLP abstractions, not optimized for speed

pytorch_logo.jpeg

PyTorch is a deep learning framework by Facebook, popular among researchers for all kinds of DL models, like image classifiers, deep reinforcement learning, and GANs. It uses a clear and flexible design where the model architecture is defined with straightforward Python code (rather than TensorFlow’s computational graph design).

NLP-specific functionality, like tokenization and managing word embeddings, is available in torchtext. However, PyTorch is a general-purpose deep learning framework and has relatively few NLP abstractions compared to SpaCy and AllenNLP, which are designed for NLP.
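
As an illustration of the define-by-run style, here is a minimal sketch of a text classifier written directly in PyTorch; the vocabulary size, dimensions, and number of classes are arbitrary placeholders.

```python
# Minimal sketch of an NLP model in plain PyTorch; all sizes are placeholders.
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)    # final hidden state summarizes the sequence
        return self.fc(hidden[-1])              # (batch, num_classes)

model = TextClassifier()
dummy_batch = torch.randint(0, 10000, (4, 20))  # 4 sequences of 20 token ids
logits = model(dummy_batch)
print(logits.shape)                             # torch.Size([4, 2])
```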

AllenNLP

Pros: excellent NLP functionality, designed for quick prototyping

Cons: not yet mature, not optimized for speed

allennlp_logo.jpg

AllenNLP is built on top of PyTorch and designed for rapidly prototyping NLP models for research purposes. It supports a lot of NLP functionality out of the box, like text preprocessing and character embeddings, and abstracts away the training loop (whereas in PyTorch you have to write the training loop yourself). Currently, AllenNLP is not yet at a 1.0 stable release, but it looks very promising.

Unlike PyTorch, AllenNLP’s design decouples what a model “does” from the architectural details of “how” it’s done. For example, a Seq2VecEncoder is any component that takes a sequence of vectors and outputs a single vector. You can use GloVe embeddings and average them, or you can use an LSTM, or you can put in a CNN. All of these are Seq2VecEncoders so you can swap them out without affecting the model logic.
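
Here is a rough sketch of that abstraction, assuming the pre-1.0 AllenNLP API; the shapes and dimensions are arbitrary placeholders. Both encoders map a sequence of vectors to a single vector, so a model written against Seq2VecEncoder can take either one without changing its logic.

```python
# Rough sketch of swapping Seq2VecEncoders in AllenNLP (pre-1.0 API assumed).
import torch
from allennlp.modules.seq2vec_encoders import PytorchSeq2VecWrapper, CnnEncoder

# An LSTM wrapped so it returns one vector per sequence (the final hidden state).
lstm_encoder = PytorchSeq2VecWrapper(
    torch.nn.LSTM(input_size=100, hidden_size=128, batch_first=True))

# A CNN encoder with the same contract: sequence of vectors in, one vector out.
cnn_encoder = CnnEncoder(embedding_dim=100, num_filters=64)

token_vectors = torch.randn(4, 20, 100)         # (batch, seq_len, embedding_dim)
mask = torch.ones(4, 20)                        # all tokens are real (no padding)

print(lstm_encoder(token_vectors, mask).shape)  # one 128-dim vector per sequence
print(cnn_encoder(token_vectors, mask).shape)   # one vector per sequence (CNN features)
```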

The talk “Writing code for NLP Research” presented at EMNLP 2018 gives a good overview of AllenNLP’s design philosophy and its differences from PyTorch.

Which is the best framework?

It depends on how much you care about flexibility, ease of use, and performance.

  • If your task is fairly standard, then SpaCy is the easiest to get up and running. You can train a model using a small amount of code, you don’t have to think about whether to use a CNN or RNN, and the API is clearly documented. It’s also well optimized to deploy to production.
  • AllenNLP is the best for research prototyping. It supports all the bells and whistles that you’d include in your next research paper, and encourages you to follow the best practices by design. Its functionality is a superset of PyTorch’s, so I’d recommend AllenNLP over PyTorch for all NLP applications.

There are a few runners-up that I’ll mention briefly:

  • NLTK / Stanford CoreNLP / Gensim are popular libraries for NLP. They’re good libraries, but they don’t do deep learning, so they can’t be directly compared here.
  • Tensorflow / Keras are also popular for research, especially for Google projects. Tensorflow is the only framework supported by Google’s TPUs, and it also has better multi-GPU support than PyTorch. However, multi-GPU setups are relatively uncommon in NLP, and furthermore, its computational graph model is harder to debug than PyTorch’s model, so I don’t recommend it for NLP.
  • PyText is a new framework by Facebook, also built on top of PyTorch. It defines a network using pre-built modules (similar to Keras) and supports exporting models to Caffe2 to be faster in production. However, it’s very new (only released earlier this month) and I haven’t worked with it myself to form an opinion about it yet.

That’s all, let me know if there’s any that I’ve missed!

The Ethics of (not) Tipping at Restaurants

A customer finishes a meal at a restaurant. He gives a 20-dollar bill to the waiter, and the waiter returns with some change. The customer proceeds to pocket the change in its entirety.

“Excuse me, sir,” the waiter interrupts, “but the gratuity has not been included in your bill.”

The customer nods and calmly smiles at the waiter. “Yes, I know,” he replies. He gathers his belongings and walks out, indifferent to the astonished look on the waiter’s face.

notip.png

Just thinking about this fictional scenario makes your blood boil. It evokes a feeling of unfairness, where a shameless and rude customer has cheated an innocent, hardworking waiter out of his well-deserved money. Not many situations provoke such a strong emotional response while remaining perfectly legal.

There is compelling reason not to tip. On an individual level, you can save 10-15% on your meal. On a societal level, economists have criticized tipping for its discriminatory effects. Yet we still do it. Why?

In this blog post, we look at some common arguments in favor of tipping, but we see that these arguments may not hold up to scrutiny. Then, we examine the morality of refusing to tip under several ethical frameworks.

Arguments in favor of tipping (and their rebuttals)

Here are four common reasons for why we should tip:

  1. Tipping gives the waiter an incentive to provide better service.
  2. Waiters are paid less than minimum wage and need the money.
  3. Refusing to tip is embarrassing: it makes you lose face in front of the waiter and your colleagues.
  4. Tipping is a strong social norm and violating it is extremely rude.

I’ve ordered these arguments from weakest to strongest. These are good reasons, but I don’t think any of them definitively settles the argument. I argue that the first two are factually inaccurate, and for the last two, it’s not obvious why the end effect is bad.

Argument 1: Tipping gives the waiter an incentive to provide better service. Since the customer tips at the end of the meal, the waiter does a better job to make him happy, so that he receives a bigger tip.

Rebuttal: The evidence for this is dubious. One study concluded that service quality has at most a modest correlation with how much people tip; many other factors affected tipping, like group size, day of week, and amount of alcohol consumed. Another study found that waitresses earned more tips from male customers if they wore red lipstick. The connection between good service and tipping is sketchy at best.

Argument 2: Waiters are paid less than minimum wage and need the money. In many parts of the USA, waiters earn a base rate of about $2 an hour and must rely on tips to survive.

Rebuttal: This is false. In Canada, all waiters earn at least minimum wage. In the USA, the base rate for waiters is less than minimum wage in some states, but restaurants are required to pay the difference if a waiter’s tips don’t bring them up to minimum wage.

You may argue that restaurant waiters are poor and deserve more than minimum wage. I find this unconvincing, as there are lots of service workers (cashiers, janitors, retail clerks, fast food workers) who do strenuous labor and make minimum wage, and we don’t tip them. I don’t see why waiters are an exception. Arguably, Uber drivers are the most deserving of tips, since they make less than minimum wage after accounting for costs, but tipping is optional and not expected for Uber rides.

Argument 3: Refusing to tip is embarrassing: it makes you lose face in front of the waiter and your colleagues. You may be treated badly the next time you visit the restaurant and the waiter recognizes you. If you’re on a date and you get confronted for refusing to tip, you’re unlikely to get a second date.

Rebuttal: Indeed, the social shame and embarrassment is a good reason to tip, especially if you’re dining with others. But what if you’re eating by yourself in a restaurant in another city that you will never go to again? Most people will still tip, even though the damage to your social reputation is minimal. So it seems that social reputation isn’t the only reason for tipping.

It’s definitely embarrassing to get confronted for not tipping, but it’s not obvious that being embarrassed is bad (especially if the only observer is a waiter who you’ll never interact with again). If I give a public speech despite feeling embarrassed, then I am praised for my bravery. Why can’t the same principle apply here?

Argument 4: Tipping is a strong social norm and violating it is extremely rude. Stiffing a waiter is considered rude in our society, even if no physical or economic damage is done. Giving the middle finger is also offensive, despite no clear damage being done. In both cases, you’re being rude to an innocent stranger.

Rebuttal: Indeed, the above is true. A social norm is a convention whose violation is considered rude. The problem is the arbitrariness of social norms. Is it always bad to violate a social norm, or can the social norm itself be wrong?

Consider that only a few hundred years ago, slavery was commonplace and accepted. In medieval societies, religion was expected and atheists were condemned, and in other societies, women were considered property of their husbands. All of these are examples of social norms; all of these norms are considered barbaric today. It’s not enough to justify something by saying that “everybody else does it”.

Tipping under various ethical frameworks

Is it immoral not to tip at restaurants? We consider this question under the ethical frameworks of ethical egoism, utilitarianism, Kant’s categorical imperative, social contract theory, and cultural relativism.

trolley.png
Above: The trolley problem, often used to compare different ethical frameworks, but unlikely to occur in real life. Tipping is a more quotidian situation in which to apply ethics.

1) Ethical egoism says it is moral to act in your own self-interest. The most moral action is the one that is best for yourself.

Clearly, it is in your financial self-interest not to tip. However, the social stigma and shame create negative utility, which may or may not outweigh the money saved by not tipping. This depends on the individual. Verdict: Maybe OK.

2) Utilitarianism says the moral thing to do is maximize the well-being of the greatest number of people.

Under utilitarianism, you should tip if the money benefits the waiter more than it would benefit you. This is difficult to answer, as it depends on many things, like your relative wealth compared to the waiter’s. Again, subtract some utility for the social stigma and shame if you refuse to tip. Verdict: Maybe OK.

3) Kant’s categorical imperative says that an action is immoral if the goal of the action would be defeated if everyone started doing it. Essentially, it’s immoral to gain a selfish advantage at the expense of everyone else.

If everyone refused to tip, then the prices of food in restaurants would universally go up to compensate, which negates the intended goal of saving money in the first place. Verdict: Not OK.

4) Social contract theory says that morality is the set of rules that a society of free, rational people would agree to obey in order to benefit everyone. This is to prevent tragedy-of-the-commons scenarios, where the system would collapse if everyone behaved selfishly.

There is no evidence that tipping makes a society better off. Indeed, many societies (eg: China, Japan) don’t practice tipping, and their restaurants operate just fine. Verdict: OK.

5) Cultural relativism says that morals are determined by the society that you live in (ie, social norms). There is a strong norm in our culture that tipping is obligatory in restaurants. Verdict: Not OK.

Conclusion

In this blog post, we have considered a bunch of arguments for tipping and examined the question under several ethical frameworks. Stiffing the waiter is a legal way to save some money when eating out. No single argument shows that it’s definitely wrong to do this, and some ethical frameworks consider it acceptable while others don’t. This is often the case in ethics when you’re faced with complicated topics.

However, refusing to tip has several negative effects: the rudeness of violating a strong social norm, embarrassment for yourself and your colleagues, and potential social backlash. Furthermore, it violates some ethical systems. Therefore, one should reconsider whether saving 10-15% at restaurants by not tipping is really worth it.

How to read research papers for fun and profit

One skill that I’ve learned after a year in grad school is how to effectively read research papers. Previously I had found them impenetrable, but now I find them a great source of information about cutting-edge science as it’s being done, before it has made its way into textbooks. Now I read about 4-5 of them every week.

My research area is natural language processing and machine learning, but I read papers in lots of fields, not just in AI and computer science. Papers are my go-to source for a myriad of scientific inquiries, for example: does drinking alcohol cause cancer? Are women more talkative than men? Was winter in Toronto abnormally cold this year? Etc.

Why read scientific papers?

If you try to Google questions like these, you typically end up on Wikipedia or some random article on the internet. Research papers are an underutilized resource that have several advantages over other common sources of information on the internet.

Advantages over articles on the internet: no matter what topic, you will undoubtedly find articles on it on the internet. Some of these articles are excellent, but others are opinionated nonsense. Without being an expert yourself, it can be difficult to decide what information to trust. Peer-reviewed research papers are held to a much higher minimum quality standard, and for every claim they make, they have to clearly state their evidence, assumptions, how they arrived at the conclusion, and their degree of confidence in their result. You can examine the paper for yourself and decide if the assumptions are reasonable and the conclusions follow logically, rather than trust someone else’s word for it. With some digging deeper and some critical thinking, you can avoid a lot of misinformation on the internet.

Advantages over Wikipedia: Wikipedia is a pretty reliable source of truth; in fact, it often cites scientific papers as its sources. However, Wikipedia is written to be concise, so that oftentimes, a 30-page research paper is summarized to 1-2 sentences. If you only read Wikipedia, you will miss a lot of the nuances contained in the original paper, and only develop a cursory understanding compared to going directly to the source.

Finding the right paper to read

If your professor or colleague has assigned you a specific paper to read, then you can skip this section.

A big part of the challenge of reading papers is deciding which ones to read. There are a lot of papers out there, and only a few will be relevant to you. Therefore, deciding what to read is a nontrivial skill in itself.

Research papers are the most useful when you have a specific problem or question in mind. When I first started out reading papers, I approached this the wrong way. One day, I’d suddenly decide “hmm, complexity theory is pretty interesting, let’s go on arXiv and look at some recent complexity theory papers”. Then, I’d open a few, attempt to read them, get confused, and conclude I’m not smart enough to read complexity theory papers. Why is this a bad idea? A research paper exists to answer a very specific question, so it makes no sense to pick up a random paper without the background context. What is the problem? What approaches have been tried in the past, and how have they failed? Without understanding background information like this, it’s impossible to appreciate the contribution of a specific paper.

2.png
Above: Use the forward citation and related article buttons on Google Scholar to explore relevant papers.

It’s helpful to think of each research paper as a node in a massive, interconnected graph. Rather than each paper existing as a standalone item, a paper is deeply connected to the research that came before and after it.

Google Scholar is your best friend for exploring this graph. Begin by entering a few keywords and picking a few promising hits from the first 2-3 pages. Good, this is your starting point. Here are some heuristics for traversing the paper graph:

  • To go forward in time, look at works that cited this paper. A paper being cited usually means one of two things: (1) the future paper uses some technique or result developed in the current paper for some other purpose, or (2) the future paper improves on the techniques in the current paper. Citations of the second type are more useful.
  • To go backward in time, look at the paper’s introduction and related work. This puts the paper in context of previous work. Occasionally, you find a survey paper that doesn’t contribute anything novel of its own, but summarizes a bunch of previous related work; these are really helpful when you’re beginning your research in a topic.
  • Citation count is a good indicator of a paper’s importance and merit. If the paper has under 10 citations, take its claims with a grain of salt (even more so if it’s an arXiv preprint and not a peer-reviewed paper). Over 100 citations means the paper has made a significant contribution; over 1000 citations indicates a landmark paper in the field and is probably worth reading. Citation count is not a perfect metric, especially for very recent work, but it’s a useful heuristic that’s applicable across disciplines.

The first pass: High level overview

Great, you’ve decided on a paper to read. Now how to read it effectively?

Reading a paper is not like reading a novel. When you read a novel, you start at the beginning and read linearly until you reach the end. A paper, however, is read most efficiently by hopping around the sections as appropriate, rather than reading linearly from beginning to end.

The goal of your first reading of a paper is to first get a high level overview of the paper, before diving into the details. As you go through the paper, here are some good questions that you should be asking yourself:

  • What is the problem being solved?
  • What approaches have been tried before, and what are their limitations?
  • What is this paper’s novel contribution?
  • What experiments were done, using what dataset? How successful were the results?
  • Can the method in this paper be applied to my problem?
  • If not, what assumptions are needed for this method to work?

3.png
Above: Treat each paper as a node in a massive graph of research, rather than a standalone item in a vacuum.

When I read a paper, I usually proceed in the following order:

  1. Abstract: a long paragraph that summarizes the entire paper. Read this to decide if the rest of the paper is worth reading or not.
  2. Introduction, diagrams, tables, and conclusion. Often, reading the diagrams and captions gives you a good idea of what’s going on with minimal effort.
  3. If the field is unfamiliar to you, then note down any interesting references in the introduction and related works sections to explore later. If the field is familiar, then just skim these sections.
  4. Read the main body of the paper: model, experiment, and discussion, without getting too bogged down in the details. If a section is confusing, skip it for now and come back to it on a second reading.

That’s it — you’ve finished reading a paper! Now you can either go back and read it again, focusing on the details you skimmed over on the first pass, or move on to a different paper from your backlog.

When reading a paper, you should not expect to understand every aspect of the paper by the time you’re done. You can always refer back to the paper at a later time, as needed. Generally, you don’t need to understand all the details, unless you’re trying to replicate or extend the paper.

Help, I’m stuck!

Sometimes, despite your best efforts, you find that a paper is impenetrable. It’s not necessarily your fault — some papers are hastily written hours before a conference deadline. What do you do now?

Look for a video or blog post explaining the paper. If you’re lucky, someone may have recorded a lecture where the author presents the paper at a conference. Maybe somebody wrote a blog post summarizing the paper (Colah’s blog has great summaries of machine learning research). These are often better at explaining things than the actual paper.

If there’s a lot of background terminology that doesn’t make sense, it may be better to consult other sources like textbooks and course lectures rather than papers. This is especially true if the research is not new (>10 years old). Research papers are not always the best at explaining a concept clearly: by their nature, they document research as it’s being done. Sometimes, the paper paints an incomplete picture of something that’s better understood later. Textbook writers can look back on research after it’s already done, and thereby benefit from hindsight knowledge that didn’t exist when the paper was written.

Basic statistics is useful in many experimental fields — concepts like linear / logistic regression, p-values, hypothesis testing, and common statistical distributions. Any paper that deals with experimental data will use at least some statistics, so it’s worthwhile to be comfortable with basic stats.


That’s it for my advice. The densely packed two-column pages of text may appear daunting to the uninitiated reader, but they can be conquered with a bit of practice. Whether it’s for work or for fun, you definitely don’t need a PhD to read papers.

Publishing Negative Results in Machine Learning is like Proving Dragons don’t Exist

I’ve been reading a lot of machine learning papers lately, and one thing I’ve noticed is that the vast majority of papers report positive results — “we used method X on problem Y, and beat the state-of-the-art results”. Very rarely do you see a paper that reports that something doesn’t work.

The result is publication bias — if we only publish the results of experiments that succeed, even statistically significant results could be due to random chance, rather than anything actually significant happening. Many areas of science are facing a replication crisis, where published research cannot be replicated.
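
As a toy illustration of how this happens, here is a small simulation (the numbers are entirely hypothetical): if every experiment tests a true null effect and only the “significant” ones get written up, roughly 5% of them will still look like real findings purely by chance.

```python
# Toy simulation of publication bias: no real effects exist anywhere, but only
# "significant" results (p < 0.05) are assumed to be published.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
published = 0
for _ in range(100):
    # Two groups drawn from the SAME distribution: the true effect is zero.
    a = rng.normal(0, 1, size=30)
    b = rng.normal(0, 1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:          # a "positive result" that gets written up
        published += 1

# Roughly 5 of the 100 null experiments come out "significant" by chance alone,
# and those are the only ones a reader of the literature would ever see.
print(published)
```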

There is some community discussion of encouraging more negative paper submissions, but as of now, negative results are rarely publishable. If you attempt an experiment but don’t get the results you expected, your best hope is to try a bunch of variations of the experiment until you get some positive result (perhaps on a special case of the problem), after which you pretend the failed experiments never happened. With few exceptions, any positive result is better than a negative result, like “we tried method X on problem Y, and it didn’t work”.

Why publication bias is not so bad

I just described a cynical view of academia, but actually, there’s a good reason why the community prefers positive results. Negative results are simply not very useful, and contribute very little to human knowledge.

Now why is that? When a new paper beats the state-of-the-art results on a popular benchmark, that’s definite proof that the method works. The converse is not true. If your model fails to produce good results, it could be due to a number of reasons:

  • Your dataset is too small / too noisy
  • You’re using the wrong batch size / activation function / regularization
  • You’re using the wrong loss function / wrong optimizer
  • Your model is overfitting
  • You have a bug in your code

lattice2.png
Above: Only when everything is correct will you get positive results; many things can cause a model to fail. (Source)

So if you try method X on problem Y and it doesn’t work, you gain very little information. In particular, you haven’t proved that method X cannot work. Sure, you found that your specific setup didn’t work, but have you tried making modification Z? Negative results in machine learning are rare because you can’t possibly anticipate all possible variations of your method and convince people that all of them won’t work.

Searching for dragons

Suppose we’re scientists attending the International Conference of Flying Creatures (ICFC). Somebody mentioned it would be nice if we had dragons. Dragons are useful. You could do all sorts of cool stuff with a dragon, like ride it into battle.

1.jpg

“But wait!” you exclaim: “Dragons don’t exist!”

I glance at you questioningly: “How come? We haven’t found one yet, but we’ll probably find one soon.”

Your intuition tells you dragons shouldn’t exist, but you can’t articulate a convincing argument why. So you go home, and you and your team of grad students labor for a few years and publish a series of papers:

  • “We looked for dragons in China and we didn’t find any”
  • “We looked for dragons in Europe and we didn’t find any”
  • “We looked for dragons in North America and we didn’t find any”

Eventually, the community is satisfied that dragons probably don’t exist, for if they did, someone would have found one by now. But a few scientists still harbor the possibility that there may be dragons lying around in a remote jungle somewhere. We just don’t know for sure.

This remains the state of things for a few years until a colleague publishes a breakthrough result:

  • “Here’s a calculation that shows that any dragon with a wing span longer than 5 meters will collapse under its own weight”

You read the paper, and indeed, the logic is impeccable. This settles the matter once and for all: dragons don’t exist (or at least the large, flying sort of dragons).

When negative results are actually publishable

The research community dislikes negative results because they don’t prove a whole lot — you can have a lot of negative results and still not be sure that the task is impossible. In order for a negative result to be valuable, it needs to present a convincing argument why the task is impossible, and not just a list of experiments that you tried that failed.

This is difficult, but it can be done. Let me give an example from computational linguistics. Recurrent neural networks (RNNs) can, in theory, compute any function defined over a sequence. In practice, however, they had difficulty remembering long-term dependencies. Attempts to train RNNs using gradient descent ran into numerical difficulties known as the vanishing / exploding gradient problem.

Then, Bengio et al. (1994) formulated a mathematical model of an RNN as an iteratively applied function. Using ideas from dynamical systems theory, they showed that as the input sequence gets longer and longer, the result is more and more sensitive to noise. The details are technical, but the gist of it is that under some reasonable assumptions, training RNNs using gradient descent is impossible. This is a rare example of a negative result in machine learning — it’s an excellent paper and I’d recommend reading it.
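
The gist of the vanishing/exploding gradient problem can be sketched in the standard way (this is the usual textbook presentation, not necessarily the paper’s exact formulation): the gradient flowing back through an unrolled RNN is a product of one Jacobian per time step, and that product shrinks or blows up exponentially with the length of the sequence.

```latex
% Standard sketch (notation mine): for a recurrence h_t = f(h_{t-1}, x_t),
% the gradient reaching an early step k from the loss at step T is a product
% of T - k Jacobians.
\[
\frac{\partial L_T}{\partial h_k}
  = \frac{\partial L_T}{\partial h_T}
    \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}
\]
% If every Jacobian norm is bounded by some gamma, the product is on the order
% of gamma^(T - k): it vanishes exponentially when gamma < 1 and can explode
% when gamma > 1, so the learning signal from the distant past is unreliable.
```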

3.png
Above: A Long Short Term Memory (LSTM) network handles long term dependencies by adding a memory cell. (Source)

Soon after the vanishing gradient problem was understood, researchers invented the LSTM (Hochreiter and Schmidhuber, 1997). Since training RNNs with gradient descent was hopeless, they added a ‘latching’ mechanism that allows state to persist through many iterations, thus avoiding the vanishing gradient problem. Unlike plain RNNs, LSTMs can handle long term dependencies and can be trained with gradient descent; they are among the most ubiquitous deep learning architectures in NLP today.


After reading the breakthrough dragon paper, you pace around your office, thinking. Large, flying dragons can’t exist after all, as they would collapse under their own weight — but what about smaller, non-flying dragons? Maybe we’ve been looking for the wrong type of dragons all along? Armed with new knowledge, you embark on a new search…

4.jpg
Above: Komodo Dragon, Indonesia

…and sure enough, you find one 🙂

Simple models in Kaggle competitions

This week I participated in the Porto Seguro Kaggle competition. Basically, you’re asked to predict a binary variable — whether or not an insurance claim will be filed — based on a bunch of numerical and categorical variables.

With over 5000 teams entering the competition, it was the largest Kaggle competition ever. I guess this is because it’s a fairly well-understood problem (binary classification) with a reasonably sized dataset, making it accessible to beginning data scientists.

Kaggle is a machine learning competition platform filled with thousands of smart data scientists, machine learning experts, and statistics PhDs, and I am not one of them. Still, I was curious to see how my relatively simple tools would fare against the sophisticated techniques on the leaderboard.


The first thing I tried was logistic regression. All you had to do was load the data into memory, invoke the glm() function in R, and output the predictions. Initially my logistic regression wasn’t working properly and I got a negative score. It took a day or so to figure out how to do logistic regression properly, which got me a score of 0.259 on the public leaderboard.

Next, I tried gradient boosted decision trees, which I had learned about in a stats class but never actually used before. In R, this is simple — I just needed to change the glm() call to gbm() and fit the model again. This improved my score to 0.265. It was near the end of the competition so I stopped here.
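
For readers who don’t use R, here is a rough sketch of the same baseline workflow in Python with scikit-learn (the original used R’s glm() and gbm()); the file names and column names below are placeholders rather than the competition’s actual schema.

```python
# Rough Python equivalent of the logistic regression -> gradient boosting baseline;
# file names and column names are placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
X_train, y_train = train.drop(columns=["id", "target"]), train["target"]
X_test = test.drop(columns=["id"])

# Baseline 1: plain logistic regression.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
logreg_preds = logreg.predict_proba(X_test)[:, 1]

# Baseline 2: gradient boosted trees, swapped in by changing one model class.
gbm = GradientBoostingClassifier().fit(X_train, y_train)
gbm_preds = gbm.predict_proba(X_test)[:, 1]

pd.DataFrame({"id": test["id"], "target": gbm_preds}).to_csv("submission.csv", index=False)
```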

At this point, the top submission had a score of 0.291, and 0.288 was enough to get a gold medal. Yet despite being within 10% of the top submission in overall accuracy, I was still in the bottom half of the leaderboard, ranking in the 30th percentile.

The public leaderboard looked like this:

Rplot.png
Above: Public leaderboard of the Porto Seguro Kaggle competition two days before the deadline. Line in green is my submission, scoring 0.265.

This graph illustrates the nature of this competition. At first, progress is easy, and pretty much anyone who submitted anything other than “predict all zeros” got over 0.200. From there, you make steady, incremental progress until about 0.280 or so, but after that, further improvement is limited.

The top of the leaderboard is very crowded, with over 1000 teams having a score of 0.287. Many teams used ensembles of XGBoost and LightGBM models with elaborate feature engineering. In the final battle for the private leaderboard, score differences of less than 0.001 translated to hundreds of places on the leaderboard and spelled the difference between victory and defeat.

Above: To run 90% as fast as Usain Bolt, you need to run 100 meters in 10.5 seconds. To get 90% of the winning score in Kaggle, you just need to call glm().

This pattern is common in Kaggle and machine learning — often, a simple model can do quite well, at least within the same order of magnitude as a highly optimized solution. It’s quite remarkable that you can get a decent solution with a day or two of work, and then 5000 smart people working for 2 months can only improve on it by 10%. Perhaps this is obvious to someone who has been doing machine learning long enough, but we should step back and consider how rare this is. The same does not apply to most activities: you cannot play piano for two days and become 90% as good as a concert pianist. Likewise, you cannot train for two days and run 90% as fast as Usain Bolt.

Simple models won’t win you Kaggle competitions, but we shouldn’t understate their effectiveness. Not only are they quick to develop, but they are also easier to interpret, and can be trained in a few seconds rather than hours. It’s comforting to see how far you can get with simple solutions — the gap between the best and the rest isn’t so big after all.

Read further discussion of this post on the Kaggle forums!