Predictions for 2030

Now that it’s Jan 1, 2020, I’m going to make some predictions about what we will see in the next decade. By the year 2030:

  • Deep learning will be a standard tool and integrated into workflows of many professions, eg: code completion for programmers, note taking during meetings. Speech recognition will surpass human accuracy. Machine translation will still be inferior to human professionals.

  • Open-domain conversational dialogue (aka the Turing Test) will be on par with an average human, using a combination of deep learning and some new technique not available today. It will be regarded as more of a “trick” than strong AI; the bar for true AGI will be shifted higher.

  • Driverless cars will be in commercial use in a few limited scenarios. Most cars will have some autonomous features, but full autonomy will still not be widely deployed.

  • The S&P 500 index (a broad measure of the US stock market, currently at 3230) will roughly double, to between 6000 and 7000. Bitcoin will still exist but its price will fall under 1000 USD (currently ~7000 USD).

  • Real estate prices in Toronto will either fall sharply or flatten out; the overall increase over the 2020-2030 period will not exceed inflation.

  • All western nations will have implemented some kind of carbon tax as political pressure increases from young people; no serious politician will suggest removing carbon tax.

  • By the age of 35, about half of my Waterloo cohort will be married, but the majority will not have any kids.

  • China will overtake the USA as the world’s biggest economy, but its growth will slow down, and its GDP per capita (PPP) will still be well below that of the USA.
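
To put the S&P 500 prediction above in perspective, here is a minimal sketch of the yearly growth rate it implies. It assumes a constant compound rate over exactly 10 years and looks only at the index level (dividends are ignored); the function name is my own.

```python
def annualized_growth(start: float, end: float, years: float) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


START_LEVEL = 3230  # S&P 500 level quoted in the prediction
YEARS = 10          # Jan 2020 to Jan 2030

# Implied growth for the low and high ends of the predicted range.
for target in (6000, 7000):
    rate = annualized_growth(START_LEVEL, target, YEARS)
    print(f"{START_LEVEL} -> {target}: {rate:.1%} per year")
```

Under these assumptions, the predicted range works out to somewhere between roughly 6% and 8% growth per year.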

Five books to understand controversial political issues

Climate change, housing crisis, China, and recycling. Here are four highly complex and controversial topics that come up again and again in the news, in election debates, and in discussions with friends and family.

Despite all the media coverage, it’s difficult to get a balanced view. Individual news articles tend to present a simplistic, one-sided stance on a complex problem. Furthermore, these issues are politically polarizing so it’s easy to find yourself in a filter bubble and only see one side of the story.

This year, I resolved to read books to get a well-rounded understanding of some of the world’s most pressing and controversial issues. A book can go into much greater depth than an article, and should ideally be well-researched and present both sides of an argument.

Issue 1: Climate Change

The Climate Casino by William Nordhaus

This book, by a Nobel Prize winning economist, talks about climate change from an economic perspective. Climate change typically evokes extreme responses: conservatives deny it altogether, while environmentalists warn of doomsday scenarios. The reality is somewhere in the middle: if we don’t do anything about climate change, average surface temperature is projected to rise by 3.5C by the end of the century, at a cost of 1-4% of world GDP. It’s definitely a serious concern, but it probably won’t cause the collapse of civilization either.

There have been thousands of papers at the IPCC studying various aspects of climate change, but there are still large uncertainties in all the projections. There is a small chance that we might cross a “tipping point” like the melting of polar ice caps, where the system changes in a catastrophic and irreversible way after a certain temperature point. It’s poorly understood exactly what temperature triggers a tipping point, thus adding even more uncertainty to models. Every time we emit CO2, we are gambling in the climate casino.

The three approaches to dealing with climate change are mitigation (emit less CO2), adapting to the effects of climate change, and geoengineering. We can reduce emissions quite a lot with only modest cost, but costs go up exponentially as you try to cut more and more emissions. It’s crucial that all countries participate: climate targets become impossible if only half of countries participate.

Economists agree that carbon tax is a simple and effective way of reducing carbon emissions across the board, and is more effective than direct government regulation. A carbon tax sends a signal to the market and significantly discourages high-carbon technologies like coal, while only increasing gas and electricity costs by a modest amount.

Currently, climate change is a partisan issue: opinions on climate change are highly correlated with political views. The scientific evidence is overwhelming, though, and public opinion will gradually change.

Issue 2: Rise of China

China is an emerging superpower, projected to overtake the US as the world’s biggest economy in the next decade. There’s been a lot of tension between the two countries recently, looking at the trade wars and Hong Kong protests. I picked up two books on this topic: one with a Chinese perspective, and another with a western perspective.

China Emerging by Wu Xiaobo

I got this book in Shenzhen; it was one of the few English books about China in the bookstore. It describes the history of China from 1978 to today. In 1978, China was very poor, having experienced famines and the Cultural Revolution under Mao Zedong’s rule. The early 1980s were a turning point for China, when Deng Xiaoping started to open up the country to foreign investment and capitalism. He started by setting up “Special Economic Zones” in coastal cities like Shenzhen and Xiamen where he experimented with capitalism, with great success.

The 1980s and 1990s saw a gradual shift from communism to capitalism, where the state relinquished control to entrepreneurs and investors. By the 2000s, China was a manufacturing giant and everything was “made in China”. During the mid 2000s, there was a boom in massive construction projects like high speed rail and hundreds of skyscrapers. Development at such speed and scale is historically unprecedented — the book describes it as “a big ship sailing towards the future”.

This book helped me understand the Chinese mindset, although it is quite one-sided and at times reads like state propaganda. Of course, being published in China, it leaves out sensitive topics like the Tiananmen massacre and internet censorship.

China’s Economy: What Everyone Needs to Know by Arthur R. Kroeber

This book, written by a western author, describes all aspects of China’s economy and presents a quite different view of the situation. Since the economic reforms of 1978, life has gotten tremendously better, with the share of the population living in extreme poverty falling from 90% in 1978 to less than 1% now. However, the growth has been uneven, and there is a high level of inequality.

One example of inequality is the hukou system. Industrialization brought many people into the cities, and now about 2/3 of the population is urban, but there is a lot of inequality between migrant workers and those with urban hukou. Only people with hukou have access to social services and healthcare. However, you can’t just give everyone hukou because the top-tier cities don’t have the infrastructure to support so many migrants.

Another example of inequality is in real estate. Around 2003, the government sold urban property at very low rates, essentially a large transfer of wealth. At the same time, local governments forcefully bought rural land at below market rates, exacerbating the urban-rural inequality. Chinese people like to invest in real estate because they believe it will always go up (as it had for the last 20 years).

In the early stages of economic reform, the priority was to mobilize the country’s enormous workforce, so some inefficiency and corruption were tolerated. China focused on labor-intensive light industry, like clothing and consumer appliances, rather than capital-intensive heavy industry. Recently, as wages have risen, cheap labor has become less abundant, so further growth requires greater economic efficiency. However, China struggles to produce high-quality, more advanced technology like cars, aircraft, and electronics (lots of phones are made in China, but only the final assembly stage), and mostly produces cheap items of medium quality.

China will be the world’s largest economy in the next decade, although this doesn’t mean much after accounting for population size. Despite the size of its economy, it has limited political influence and no strong allies, even in East Asia. It also struggles to become a technological leader: most of its tech companies only have a domestic market, and don’t gain traction outside the country. It’s clear that China has vastly different values from Western countries: elections and democracy are not valued, and the government is considered good as long as it keeps the economy running. These ideologies will need to learn to coexist in the future.

Issue 3: Canadian Housing Bubble

When the Bubble Burst by Hilliard MacBeth

Real estate prices and rents have risen a lot in the last 10 years in many Canadian cities (most notably Toronto and Vancouver), so that housing is now unaffordable for many people. Hilliard MacBeth has the controversial opinion that Canada is in a housing bubble that is about to burst, with corrections on the order of 40-50%. Many people blame immigration and foreign investment, but he argues that the rise in real estate prices is due to low interest rates and the willingness of banks to make large mortgage loans.

Many people assume that house prices will always go up. This has been the case for the last 20 years, but there are clear counterexamples, like the USA in 2008 and Japan in the 1990s. The Case-Shiller index shows that over a 100-year period, real estate values in the USA have approximately matched inflation. We’re likely overfitting to the most recent data: we develop heuristics from the last few decades of growth and extrapolate them indefinitely into the future.

A few years ago, I asked a question on stackexchange about why we should expect stocks to go up in the long term: there is strong reason to believe the annual growth of 5-7% will continue indefinitely. For real estate, there’s no good reason why it should increase over the long term, so it’s more like speculation than investment. Instead of an investment, it’s better to think of real estate as paying for a lot of future rent upfront, and it’s risky for young people to take on mortgage debt, especially in this market.

The author is extremely pessimistic about the future of Canadian real estate, which I don’t think is justified. Nevertheless, it’s good to question some common assumptions about real estate investing. In particular, the tradeoff between renting and ownership depends on a lot of factors, and we shouldn’t jump to the conclusion that ownership is better.

Issue 4: Recycling

Junkyard Planet by Adam Minter

The recycling industry is in chaos as China announced this year that it is no longer importing recyclable waste from western countries. Some media investigations have found that recycling is just a show, and much of it ends up in landfills or being burnt. Wait, what is going on exactly?

This book is written by a journalist who is the son of a scrapyard owner. In western countries, we normally think of recycling as an environmentalist act, but in reality it’s more accurate to think of it as harvesting valuable materials out of what would otherwise be trash. Metals like copper, steel, and aluminum are harvested from all kinds of things like Christmas lights, cables, cars, etc. Copper is worth a few thousand dollars a ton, and it’s a lot cheaper and takes less energy to harvest metal from used items than to mine it, which can take 100 tons of ore to produce one ton of metal.

There’s a large international trade in scrap metal. America is the biggest producer of scrap, and it gets sent to China because labor there is cheap and because China has a lot of demand for metal to build its developing infrastructure. The trade then goes into secondary markets where all kinds of scrap are graded by quality and sold to the highest bidder; there is little concern for the environment. The free market with economic incentives is much more efficient at recycling than citizens with good intentions.

Metals are highly recyclable, whereas plastic is almost impossible to recycle profitably because its value per ton is so low compared to metals. For some time, plastic recycling was done in Wen’an, but it was only possible because there were no environmental regulations and the workers didn’t wear protective equipment when handling dangerous chemicals. The government shut it down after people started getting sick. It’s more costly to recycle while complying with regulations, which is why very little of it is done in the US; the scrap is simply exported to countries with weak oversight.

Recycling usually takes place where labor is cheap. Often you have the choice between building a machine to do something and hiring humans to do it. It all comes down to price: in one case, the task of sorting different metals is done by hundreds of young women in China and by an expensive machine in America. Labor is getting more expensive in China with its rising middle class, so this is rapidly changing.

In the developed world, we have a misguided idea of how recycling works, and often we think of recycling as a “free pass” that allows us to consume as much as we want, as long as we recycle. In reality, recycling is imperfect and the material is usually turned into a lower-grade product; in the “reduce, reuse, recycle” mantra, reducing consumption has by far the biggest impact, reusing is good, and recycling should be considered a distant third option.

Non-technical challenges of medical NLP research

Machine learning has recently made a lot of headlines in healthcare applications, like identifying tumors from images, or technology for personalized treatment. In this post, I describe my experiences as a healthcare ML researcher: the difficulties in doing research in this field, as well as reasons for optimism.

My research group focuses on applications of NLP to healthcare. For a year or two, I was involved in a number of projects in this area (specifically, detecting dementia through speech). From my own projects and from talking to others in my research group, I noticed that a few difficulties kept coming up in healthcare NLP research that rarely occur in other branches of ML. These are non-technical challenges that take up time and impede progress, and are generally considered not very interesting to solve. I’ll give some examples of what I mean.

Collecting datasets is hard. Any time you want to do anything involving patient data, you have to undergo a lengthy ethics approval process. Even with something as innocent as an anonymous online questionnaire, there is a mandatory review by an ethics board before the experiment is allowed to proceed. As a result, most datasets in healthcare ML are small: a few dozen patient samples is common, and you’re lucky to have more than a hundred samples to work with. This is tiny compared to other areas of ML where you can easily find thousands of samples.

In my master’s research project, where I studied dementia detection from speech, the largest available corpus had about 300 patients, and other corpora had fewer than 100. This constrained the types of experiments that were possible. Prior work in this area used a lot of feature engineering approaches, because it was commonly believed that you needed at least a few thousand examples to do deep learning. With less data than that, deep learning would just learn to overfit.

Even after the data has been collected, it is difficult to share with others. This is again due to the conservative ethics processes required to share data. Data transfer agreements need to be reviewed and signed, and in some cases, data must remain physically on servers in a particular hospital. Researchers rarely open-source their code along with the paper, since there’s no point in doing so without giving access to the data; this makes it hard to reproduce any experimental results.

Medical data is messy. Data access issues aside, healthcare NLP has some of the messiest datasets in machine learning. Many datasets in ML are carefully constructed and annotated for the purpose of research, but this is not the case for medical data. Instead, the data comes from real patients and hospitals, and is full of shorthand abbreviations of medical terms written by doctors, which mean different things depending on context. Unsurprisingly, many NLP techniques fail to work. Missing values and otherwise unreliable data are common, so a lot of not-so-glamorous data preprocessing is often needed.


I’ve so far painted a bleak picture of medical NLP, but I don’t want to give off such a negative image of my field. In the second part of this post, I give some counter-arguments to the above points as well as some of the positive aspects of research.

On difficulties in data access. There are good reasons for caution — patient data is sensitive and real people can be harmed if the data falls into the wrong hands. Even after removing personally identifiable information, there’s still a risk of a malicious actor deanonymizing the data and extracting information that’s not intended to be made public.

The situation is improving though. The community recognizes the need to share clinical data, to strike a balance between protecting patient privacy and allowing research. There have been efforts like the relatively open MIMIC critical care database to promote more collaborative research.

On small / messy datasets. With every challenge, there comes an opportunity. In fact, my own master’s research was driven by lack of data. I was trying to extend dementia detection to Chinese, but there wasn’t much data available. So I proposed a way to transfer knowledge from the much larger English dataset to Chinese, and got a conference paper and a master’s thesis from it. If it weren’t for the lack of data, I could have just taken the existing algorithm and applied it to Chinese, which wouldn’t have been as interesting.

Also, deep learning in NLP has recently gotten a lot better at learning from small datasets. Other research groups have had some success on the same dementia detection task using deep learning. With new papers every week on few-shot learning, one-shot learning, transfer learning, etc, small datasets may not be too much of a limitation.

The same applies to messy data, missing values, label leakage, etc. I’ll refer to this survey paper for the details, but the take-away is that these shouldn’t be thought of as barriers, but as opportunities to make a research contribution.

In summary, as a healthcare NLP researcher, you have to deal with difficulties that other machine learning researchers don’t have. However, you also have the unique opportunity to use your abilities to help sick and vulnerable people. For many people, this is an important consideration — if this is something you care deeply about, then maybe medical NLP research is right for you.

Thanks to Elaine Y. and Chloe P. for their comments on drafts of this post.

NAACL 2019, my first conference talk, and general impressions

Last week, I attended my first NLP conference, NAACL, which was held in Minneapolis. My paper was selected for a short talk of 12 minutes in length, plus 3 minutes for questions. I presented my research on dementia detection in Mandarin Chinese, which I did during my master’s.

Here’s a video of my talk:

Visiting Minneapolis

As a grad student, going to conferences is a good way to travel for free. Some of my friends balked at the idea of going to Minneapolis rather than somewhere more “interesting”. However, I had never been there before, and in the summer, Minneapolis was quite nice.

Minneapolis is very flat and good for biking — you can rent a bike for $2 per 30 minutes. I took the light rail to Minnehaha falls (above) and biked along the Mississippi river to the city center. The downside is that compared to Toronto, the food choices are quite limited. The majority of restaurants serve American food (burgers, sandwiches, pasta, etc).

Meeting people

It’s often said that most of the value of a conference happens in the hallways, not in the scheduled talks (which you can often find on YouTube for free). For me, this was a good opportunity to finally meet some of my previous collaborators in person. Previously, we had only communicated via Skype and email. I also ran into people whose names I recognized from reading their papers, but whom I had never seen in person.

Despite all the advances in video conferencing technology, nothing beats face-to-face interaction over lunch. There’s a reason why businesses spend so much money to send employees abroad to conduct their meetings.

Talks and posters

The accepted papers were split roughly 50-50 into talks and poster presentations. I preferred the poster format, because you get to have a 1-on-1 discussion with the author about their work, and ask clarifying questions.

Talks were a mixed bag — some were great, but for many it was difficult to make sense of anything. The most common problem was that speakers tended to dive into complex technical details, and lost sense of the “big picture”. The better talks spent a good chunk of time covering the background and motivation, with lots of examples, before describing their own contribution.

It’s difficult to make a coherent talk in only 12 minutes. A research paper is inherently a very narrow and focused contribution, while the audience come from all areas of NLP, and have probably never seen your problem before. The organizers tried to group talks into related topics like “Speech” or “Multilingual NLP”, but even then, the subfields of NLP are so diverse that two random papers had very little in common.

Research trends in NLP

Academia has a notorious reputation for inventing impractically complex models to squeeze out a 0.2% improvement on a benchmark. This may be true in some areas of ML, but it certainly wasn’t the case here. There was a lot of variety in the problems people were solving. Many papers worked with new datasets, and even those using existing datasets often proposed new tasks that weren’t considered before.

A lot of papers used similar model architectures, like some sort of Bi-LSTM with attention, perhaps with a CRF on top. None of it is directly comparable to one another because everybody is solving a different problem. I guess it shows the flexibility of Bi-LSTMs to be so widely applicable. For me, the papers that did something different (like applying quantum physics to NLP) really stood out.

Interestingly, many papers did experiments with BERT, which was presented at this conference! Last October, the BERT authors bypassed the usual conventions and announced their results without peer review, so the NLP community has known about it for a long time, but only now is it being officially presented at a conference.

Why Time Management in Grad School is Difficult

Graduate students are often stressed and overworked; a recent Nature report states that grad students are six times more likely to suffer from depression than the general population. Although there are many factors contributing to this, I suspect that a lot of it has to do with poor time management.

In this post, I will describe why time management in grad school is particularly difficult, and some strategies that I’ve found helpful as a grad student.


As a grad student, I’ve found time management to be far more difficult than either during my undergraduate years or while working in industry. Here are a few reasons why:

  1. Loose supervision: as a grad student, you have a lot of freedom over how you spend your time. There are no set hours, and you can go a week or more without talking to your adviser. This can be both a blessing and a curse: some find the freedom liberating while others struggle to be productive. In contrast, in an industry job, you’re expected to report at a daily standup and you get assigned tickets each sprint, so others essentially manage your time for you.
  2. Few deadlines: grad school is different from undergrad in that you have a handful of “big” deadlines a year (eg: conference submission dates, major project due dates), whereas in undergrad, the deadlines (eg: assignments, midterms) are smaller and more frequent.
  3. Sparse rewards: most of your experiments will fail. That’s the nature of research — if you know it’s going to work, then it’s no longer research. It’s hard to not get discouraged when you struggle for weeks without getting a positive result, and start procrastinating on a multitude of distractions.

Basically, poor time management leads to procrastination, stress, burnout, and generally having a bad time in grad school 😦


Some time management strategies that I’ve found to be useful:

  1. Track your time. When I first started doing this, I was surprised at how much time I spent doing random, half-productive stuff not really related to my goals. It’s up to you how to do this — I keep a bunch of Excel spreadsheets, but some people use software like Asana.
  2. Know your plan. My adviser suggested a hierarchical format with a long-term research agenda, medium-term goals (eg: submit a paper to ICML), and short-term tasks (eg: run X baseline on dataset Y). Then you know if you’re progressing towards your goals or merely doing stuff tangential to them.
  3. Focus on the process, not the reward. It’s tempting to celebrate when your paper gets accepted, but the flip side is you’re going to be depressed if it gets rejected. Your research will have many failures: paper rejections and experiments that somehow don’t work. Instead, celebrate when you finish the first draft of your paper; reward yourself when you finish implementing an algorithm, even if it fails to beat the baseline.

Here, I plotted my productive time allocation in the last 6 months:

[Figure: time_allocation.png]

Most interestingly, only a quarter of my time is spent coding or running experiments, which seems to be much less than most grad students. I read a lot of papers to try to avoid reinventing things that others have already done.

On average, I spend about 6 hours a day doing productive work (including weekends) — a quite reasonable workload of about 40-45 hours a week. Contrary to some perceptions, grad students don’t have to be stressed and overworked to be successful; allowing time for leisure and social activities is crucial in the long run.
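
For anyone who wants to try the time-tracking strategy, here is a minimal sketch of how a log could be turned into the kind of summary described above. The categories and hours are made up for illustration (they are not my actual data), and the `summarize` helper is a hypothetical name; the numbers are chosen so that they add up to 6 hours a day over 7 days, with a quarter of the time spent coding.

```python
from collections import defaultdict


def summarize(log):
    """Return (total_hours, {category: share_of_total}) for (category, hours) entries."""
    totals = defaultdict(float)
    for category, hours in log:
        totals[category] += hours
    total = sum(totals.values())
    shares = {category: hours / total for category, hours in totals.items()}
    return total, shares


# Made-up entries for one week, purely for illustration.
week_log = [
    ("reading papers", 14.0),
    ("coding / experiments", 10.5),
    ("writing", 9.5),
    ("meetings", 8.0),
]

total, shares = summarize(week_log)
print(f"total: {total:.1f} h/week, {total / 7:.1f} h/day")
for category, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{category:22s} {share:.0%}")
```

The same idea works in a spreadsheet; the point is only that a weekly total and per-category shares are enough to see where the time goes.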

Books I’ve read in 2018

I read 28 books in 2018 (about one every 2 weeks). Recently, I’ve been getting into the habit of taking notes in the margins and writing down a summary of what I learned after finishing them.

This blog post is a more-or-less unedited dump of some of my notes on some of the books I read last year. They were originally notes for myself and weren’t meant to be published, so a lot of ideas aren’t very well fleshed out. Without further ado, let’s begin.


Understanding Thermodynamics by H. C. Van Ness

Pretty short, 100 page book that gives an intuitive introduction to various topics in thermodynamics and statistical mechanics. It’s meant to be a supplementary text, not a main text, so some really important things were omitted, which was confusing to me, since I’ve never studied this topic before. Some ideas I learned:

  • Energy can’t really be defined since it’s not a physical property. Can only write it as a sum of a bunch of things, and note that within a closed system, it always stays the same (first law of thermodynamics).
  • A process is reversible if you can do it in reverse to get back the initial state. No physical process is perfectly reversible, but the closer it is to reversible, the more efficient it is.
  • Heat engines convert a heat differential into work. Two types are the Otto cycle (used in cars) and the Carnot cycle. Surprisingly, heat engines cannot be perfectly efficient, even under ideal conditions; the Carnot limit puts an upper bound. A heat engine that perfectly converts heat into work violates the second law of thermodynamics.
  • Second law of thermodynamics says that entropy never decreases: it increases for irreversible processes and remains the same for reversible processes. This is useful for determining when a “box of tricks” (taking in compressed air, outputting cold air at one end and hot air at the other end) is possible. The book doesn’t give much intuition about why the definition of entropy makes sense though; it literally tries random combinations of variables until one “works” (gives a constant value experimentally).
  • Second law of thermodynamics is merely an empirical observation, and can’t be proved. In fact, it can be challenged at the molecular level (eg: Maxwell’s demon) which isn’t easily refutable.
  • Statistical mechanics gives an alternate definition of entropy in terms of molecular states, and from it, you can derive various macroscopic properties like temperature and pressure. However, it only works well for ideal gases, and doesn’t quite explain or replace thermodynamics.
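
The Carnot limit mentioned in the notes above has a compact standard form: the maximum efficiency is 1 - T_cold / T_hot, with both temperatures in kelvin. This is the usual textbook formula rather than something quoted from the book, and the function name below is my own; a minimal sketch:

```python
def carnot_efficiency(t_hot_kelvin: float, t_cold_kelvin: float) -> float:
    """Upper bound on the fraction of heat an ideal heat engine can turn into work."""
    if not 0 < t_cold_kelvin <= t_hot_kelvin:
        raise ValueError("need 0 < t_cold_kelvin <= t_hot_kelvin (absolute temperatures)")
    return 1.0 - t_cold_kelvin / t_hot_kelvin


# An engine running between a 600 K source and a 300 K sink
# can convert at most half of the heat into work.
print(carnot_efficiency(600.0, 300.0))  # 0.5
```

The bound only reaches 100% if the cold side is at absolute zero, which is why even an ideal engine cannot be perfectly efficient.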

Indian Horse by Richard Wagamese

This book is about the life of an Ojibway Indian, living in northern Ontario and growing up in the 60s. When he was young, he was sent to a residential school where he was badly treated and not allowed to speak his own language. He found hockey and got really good at it, but faced problems with racism so he couldn’t really make it in the big leagues with white people. Later, he faced more racism in his job as a logger. Eventually, he developed an alcohol addiction due to this disillusionment, and finally came to terms with his life.

Very interesting perspective on the indigenous people of Canada, a group that most of us don’t think about often. Despite numerous government subsidies, they’re still some of the poorest people in the country, with low education levels. Some people think it’s laziness, but they’ve had a history of mistreatment in residential schools and were subjected to racism until very recently, so it’s difficult for them to integrate into society. Their reserves are often a long distance from major population centers, which means very few opportunities. Furthermore, their culture doesn’t really value education. Overall, great read about a group currently marginalized in Canadian society.

The Power of Habit by Charles Duhigg

Book that discusses various aspects of how habits work. On a high level, habits have three components: cue, routine, and reward. The cue is a set of conditions, such that you automatically perform a routine in order to get a reward. After a while, you will crave the reward when given the cue, and perform the routine automatically (even if the reward is intermittent).

To change a habit, you can’t just force yourself not to do it, because you will constantly crave the reward. Instead, replace the routine with something else that gives a similar reward but is less harmful. Forcing yourself to do something against habit depletes your willpower, so it’s much better to change the habit, so you do it automatically and retain your willpower.

Large changes are often precipitated by a small “keystone” habit change that catalyzes a series of systemic changes. For example, Alcoa, an aluminum company, improved its overall efficiency when it decided to focus on safety. Sometimes a disaster is needed to bring about a systemic change in an organization, like a fire in King’s Cross station or operating on the wrong side of a patient in a hospital. Peer pressure is important: for example, it’s a key component of Alcoholics Anonymous and of the success of the black civil rights movement.

Overall, pretty interesting read, although I think there’s too much dramatic storytelling and anecdotes; I would’ve preferred more scientific discussion and a bit less storytelling.

Why We Sleep by Matthew Walker

Why We Sleep: Unlocking the Power of Sleep and Dreams

This book gives a comprehensive scientific overview of sleep. Although there are still many unanswered questions, there’s been a lot of research lately and this book sums it up.

Sleep is a necessary function of life. Every living organism requires it, although in different amounts, and total lack of sleep very quickly leads to death. However, it’s still unclear exactly why sleep is so important.

There are two types of sleep: REM (rapid eye movement) and NREM sleep. REM sleep is a much lighter form of sleep where you’re closer to the awake state, and is also when you dream; NREM is a much deeper sleep. You can distinguish the type of sleep easily by measuring brain waves.

Sleep deprivation is really bad. You don’t even need total deprivation: sleeping only six hours a night for a few nights is as bad as pulling an all-nighter. When you’re sleep deprived, you’re a lot worse at learning things and controlling your emotions, and you’re also more likely to get sick and more susceptible to cancer.

Dreams aren’t that well understood, but they seem to consolidate memories, including moving them from short term to long term storage. REM sleep especially lets your brain find connections between different ideas, and you’re better at problem solving immediately after.

Insomnia is a really common problem in our society, in part because society is structured to encourage sleeping less. Sleeping pills are ineffective at best (prescription ones like Ambien and benzodiazepines are actually really harmful); the recommended treatment is behavioral: keep a regular sleep schedule, avoid caffeine, nicotine, and alcohol, don’t take naps, and avoid light in the bedroom.

My parents always told me it’s bad to stay up so late, but science doesn’t really support this. Different people have different chronotypes, which are determined by genetics (and change somewhat with age). It’s okay to go to sleep really late, as long as you maintain a consistent sleep schedule.

Overall I learned a lot from this book but it’s a fairly dense read, with lots of information about different topics, and it took me over a month to finish it.

Notes from the Underground by Fyodor Dostoyevsky

Notes From The Underground

I read this Dostoyevsky book because it had an interesting plot of a man who tries to rescue a prostitute. It turns out that the rescue is not really the central event of the book, but nevertheless I found it quite interesting. The novella is short (90 pages), unlike Dostoyevsky’s other books, which are super long. It explores a lot of philosophical and psychological ideas in an interesting setting.

The unnamed narrator is a man from the “underground”: he is some kind of civil servant, middle aged, and has health problems. He rejects the idea that man must do the rational thing, as then he is like a machine. He rejoices in doing stupid things from time to time, just because he feels like it, since that way he retains some of his humanity. In the second part of the book, the narrator feels like he is not seen as equal by his peers, and goes to extreme lengths to remedy it. He forcefully invites himself to a dinner party with old friends, and is dismayed that his social status is so low that he’s just ignored. He would much rather have a fight than be ignored, and tries to provoke one in a socially inept manner. Later he meets a prostitute, Liza, whom he offers to save. However, when she actually shows up at his place, he is stuck in his own world and lectures her about the virtues of morality, without actually helping her.

The narrator feels surreal, as he values social acceptance to an extreme degree. After all, he is materially well-off, or at least rich enough to hire a servant. However, as long as he feels inferior to his peers, he is frustrated. Also, the more he tries to gain respect from his peers, the more his efforts backfire and his position is lowered in their eyes. Social recognition isn’t something you should pursue directly.

Factfulness by Hans Rosling

Factfulness: Ten Reasons We're Wrong About the World--and Why Things Are Better Than You Think

This book was written by Hans Rosling (the same guy that made The Joy of Stats documentary) just before he died in 2017. It uses stats to show that despite what the media portrays, and despite popular conception, the world is not such a bad place. Extreme poverty is on the decline, children are being vaccinated, women are going to school.

At the beginning of the book, he gives a quiz of 13 questions. Most people score terribly, worse than random chance, by consistently guessing that the world is worse than it actually is. Without looking at stats, it’s easy to be systematically misled and fall into a bunch of fallacies like not considering the magnitude of effects, generalizing your experience to others, or acting based on fear. Maybe because of my stats background, a lot of what he says is quite obvious to me. Also, I scored 9 on the quiz, which is higher than pretty much everyone. The book confirmed some stuff that I already knew, but it still had good insights on poverty and developing nations.

A big takeaway for me is to be thankful for what we have, seeing how different life is at levels 1-3. Canada is a level 4 country (where people spend more than $32 a day), yet people make fun of me for making 20k/year “poverty” grad school wages. Grad students in Canada should be thankful that we have electricity and running water and can eat out at restaurants, not sad that we can’t afford luxury cars and condos.

Sky Burial by Xinran Xue

Sky Burial by Xinran (2005) Paperback

In this novel, a Chinese woman, Shu Wen from Suzhou, travels to Tibet to search for her missing husband. This was in 1958, in the years after the Chinese Communist Party annexed Tibet. On the way there, she picks up a Tibetan woman, Zhuoma. They get into some trouble in the mountains and meet a Tibetan family, and gradually Wen integrates into the Tibetan culture and learns the language and customs. Time passes by quickly and before you realize it, 30 years have passed while they have practically no information from the outside world. In the end, Wen does find out what happened to her husband through his diaries, but it’s a bittersweet sort of ending as her world is changed unrecognizably and her husband is dead.

The author makes it ambiguous whether this is a work of fiction or it actually happened: all the facts seem believable, other than somehow not finding out about the Great Famine and Cultural Revolution for decades. A lot of interesting Tibetan customs are explained, including their nomadic lifestyle, polyamorous family structure, Buddhist religious beliefs, and their practice of sky burial, which lets vultures eat their dead. The relationship between the Chinese and Tibetans has always been a contentious one, and in this book the characters form a connection of understanding between the two ethnic groups.

Tibet seems like a really interesting place that I should visit someday. However, it’s unclear how much of their traditional culture is still accessible, due to the recent Han Chinese migrations. Also, it’s currently impossible to travel freely in Tibet without a tour group if you’re not a Chinese citizen.

Getting to YES by Fisher, Ury, and Patton

Getting to Yes: Negotiating Agreement Without Giving In

This book tells you how to negotiate more effectively. A common negotiating mistake is to use positional negotiation, which is each side picking an arbitrary position (eg: buy the car for $5000), and going back and forth until you’re tired and agree, or you both walk out. Positional negotiation is highly arbitrary, and often leads to no agreement, which is bad for both parties.

Some ways to negotiate in a more principled way:

  • Empathize with the other party, get to know them and their values, and treat the negotiation as both parties working against a common problem rather than you trying to “win”.
  • Focus on interests, rather than positions. During the negotiation, figure out what each party really wants; sometimes, it’s possible to give them something that’s valuable for them but you don’t really care about. Negotiation is a non-zero-sum game, so try to find creative solutions that fulfill everybody’s interests, rather than fight over a one-dimensional figure.
  • When creative solutions are not possible (both sides just want money), defer to objective measures like industry standards. This gives you both an anchor to use, rather than negotiating in a vacuum.
  • Be aware of your and the other party’s BATNA: best alternative to a negotiated agreement. This determines who holds more power in a negotiation, and improving it is a good way to get more leverage.

Trump: A Graphic Biography by Ted Rall

Trump: A Graphic Biography

A biography of Trump in graphic novel format. This book was written after Trump won the Republican primaries (May 2016) but before he won the presidency (Nov 2016).

First, the book describes the political and economic circumstances that led to Trump coming into power. After the 2008 financial crisis, many low-skilled Americans felt like there was little economic opportunity for them. Many politicians had come and gone, promising change, but nothing happened. For them, Trump represented a change from the political establishment. They didn’t necessarily agree with all of his policies, they just wanted something radical.

Trump was born after WW2 to a wealthy family in New York City. He studied economics and managed a real estate empire for a few decades, which made him a billionaire. Through his deals in real estate, he proved himself a cunning and ruthless negotiator who is willing to behave unethically and use deception to get what he wants.

This was a good read because most of my friend group just thinks Trump is “stupid”, and that everyone who voted for him is stupid. I never really understood why he was so popular among the other demographic. As a biography, the graphic novel format is good because it’s much shorter; most other biographies go into far more detail about a single person’s life than I care to know.

12 Rules for Life by Jordan Peterson

Jordan Peterson’s new book quickly hit #1 on the bestseller lists after being released this year. He’s famous around UofT for speaking out against social justice warriors, but I later found out that he has a lot of YouTube videos on the philosophy of how to live your life. This book condenses many of these ideas into 12 “rules” to live by, in order to live a good and meaningful life.

These ideas are the most interesting and novel to me:

  • Dominance hierarchy: humans (especially men) instinctively place each other on a hierarchy, where the person at the top has all the power and status, and gets all the resources. Women want to date guys near the top of the hierarchy, and men near the top get many women easily while men at the bottom can’t even find one. Therefore, it’s essential to rise to the top of the dominance hierarchy.
  • Order and chaos: order is the part of the world that we understand, that behaves according to rules; chaos is the unknown, risk, failure. To live a meaningful life is to straddle the boundary between order and chaos, and have a little bit of both.
  • When raising children, it’s the parents’ responsibility to educate them how to behave properly to follow social norms, because otherwise, society will treat them harshly and this will snowball into social isolation later in life. Also, they should be encouraged to do risky things (within reason) to explore / develop their masculinity.

Some of the other rules are more obvious. Examples include: be truthful to yourself, choose your friends wisely, improve yourself incrementally rather than comparing yourself to others, confront issues quickly as they arise. I guess depending on your personality and prior experience, you might find a different subset of these rules to be obvious.

Initially, I found JP to be obnoxious because of the lack of scientific rigour in his arguments; he just seems convincing because he’s well-spoken. The book does a slightly better job than the videos in substantiating the arguments and citing various psychology research papers. JP also has a tendency to cite literature; when he goes into stuff like Bible archetypes of Christ, or Cain/Abel, I have no idea what he’s talking about anymore. The book felt a bit long. Overall, still a good read: I learned a lot from this book and also by diving deeper into the psychology papers he cited.

Analects by Confucius

The Analects of Confucius: A Philosophical Translation (Classics of Ancient China)

The Analects (论语) is a book of philosophy by Confucius that laid the groundwork for much of Chinese thinking for the next 2500 years. It’s the second book I’ve read in ancient Chinese literature after the Art of War. It’s written in a somewhat different style: it has 20 chapters of varying lengths, but the chapters aren’t really organized by topic and the writing jumps around a lot.

Confucius tells you how to live your life not by appeal to religion, but rather by showing characteristics that he considers “good”, and giving examples of what is and what isn’t considered good. A few recurring ideas:

  • junzi 君子 – exemplary person. The ideal, wise person that we should strive to be. A junzi strives to be excellent (德) and honorable (信), and not be arrogant or greedy or materialistic. He seeks knowledge, respects elders, is not afraid to speak up, and conducts himself authoritatively.

  • li 礼 – ritual propriety. The idea that there are certain “rituals” that society observes, and that if a leader respects them, then things will go smoothly. Kind of like the “meta” in games; modern examples would be the employer/employee relationship, or knowing in which situations you shake hands with someone.

  • xiao 孝 – filial responsibility. A son must respect his parents and take care of them in old age, and mourn for them for three years after their death (since for three years after birth, a child is helpless without his parents).

  • haoxue 好学 – love of learning for the sake of learning

  • ren 仁 – authoritative conduct / benevolence / humanity. Basically a leader should conduct himself in a responsible manner, and be fair yet firm.

  • dao 道 – the way. One should forge one’s path through life.

An obvious question is why we should listen to Confucius if there’s no appeal either to a higher power (like the Bible) or to a set of axioms. I don’t really know, but many Chinese have studied this book and lived their lives according to its principles, so by studying it, we can better understand how Chinese think.

I feel like the Analects tells us how an ideal Chinese is “supposed” to think, but modern Chinese people are very much the opposite. Modern Chinese people are generally very materialistic, competitive, and care about comparing themselves to people around them. A friend said much of what is written here is “obvious” to any Chinese person — but then why don’t they actually follow it? I guess modern Chinese society is very unequal, and one must be competitive to rise to the top to prosper. So the cynical answer is that recent economic forces override thousand-year philosophy, which is the ideal, but falls apart when push comes to shove.

The Analects is a very thought-provoking book. It’s surprising how many things Confucius said 2500 years ago are still true today. I probably missed a lot of things in my first pass through it, but this is a good starting point for further reading on Chinese philosophy and literature.

Pachinko by Min Jin Lee

Pachinko (National Book Award Finalist)

Pachinko is the name of a Japanese pinball game, where you watch metal balls tumble through a machine. It’s also the name of this novel, which traces a Korean family in Japan through four generations (Yangjin/Hoonie/Hansu -> Sunja/Isak -> Noa/Mozasu -> Solomon/Phoebe). Sunja is the first to immigrate to Japan, during the 1930s, after being tricked by a rich guy who got her pregnant. Afterwards, the family members make their livelihoods in Japan, but they are always considered outsiders, despite being in the country for many generations.

It’s surprising to see so much racism in Japan towards Koreans, since Canada is so multicultural and so accepting of people from other places. Japan is very different: even after four generations in Japan, a Korean boy is still considered a guest and must register with the government every few years or risk getting deported. The Koreans in Japan can’t work the same jobs as the Japanese, can’t legally rent property, and get bullied at school, so they end up working in pachinko parlors, which the Japanese consider “dirty”. All the Korean men (Mozasu, Noa, and Solomon) end up working in pachinko, hence the name of the book.

One thing that struck me was how so many of the characters valued idealism more than rationality. Yoseb doesn’t want his wife to go out to work because he considers it improper. Sunja and Noa don’t want to accept Hansu’s help because of shame, even though they could have benefitted a lot, materially. All the Christians have this sort of idealist irrationality, which I guess is part of being religious; only Hansu behaves in a way that makes sense to me. This book gets a bit slow in the end as there are too many minor characters, but is overall a thought-provoking read about racism in Japanese society.

Visual Intelligence by Amy Herman

Visual Intelligence: Sharpen Your Perception, Change Your Life

This book uses art to teach you to notice your surroundings more, which is very interesting. The basic premise is that there are a lot of things that we miss that can be quite important. The two biggest ideas in this book for me:

  1. Train yourself to be more visually perceptive by looking at art, and trying to notice every detail. This seems trivial but often we miss things. Now in the real world, do the same thing and see things in a different way.

  2. Our experiences shape how we perceive things, so it’s important to describe things objectively rather than subjectively. Do not make assumptions, rather, describe only the facts of what you see. From a picture you can’t infer a person is “homeless”, but rather that he’s “lying on a street next to a shopping cart”.

Memoirs of a Geisha by Arthur Golden

Memoirs of a Geisha (Vintage Contemporaries)

This novel tells the story of the geisha Sayuri, from her childhood until her death. It pretends to be a real memoir, but it’s written by an American man. The facts are thoroughly researched, so we get a feel of what Kyoto was like before the war.

Essentially, society in Japan was very unequal — the women have to go through elaborate rituals and endure a lot of suffering to please the men, who just have a lot of money. However, even without formal power, the geishas like Mameha and Hatsumomo construct elaborate schemes of deceit and trickery.

The plot was exciting to read, but certain characters felt flat. Sayuri’s decades-long infatuation with the chairman doesn’t seem believable: maybe I would’ve had a crush like that as a teenager, but certainly a woman in her late 20s should know better. Hatsumomo’s degree of evilness didn’t seem convincing either.

Lastly, having read some novels by actual Japanese authors, this book feels nothing like them. Japanese literature is a lot more mellow, and the characters more reserved: certainly nobody would act in such an obviously evil manner. Japanese novels also typically have themes of loneliness and isolation and end with people committing suicide, which doesn’t happen in this novel either.

 

Deep Learning for NLP: SpaCy vs PyTorch vs AllenNLP

Deep neural networks have become really popular nowadays, producing state-of-the-art results in many areas of NLP, like sentiment analysis, text summarization, question answering, and more. In this blog post, we compare three popular NLP deep learning frameworks (SpaCy, PyTorch, and AllenNLP) and look at their advantages, disadvantages, and use cases.

SpaCy

Pros: easy to use, very fast, ready for production

Cons: not customizable, internals are opaque

SpaCy is a mature and batteries-included framework that comes with prebuilt models for common NLP tasks like classification, named entity recognition, and part-of-speech tagging. It’s very easy to train a model with your data: all the gritty details like tokenization and word embeddings are handled for you. SpaCy is written in Cython which makes it faster than a pure Python implementation, so it’s ideal for production.

The design philosophy is that the user should only worry about the task at hand, and not the underlying details. If a newer and more accurate model comes along, SpaCy can update itself to use the improved model, and the user doesn’t need to change anything. This is good for getting a model up and running quickly, but leaves little room for an NLP practitioner to customize the model if the task doesn’t exactly match one of SpaCy’s prebuilt models. For example, you can’t build a classifier that takes text, numerical, and image data at the same time to produce a classification.

PyTorch

Pros: very customizable, widely used in deep learning research

Cons: fewer NLP abstractions, not optimized for speed

PyTorch is a deep learning framework by Facebook, popular among researchers for all kinds of DL models, like image classifiers or deep reinforcement learning or GANs. It uses a clear and flexible design where the model architecture is defined with straightforward Python code (rather than TensorFlow’s computational graph design).

NLP-specific functionality, like tokenization and managing word embeddings, is available in torchtext. However, PyTorch is a general purpose deep learning framework and has relatively few NLP abstractions compared to SpaCy and AllenNLP, which are designed for NLP.

AllenNLP

Pros: excellent NLP functionality, designed for quick prototyping

Cons: not yet mature, not optimized for speed

AllenNLP is built on top of PyTorch, designed for rapidly prototyping NLP models for research purposes. It supports a lot of NLP functionality out-of-the-box, like text preprocessing and character embeddings, and abstracts away the training loop (whereas in PyTorch you have to write the training loop yourself). Currently, AllenNLP is not yet at a 1.0 stable release, but looks very promising.
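To make concrete what “writing the training loop yourself” involves, here is a rough, framework-free sketch in plain Python. It is not PyTorch or AllenNLP code: the function name `train_linear_model`, the toy data, and the hyperparameters are all made up for illustration, and the gradients are derived by hand (in PyTorch they would come from autograd). The point is only the shape of the loop: iterate over epochs, run the forward pass, compute gradients, update the parameters.

```python
from typing import List, Tuple


def train_linear_model(
    data: List[Tuple[float, float]],
    epochs: int = 1000,
    learning_rate: float = 0.05,
) -> Tuple[float, float]:
    """Fit y = w * x + b by gradient descent on mean squared error.

    Hypothetical helper for illustration; assumes `data` is non-empty.
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in data:
            error = (w * x + b) - y      # forward pass and residual
            grad_w += 2 * error * x / n  # d(MSE)/dw, accumulated over the data
            grad_b += 2 * error / n      # d(MSE)/db, accumulated over the data
        w -= learning_rate * grad_w      # parameter update
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this example).
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_linear_model(points)
print(w, b)  # should end up close to 2 and 1
```

A real loop would also handle batching, validation, early stopping, and checkpointing, which is the bookkeeping a framework like AllenNLP takes off your hands.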

Unlike PyTorch, AllenNLP’s design decouples what a model “does” from the architectural details of “how” it’s done. For example, a Seq2VecEncoder is any component that takes a sequence of vectors and outputs a single vector. You can use GloVe embeddings and average them, or you can use an LSTM, or you can put in a CNN. All of these are Seq2VecEncoders so you can swap them out without affecting the model logic.
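The idea of swapping encoders can be sketched in plain Python. This is only an illustration of the design, not AllenNLP’s actual API: the class names `AverageEncoder`, `MaxPoolEncoder`, and `ToyClassifier`, the `encode` method, and the toy vectors are assumptions made up for this example, and real encoders operate on batched tensors rather than Python lists.

```python
from abc import ABC, abstractmethod
from typing import List

Vector = List[float]


class Seq2VecEncoder(ABC):
    """Hypothetical interface: maps a sequence of vectors to a single vector."""

    @abstractmethod
    def encode(self, sequence: List[Vector]) -> Vector:
        ...


class AverageEncoder(Seq2VecEncoder):
    """Element-wise average, like averaging word embeddings."""

    def encode(self, sequence: List[Vector]) -> Vector:
        n = len(sequence)  # assumes a non-empty sequence
        return [sum(column) / n for column in zip(*sequence)]


class MaxPoolEncoder(Seq2VecEncoder):
    """Element-wise maximum over the sequence."""

    def encode(self, sequence: List[Vector]) -> Vector:
        return [max(column) for column in zip(*sequence)]


class ToyClassifier:
    """The model logic depends only on the Seq2VecEncoder interface."""

    def __init__(self, encoder: Seq2VecEncoder, weights: Vector) -> None:
        self.encoder = encoder
        self.weights = weights

    def predict(self, sequence: List[Vector]) -> int:
        pooled = self.encoder.encode(sequence)
        score = sum(w * x for w, x in zip(self.weights, pooled))
        return 1 if score > 0 else 0


embedded_sentence = [[1.0, -2.0], [3.0, 0.0]]  # two tokens, 2-dim vectors
weights = [1.0, 3.0]
print(ToyClassifier(AverageEncoder(), weights).predict(embedded_sentence))  # prints 0
print(ToyClassifier(MaxPoolEncoder(), weights).predict(embedded_sentence))  # prints 1
```

The classifier code never changes; only the encoder passed to its constructor does, which is the “what” versus “how” split described above.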

The talk “Writing code for NLP Research” presented at EMNLP 2018 gives a good overview of AllenNLP’s design philosophy and its differences from PyTorch.

Which is the best framework?

It depends on how much you care about flexibility, ease of use, and performance.

  • If your task is fairly standard, then SpaCy is the easiest to get up and running. You can train a model using a small amount of code, you don’t have to think about whether to use a CNN or RNN, and the API is clearly documented. It’s also well optimized to deploy to production.
  • AllenNLP is the best for research prototyping. It supports all the bells and whistles that you’d include in your next research paper, and encourages you to follow the best practices by design. Its functionality is a superset of PyTorch’s, so I’d recommend AllenNLP over PyTorch for all NLP applications.

There are a few runners-up that I will mention briefly:

  • NLTK / Stanford CoreNLP / Gensim are popular libraries for NLP. They’re good libraries, but they don’t do deep learning, so they can’t be directly compared here.
  • TensorFlow / Keras are also popular for research, especially for Google projects. TensorFlow is the only framework supported by Google’s TPUs, and it also has better multi-GPU support than PyTorch. However, multi-GPU setups are relatively uncommon in NLP, and furthermore, its computational graph model is harder to debug than PyTorch’s model, so I don’t recommend it for NLP.
  • PyText is a new framework by Facebook, also built on top of PyTorch. It defines a network using pre-built modules (similar to Keras) and supports exporting models to Caffe2 to be faster in production. However, it’s very new (only released earlier this month) and I haven’t worked with it myself to form an opinion about it yet.

That’s all, let me know if there’s any that I’ve missed!