In this edition of my book review blog post series, I will summarize three books I recently read about artificial intelligence. AI is a hot topic nowadays, and people inside and outside the field have very different perspectives. Even as an AI researcher, I found a lot to learn from these books.
Emergence of the statistics discipline
Machine learning and statistics are closely related areas: ML can be viewed as statistics but with computers. Thus, to understand machine learning, it’s natural to start from the beginning and study the history of statistics.
The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century by David Salsburg
This book tells the story of how statistics emerged as a scientific discipline in the 20th century. The title comes from a story in which Fisher wanted to see whether a lady could tell if the tea or the milk had been added to the cup first; he devised a series of randomized experiments that motivated modern hypothesis testing. The book describes the lives and circumstances of the people involved, explaining the math quite well in words without getting too technical with equations. Some of the founding fathers of statistics:
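The logic behind the tea experiment can be sketched in a few lines. This uses the 8-cup design commonly attributed to Fisher (the exact numbers are my illustration, not taken from the book): 4 cups with milk poured first and 4 with tea poured first, presented in random order, and the lady must identify the 4 milk-first cups.

```python
from math import comb

# Number of ways to choose which 4 of the 8 cups are milk-first.
arrangements = comb(8, 4)        # 70 possible guesses

# If she is guessing blindly, only one arrangement is fully correct,
# so a perfect score by pure luck has probability 1/70.
p_perfect = 1 / arrangements

print(arrangements)              # 70
print(round(p_perfect, 4))       # 0.0143
```

A perfect score is thus unlikely enough (about 1.4%) to cast doubt on the "she's just guessing" hypothesis, which is exactly the shape of a modern significance test.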

Karl Pearson (1857–1936) was the founder of mathematical statistics, devised methods of estimating statistical parameters from data, founded the journal Biometrika, and applied these methods to confirm Darwin’s theory of natural selection. He had a dominating personality, and his son Egon Pearson also became a famous statistician.

William Gosset (1876–1937) discovered the Student t-distribution while working for Guinness, improving methods to brew beer. He had to publish under the pseudonym “Student” because Guinness wouldn’t let its employees publish.

Ronald Fisher (1890–1962) was a genius who invented much of modern statistics, including MLE for estimating parameters, ANOVA, and experimental design. He originally used these methods to study the effects of fertilizers on crop variation, and eventually became a distinguished professor. Fisher did not get along with Pearson, and he dismissed evidence that smoking caused cancer long after it was accepted by the scientific community.

Jerzy Neyman (1894–1981) invented the standard textbook formulation of hypothesis testing against a null hypothesis, and introduced the concept of the confidence interval. Fisher and many others were skeptical, since the interpretation of the p-value, and of the “95% probability” attached to a 95% confidence interval, is far from obvious.
Statistics is now a crucial part of experiments across many scientific disciplines. Undoubtedly, statistics changed the way we do science, and this book tells the story of how it happened. I liked the first part of the book more, since it covers the most influential figures in early statistics. By the latter half of the book, statistics had already diversified into numerous subdisciplines, and the book jumps rapidly between a plethora of scientists.
Causal reasoning: a limitation of ML
The Book of Why: The New Science of Cause and Effect by Judea Pearl
This book is by Judea Pearl, one of the leaders of causal inference, who received the 2011 Turing Award for his work on Bayesian networks. Pearl points out a flaw that affects all machine learning models, from the simplest linear regression to the deepest neural networks: it is impossible to tell the difference between causation and correlation from data alone. Every morning I hear the birds chirp before sunrise, so do the birds cause the sun to rise? Obviously not, but for a machine, this is surprisingly difficult to deduce.
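The birds-and-sunrise point can be made concrete with a tiny simulation (my own toy illustration, not from the book). Chirping and sunrise share a hidden common cause, the approach of dawn, and neither causes the other; yet from the observed data alone they are perfectly associated.

```python
import random

random.seed(0)
observations = []
for _ in range(1000):
    dawn_near = random.random() < 0.5  # hidden common cause
    chirping = dawn_near               # birds react to the approach of dawn
    sunrise = dawn_near                # the sun rises at dawn
    observations.append((chirping, sunrise))

# To a purely associational learner, chirping "predicts" sunrise perfectly.
agreement = sum(c == s for c, s in observations) / len(observations)
print(agreement)  # 1.0 -- but no amount of such data reveals causal direction
```

Any model trained only on (chirping, sunrise) pairs would find a flawless correlation in either direction; the causal structure lives outside the data.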
Pearl gives three levels of causation, where each level cannot be reached using only the tools of the levels below it:

Level 1 — Association: this is where most machine learning and statistics methods stand today. They can find correlations but cannot distinguish them from causation.

Level 2 — Intervention: using causal diagrams and do-notation, you can tell whether X causes Y. The first step is to use this machinery to determine whether the causal effect is identifiable from the data; level 1 methods are then applied to estimate the strength of the effect.

Level 3 — Counterfactuals: given that you did X and Y happened, determine what would have happened had you done X’ instead.
The most reliable way to determine causality is through a randomized trial, but this is often impractical due to cost or ethics, leaving us with only observational data. Many scientists simply control for as many variables as possible, but there are situations where this strategy is flawed. Using causal diagrams, the book explains more sophisticated techniques for determining causality, including a quick algorithm for deciding whether a variable should be controlled for.
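A minimal sketch of what "controlling for" a confounder buys you (my own toy system, not an example from the book): here Z causes both X and Y, while X has no effect on Y at all. The raw association between X and Y looks large, but stratifying on Z and averaging (the adjustment a causal diagram would license here) recovers a near-zero effect.

```python
import random

random.seed(1)
n = 100_000
data = []
for _ in range(n):
    z = random.random() < 0.5                   # confounder
    x = random.random() < (0.8 if z else 0.2)   # "treatment", driven by Z
    y = random.random() < (0.8 if z else 0.2)   # outcome, also driven by Z only
    data.append((z, x, y))

def p_y(x_val, z_val=None):
    """Empirical P(Y=1 | X=x_val), optionally within a Z stratum."""
    rows = [y for z, x, y in data
            if x == x_val and (z_val is None or z == z_val)]
    return sum(rows) / len(rows)

# Naive comparison: strong spurious association between X and Y.
naive = p_y(True) - p_y(False)

# Adjustment: compare within each Z stratum, then average weighted by P(Z).
p_z = sum(z for z, _, _ in data) / n
adjusted = ((p_y(True, True) - p_y(False, True)) * p_z +
            (p_y(True, False) - p_y(False, False)) * (1 - p_z))

print(f"naive:    {naive:+.3f}")     # roughly +0.36, purely spurious
print(f"adjusted: {adjusted:+.3f}")  # near zero, the true causal effect
```

The catch the book emphasizes is that this only works because Z really is a confounder here; for other graph structures (e.g. when the controlled variable is a collider), the same "control for everything" reflex introduces bias instead of removing it.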
Causal inference is an active area of machine learning research, although an area that’s often ignored by mainstream ML. Judea Pearl thinks that figuring out a better representation of causation is a key missing ingredient for strong AI.
AI in the far future
When will we have superhuman artificial general intelligence (AGI)? Well, it depends on who you ask. The media often portrays AGI as on the verge of being achieved in just a few years, but AI researchers predict it to be out of reach for several decades or even centuries.
Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
Bostrom believes there is a serious possibility that strong AGI will be achieved in the near future, say by 2050, and that once this happens, AI becomes an existential threat to humanity. Once AI first exceeds human ability, it will rapidly improve itself or use its programming skills to develop even stronger AI, and humans will be left in the dust. It will be very difficult to keep a superintelligent AI boxed, since it can develop advanced technology and there are many ways it can escape from its sandbox.
Depending on how the AI is programmed, it may have very different values from humans. In many of Bostrom’s hypothetical scenarios, an AI designed for some narrow task (e.g. producing paperclips) decides to take over the world and unleash an army of self-replicating nanobots to turn every atom in the universe into paperclips. There are many unsolved questions about how to design an agent that maximizes an objective function without the risk of it taking catastrophic, unintended actions in pursuit of that objective.
For now, there is no imminent prospect of AGI, so it’s unclear to what extent specifying the value function will actually be a problem. There are much more immediate dangers of AI technology, for example unfair bias against certain groups and the economic consequences of automation taking over jobs. Andrew Ng famously said: “fearing a rise of killer robots is like worrying about overpopulation on Mars”. Nevertheless, Bostrom makes a valid point: the risks of superhuman AI to humanity are so great that they are worth taking seriously and investing in further research.