# Real World Applications of Automaton Theory

This term, I was teaching a course on intro to theory of computation, and one of the topics was finite automatons — DFAs, NFAs, and the like. I began by writing down the definition of a DFA:

A deterministic finite automaton (DFA) is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ where: $Q$ is a finite set of states…

I could practically feel my students falling asleep in their seats. Inevitably, a student asked the one question you should never ask a theorist:

“So… how is this useful in real life?”

## DFAs as a model of computation

I’ve done some theoretical research on formal language theory and DFAs, so my immediate response was why DFAs are important to theorists.

Above: A DFA requires O(1) memory, regardless of the length of the input.

You might have heard of Turing machines, which abstracts the idea of a “computer”. In a similar vein, regular languages describe what is possible to do with a computer with very little memory. No matter how long the input is, a DFA only keeps track of what state it’s currently in, so it only requires a constant amount of memory.

By studying properties of regular languages, we gain a better understanding of what is and what isn’t possible with computers with very little memory.

This explains why theorists care about regular languages — but what are some real world applications?

## DFAs and regular expressions

Regular expressions are a useful tool that every programmer should know. If you wanted to check if a string is a valid email address, you might write something like:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/ Behind the scenes, this regular expression gets converted into an NFA, which can be quickly evaluated to produce an answer. You don’t need to understand the internals of this in order to use regular expressions, but it’s useful to know some theory so you understand its limitations. Some programmers may try to use regular expressions to parse HTML, but if you’ve seen the Pumping Lemma, you will understand why this is fundamentally impossible. ## DFAs in compilers In every programming language, the first step in the compiler or interpreter is the lexer. The lexer reads in a file of your favorite programming language, and produces a sequence of tokens. For example, if you have this line in C++: cout << "Hello World" << endl;  The lexer generates something like this: IDENTIFIER cout LSHIFT << STRING "Hello World" LSHIFT << IDENTIFIER endl SEMICOLON ;  The lexer uses a DFA to go through the source file, one character at a time, and emit tokens. If you ever design your own programming language, this will be one of the first things you will write. Above: Lexer description for JSON numbers, like -3.05 ## DFAs for artificial intelligence Another application of finite automata is programming simple agents to respond to inputs and produce actions in some way. You can write a full program, but a DFA is often enough to do the job. DFAs are also easier to reason about and easier to implement. The AI for Pac-Man uses a four-state automaton: Typically this type of automaton is called a Finite State Machine (FSM) rather than a DFA. The difference is that in a FSM, we do an action depending on the state, whereas in a DFA, we care about accepting or rejecting a string — but they’re the same concept. ## DFAs in probability What if we took a DFA, but instead of fixed transition rules, the transitions were probabilistic? This is called a Markov Chain! Above: 3 state Markov chain to model the weather Markov chains are frequently used in probability and statistics, and have lots of applications in finance and computer science. Google’s PageRank algorithm uses a giant Markov chain to determine the relative importance of web pages! You can calculate things like the probability of being in a state after a certain number of time steps, or the expected number of steps to reach a certain state. In summary, DFAs are powerful and flexible tools with myriad real-world applications. Research in formal language theory is valuable, as it helps us better understand DFAs and what they can do. # My First Research Paper: State Complexity of Overlap Assembly My first research paper is completed and has been uploaded to Arxiv! It’s titled “State Complexity of Overlap Assembly”, by Janusz Brzozowski, Lila Kari, Myself, and Marek Szykuła. It’s in the area of formal language theory, which is an area of theoretical computer science. I worked on it as a part-time URA research project for two terms during my undergrad at Waterloo. The contents of the paper are fairly technical. In this blog post, I will explain the background and motivation for the problem, and give a statement of the main result that we proved. ## What is Overlap Assembly? The subject of this paper is a formal language operation called overlap assembly, which models annealing of DNA strands. When you cool a solution containing a lot of short DNA strands, they tend to stick together in a predictable manner: they stick together to form longer strands, but only if the ends “match”. We can view this as a formal operation on strings. If we have two strings a and b such that some suffix of a matches some prefix of b, then the overlap assembly is the string we get by combining them together: Above: Example of overlap assembly of two strings In some cases, there might be more than one way of combining the strings together, for example, a=CATA and b=ATAG — then both CATATAG and CATAG are possible overlaps. Therefore, overlap assembly actually produces a set of strings. We can extend this idea to languages in the obvious way. The overlap assembly of two languages is the set of all the strings you get by overlapping any two strings in the respective languages. For example, if L1={ab, abab, ababab, …} and L2={ba, baba, bababa, …}, then the overlap language is {aba, ababa, …}. It turns out that if we start with two regular languages, then the overlap assembly language will always be regular too. I won’t go into the details, but it suffices to construct an NFA that recognizes the overlap language, given two DFAs recognizing the two input languages. Above: Example of construction of an overlap assembly NFA (Figure 2 of our paper) ## What is State Complexity? Given that overlap assembly is closed under regular languages, a natural question to ask is: how “complex” is the regular language that gets produced? One measure of complexity of regular languages is state complexity: the number of states in the smallest DFA that recognizes the language. State complexity was first studied in 1994 by Sheng Yu et al. Some operations do not increase state complexity very much: if two regular languages have state complexities m and n, then their union has state complexity at most mn. On the other hand, the reversal operation can blow up state complexity exponentially — it’s possible for a language to have state complexity n but its reversal to have state complexity $2^n$. Here’s a table of the state complexities of a few regular language operations: Over the years, state complexity has been studied for a wide range of other regular language operations. Overlap assembly is another such operation — the paper studies the state complexity of this operation. ## Main Result In our paper, we proved that the state complexity of overlap assembly (for two languages with state complexities m and n) is at most: $2(m-1) 3^{n-1} + 2^n$ Further, we constructed a family of DFAs that achieve this bound, so the bound is tight. That’s it for my not-too-technical summary of this paper. I glossed over a lot of the details, so check out the paper for the full story! # Learning R as a Computer Scientist If you’re into statistics and data science, you’ve probably heard of the R programming language. It’s a statistical programming language that has gained much popularity lately. It comes with an environment specifically designed to be good at exploring data, plotting visualizations, and fitting models. R is not like most programming languages. It’s quite different from any other language I’ve worked with: it’s developed by statisticians, who think differently from programmers. In this blog post, I describe some of the pitfalls that I ran into learning R with a computer science background. I used R extensively in two stats courses in university, and afterwards for a bunch of data analysis projects, and now I’m just starting to be comfortable and efficient with it. ## Why a statistical programming language? When I encountered R for the first time, my first reaction was: “why do we need a new language to do stats? Can’t we just use Python and import some statistical libraries?” Sure, you can, but R is very streamlined for it. In Python, you would need something like scipy for fitting models, and something like matplotlib to display things on screen. With R, you get RStudio, a complete environment, and it’s very much batteries-included. In RStudio, you can parse the data, run statistics on it, and visualize results with very few lines of code. Aside: RStudio is an IDE for R. Although it’s possible to run R standalone from the command line, in practice almost everyone uses RStudio. I’ll do a quick demo of fitting a linear regression on a dataset to demonstrate how easy it is to do in R. First, let’s load the CSV file: df <- read.csv("fossum.csv")  This reads a dataset containing body length measurements for a bunch of possums. Don’t ask why, it was used in a stats course I took. R parses the CSV file into a data frame and automatically figures out the dimensions and variable names and types. Next, we fit a linear regression model of the total length of the possum versus the head length: model <- lm(totlngth ~ hdlngth, df)  It’s one line of code with the lm function. What’s more, fitting linear models is so common in R that the syntax is baked into the language. Aside: Here, we did totlngth ~ hdlngth to perform a single variable linear regression, but the notation allows fancier stuff. For example, if we did lm(totlngth ~ (hdlngth + age)^2), then we would get a model including two variables and the second order interaction effects. This is called Wilkinson-Rogers notation, if you want to read more about it. We want to know how the model is doing, so we run the summary command: > summary(model) Call: lm(formula = totlngth ~ hdlngth, data = df) Residuals: Min 1Q Median 3Q Max -7.275 -1.611 0.136 1.882 5.250 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -28.722 14.655 -1.960 0.0568 . hdlngth 1.266 0.159 7.961 7.5e-10 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 2.653 on 41 degrees of freedom Multiple R-squared: 0.6072, Adjusted R-squared: 0.5976 F-statistic: 63.38 on 1 and 41 DF, p-value: 7.501e-10  Don’t worry if this doesn’t mean anything to you, it’s just dumping the parameters of the models it fit, and ran a bunch of tests to determine how significant the model is. Lastly, let’s visualize the regression with a scatterplot: plot(df$hdlngth, df$totlngth) abline(model)  And R gives us a nice plot: All of this only took 4 lines of R code! Hopefully I’ve piqued your interest by now — R is great for quickly trying out a lot of different models on your data without too much effort. That being said, R has a somewhat steep learning curve as a lot of things don’t work the way you’d expect. Next, I’ll mention some pitfalls I came across. ## Don’t worry about the type system As computer scientists, we’re used to thinking about type systems, type casting rules, variable scoping rules, closures, stuff like that. These details form the backbone of any programming language, or so I thought. Not the case with R. R is designed by statisticians, and statisticians are more interested in doing statistics than worrying about intricacies of their programming language. Types do exist, but it’s not worth your time to worry about the difference between a list and a vector; most likely, your code will just work on both. The most fundamental object in R is the data frame, which stores rows of data. Data frames are as ubiquitous in R as objects are in Java. They also don’t have a close equivalent in most programming languages; it’s similar to a SQL table or an Excel spreadsheet. ## Use dplyr for data wrangling The base library in R is not the most well-designed package in the world. There are many inconsistencies, arbitrary design decisions, and common operations are needlessly unintuitive. Fortunately, R has an excellent ecosystem of packages that make up for the shortcomings of the base system. In particular, I highly recommend using the packages dplyr and tidyr instead of the base package for data wrangling tasks. I’m talking about operations you do to data to get it to be a certain form, like sorting by a variable, grouping by a set of variables and computing the aggregate sum over each group, etc. Dplyr and tidyr provide a consistent set of functions that make this easy. I won’t go into too much detail, but you can see this page for a comparison between dplyr and base R for some common data wrangling tasks. ## Use ggplot2 for plotting Plotting is another domain where the base package falls short. The functions are inconsistent and worse, you’re often forced to hardcode arbitrary constants in your code. Stupid things like plot(..., pch=19) where 19 is the constant for “solid circle” and 17 means “solid triangle”. There’s no reason to learn the base plotting system — ggplot2 is a much better alternative. Its functions allow you to build graphs piece by piece in a consistent manner (and they look nicer by default). I won’t go into the comparison in detail, but here’s a blog post that describes the advantages of ggplot2 over base graphics. It’s unfortunate that R’s base package falls short in these two areas. But with the package manager, it’s super easy to install better alternatives. Both ggplot2 and dplyr are widely used (currently, both are in the top 5 most downloaded R packages). ## How to self-study R First off, check out Swirl. It’s a package for teaching beginners the basics of R, interactively within RStudio itself. It guides you through its courses on topics like regression modelling and dplyr, and only takes a few hours to complete. At some point, read through the tidyverse style guide to get up to speed on the best practices on naming files and variables and stuff like that. Now go and analyze data! One major difference between R and other languages is that you need a dataset to do anything interesting. There are many public datasets out there; Kaggle provides a sizable repository. For me, it’s a lot more motivating to analyze data I care about. Analyze your bank statement history, or data on your phone’s pedometer app, or your university’s enrollment statistics data to find which electives have the most girls. Turn it into a mini data-analysis project. Fit some regression models and draw a few graphs with R, this is a great way to learn. The best thing about R is the number of packages out there. If you read about a statistical model, chances are that someone’s written an R package for it. You can download it and be up and running in minutes. It takes a while to get used to, but learning R is definitely a worthwhile investment for any aspiring data scientist. # Applying to Graduate School in Computer Science So you’re thinking of grad school. Four years of undergrad is not enough for you and you’re craving for more knowledge. Or perhaps you want to delay your entry into the “real world” for a couple more years. Well, grad school is the right place! About a year ago, I decided I wanted to do grad school. However, most of my peers were on track to work in the industry after graduation. The process of applying for grad school is daunting, especially since it varies from country to country and from one subject to another. This is why I am writing a guide to grad school applications in Computer Science, for Canada and the USA: a compilation of things I wish I knew a year ago. ### Why grad school? People go to grad school for different reasons. Most people I know in computer science and software engineering plan to start working full-time — a reasonable choice, given the high salaries in the industry right now. I figured that I had the rest of my life to code for a company; there’s no rush to start working immediately. Graduate programs come in three broad categories: 1. Coursework Master’s. Typically about 1 year, this is basically an extension of undergrad. You take a bunch of graduate courses, which are more in-depth than undergrad classes, but you don’t do any research. This is useful for gaining more knowledge before going to work in the industry. 2. Thesis Master’s. About 1-2 years, depending on the school. At first, you take some courses like in coursework master’s, but the primary goal is to produce a master’s thesis, which is equivalent to about one conference paper of original research. This is a good way to get some research experience, without the time commitment of a Ph.D (and is valuable as a stepping stone if you do decide to get one). 3. Ph.D. A longer commitment of 4-5 years. In a Ph. D., you produce an extensive amount of original research; by the time you write your thesis, you will be a world expert on your specific topic. I like this illustrated explanation of what it’s like to do a Ph. D. There are hybrid programs too, like thesis master’s often transition directly into a Ph. D, and also there are regional differences on how these programs work (more on this later). ### Can I get into grad school? As you may expect, top grad school programs are very competitive, and a typical grad student is a top student in his undergraduate class. So what do schools look for in their grad school admissions process? Grades are a big factor: generally, an undergrad GPA of 85% or higher is good for grad school (with 80% being the bare minimum). However, even more important than GPA is your research experience. Publishing papers in conferences would be ideal, and research experience can make up for a lackluster transcript. Unfortunately, Waterloo students are at a disadvantage here: with the co-op program, most people spend their undergrad years focusing on internships rather than research, which is considered less valuable. Don’t be discouraged though: my only research experience was through two part-time URA’s, and I have zero publications, but I still got into a good grad school. ### Picking a research area In grad school, you specialize on a specific area of computer science, for example, machine learning, or databases, or theory, or programming languages. You have to indicate roughly what area you want to study in your application, but it’s okay to not know exactly what you want to research. For me, I wanted to do something involving artificial intelligence or data science or machine learning. Eventually I decided on natural language processing (NLP), since it’s an area of machine learning, and I like studying languages. Some people have a specific professor that they want to work with, in which case it’s helpful to reach out to them beforehand and mention it in your statement of purpose. Otherwise, as in my case, you don’t need to explicitly contact potential advisers if you have nothing to say; you get to indicate your adviser preferences in your application. ### Choosing your schools The most important thing to look for in a grad school is the quality of the research group. You may be tempted to look at overall computer science rankings, but this can be misleading because different schools have strengths in different research areas. There are other factors to consider, like location, city environment (big city or college town), and social life. It’s a good idea to apply to a variety of schools of different levels of competitiveness. However, each application costs about$100, so it can be expensive to apply to too many — 5-10 applications is a good balance.

I decided to apply to five schools: two in Canada and three in the USA. My main criteria were (1), a reputable research program in NLP, and (2), I wanted to live in a big city. After some deliberation, I decided to apply to the following:

• Ph. D. at University of Washington
• Ph. D. at UC Berkeley
• Ph. D. at Columbia University
• M. Sc. at University of Toronto
• M. Sc. at University of British Columbia

I didn’t apply to the University of Waterloo, where I’m doing my undergrad, despite it being pretty highly ranked in Canada — after studying here for five years, I needed a change of scenery.

### Differences between Canada and USA

You might have noticed that my three applications in the USA were all Ph. D. programs, while my two in Canada were master’s. Graduate school works quite differently in Canada vs in the USA. In Canada, most students do a master’s after undergrad and then go on to do a Ph. D., but in the USA, you enter into Ph. D. directly after undergrad, skipping the master’s.

There are master’s programs in the USA too, but they are almost exclusively coursework master’s, and are very expensive ($50k+ tuition per year). In contrast, thesis master’s programs in Canada and Ph. D. programs in the USA are fully funded, so you get paid a stipend of around$20k-30k a year.

A big reason to do a master’s in the USA is for visa purposes: for Chinese and Indian citizens, getting the H1-B is much easier with a master’s in the country, so the investment can be worth it. Otherwise, it’s probably not worth getting a master’s in the USA; studying in Canada is much cheaper.

If you go to grad school in Canada, you can apply for the CGS-M and OGS government scholarships for master’s students. Unfortunately, Canadian citizens are ineligible for most scholarships if you study in the USA.

### Taking the GRE

Another difference for the USA is that the Graduate Record Exam (GRE) is required for all grad school admissions in the USA. This is a 4-hour-long computer-administered test with a reading, writing, and math component. If you’ve taken the SAT, this test is very similar. For grad school applications in computer science, only the general exam is required, and not the computer science subject test.

The GRE plays a fairly minor role in admissions: a terrible score will hurt your chances, but a perfect score will not help you that much. The quantitative and verbal sections are scored between 130-170, and for science and engineering programs, a good score is around 165+ for quantitative and 160+ for verbal.

The quantitative (math) section is a cakewalk for any computer science major, but the verbal section can be challenging if English is not your native language. It does require some preparation (1-6 months is recommended). I studied for a month and did quite well.

Applications are due on December 15 for most schools, so you should take the GRE in October at the latest (and earlier if you plan on taking it multiple times).

### Letters of Recommendation

Most grad school and scholarship applications require three letters of recommendation; out of all requirements, this one requires the most planning. The ideal recommendation comes from a professor that you have done research with. If you go to Waterloo and are considering grad school, doing a part-time URA (undergraduate research assistantship) is a good way to secure a few recommendation letters.

It may be difficult to find three professors that you’ve worked with, so the next best thing is a weaker letter from a professor whose class you did well in. As a last resort, at most one letter may come from a non-academic source (like your co-op supervisor). I was lucky that one of my research projects was co-supervised by two different professors, so I got two letters that way.

### Statement of Purpose

The statement of purpose is a two-page essay where you describe your academic history and research interests, and convince the admissions committee that you are the ideal candidate to admit. If you have internship experience, talk about what you learned any why it’s relevant for research.

Chances are that the first revision of your statement of purpose will suck (this was certainly the case for me), so get friends and professors to proofread it. After several revisions, here’s my final statement of purpose.