Lessons learned after 6 months of building a language learning startup

For the past six months, I’ve been working on LevelText, a startup that helps you learn languages by reading authentic native material. Actually, calling it a startup is a bit of a stretch because our app never produced any revenue and only a handful of users.

Nevertheless, I worked on it seriously for about half a year with a cofounder, joined an incubator, incorporated a company, iterated on an idea and launched several times. During this process, I learned a lot about the startup ecosystem, web development, talking to users, and many aspects of running a business. In this article, I will tell the story of how we got here and what we learned.

Long before LevelText, building a language learning app had been on my mind for a long time. I had always been passionate about languages: I studied several languages including French, Spanish and Japanese, and I was a PhD student researching linguistics and natural language processing. Thus, during my idle time, I liked to brainstorm ideas about improving language learning. Surely, language learning can be improved with my AI and NLP expertise, right?

There were a lot of language learning apps out there already, the most famous one being Duolingo, but there are many alternatives including Babbel, Rosetta Stone, Busuu, and dozens of others. Most people in the language learning community liked to criticize these apps: they offered similar features like language games and flashcards, teaching you words and phrases, but you could use them for a long time without achieving much fluency in the language.

According to many language acquisition researchers and educators, a more effective method of learning languages was “comprehensible input”: essentially getting a lot of exposure to authentic material (whether it’s books, articles, TV shows, podcasts, etc). The language is ideally at a level slightly above what you can easily understand, but not so difficult that it’s overwhelming. This was in stark contrast to all the apps that taught language through unnatural games and exercises, thus, despite the crowded marketplace, I was confident that teaching languages using authentic reading material would be a revolutionary app idea.

Iteration 1: Learning languages through reading

Personally, I like to learn languages by reading books. My process was the following: while reading, I would mark all the words that I didn’t understand, then I would look them up in a dictionary and add them to my collection to be reviewed later. I was improving my Chinese at the time, and found this process quite tedious because inputting unknown logographic characters into a dictionary was nontrivial.

So my first idea was an app where you take a picture of a printed text where you highlighted some words, and it would use some computer vision to look up these words for you in a dictionary. Then, if you liked, you could use this information to create Anki flashcards to review later. We called the tool VocabAssist and my wife helped me implement the prototype.

At this time, I started to interview all of my friends who were into language learning. A few people agreed with me that reading in the target language was a pain point, but most people did not have physical books lying around, and preferred reading on their phone or computer. And people who owned books wanted to keep them clean rather than writing all over them.

Therefore, we redesigned the concept: the user would instead paste text into the app. The app would then use some NLP to decide which words you probably don’t know, given your language level, and look up their definitions in a dictionary for you. Yet a problem still remained: how were users supposed to find material to read? Making them copy/paste text from web pages would have been too clumsy.

After doing a bit of market research, we found two existing apps with a similar concept already: LingQ and Readlang. These two apps did a decent job of helping learners study from the text, with fully featured dictionaries and flashcards, and they even supported video. However, they had to already have some material on hand; otherwise, both had libraries of user-contributed content to read, but they were small and you were unlikely to find what you’re looking for if you had niche interests. It would’ve been much better if they could leverage all of the content available on the Internet.

Both my wife and I had other full-time responsibilities, so progress was slow. Neither of us knew much about web development (I was a decent backend engineer but knew very little frontend), so the app was barely functional. As a result, we never launched the app to the public and only demoed it to some friends. Before we could launch it for real, I would need a cofounder with more experience building websites, and probably handle business responsibilities as well.

Iteration 2: Search engine for French learners

At the end of 2021, I had just moved to Vancouver and was wrapping up my PhD; I had more free time and was ready to give this project a more serious effort. So I set up a profile on the YCombinator cofounder matching platform, and soon I matched with Jonny Kalambay. He was working as a software engineer a Yelp and also had a passion for language learning. We worked on a toy project as a trial of working together and then decided to become cofounders.

Jonny was a talented full-stack engineer who could build and launch prototypes within days. Following the advice of the books “Running Lean” and “The Mom Test”, we started to interview users of our target demographic who were interested in language learning, asking them how they learn languages, what were the biggest struggles, and what tools they wish existed.

It turned out that finding interesting reading material of an appropriate difficulty was a fairly common problem. Thus, LevelText was born. We decided to build a search engine tool for French learners: we would take the user’s search query, translate it into French, search the internet for articles about the topic, and finally, return a sorted list of the articles ranked by estimated reading difficulty.

Our MVP assumed the user spoke English and was studying French. We decided to support only one language in our initial launch, since this would allow us to test out the concept with minimal development. We picked French because it was the most widely studied language in Canada and both of us were familiar with it.

After launching it on ProductHunt and conducting user tests on some French learners, the response was lukewarm. The tool was novel and cool, but didn’t provide enough value for them to keep coming back to the site. Intermediate and advanced learners already had lists of places to find reading material, and most articles on the internet were too difficult for beginners. We didn’t address the problem of motivation, one of the biggest struggles for language learners, and depended on the user remembering to use our site.

From the business side, the idea was difficult to monetize. There are several options for making money from a consumer-facing software product. Placing advertisements is the lowest effort option, but requires a massive amount of traffic to produce sufficient revenue: possible for a social network, but usually not for language learning apps. Most language learning apps make money by charging monthly or annual subscriptions. But we would need to provide a lot more value to justify charging a subscription – a simple tool wouldn’t cut it.

Iteration 3: Daily bite-sized reading

Our idea to fix the motivation problem was instead of the user needing to make searches, why not send them articles to their email? They would only need to indicate their interests once during sign-up and then they would receive a lesson every day in their inbox. With the lesson prepared and ready to go, the friction needed to start studying would be much lower.

Each lesson consisted of a paragraph of simple-to-read French text accessible to beginners, and a multiple-choice question to test their comprehension. We started prototyping this concept, using the GPT-3 model from OpenAI to simplify French articles down to a simple paragraph, and manually writing the multiple choice questions. We then enrolled a handful of French learners from social language learning sites to validate the idea.

The results were promising: most of the learners opened the e-mail and answered the question for three days in a row, although we never tried to charge money for it (the first real test of product-market fit), and the amount of manual work to prepare lessons meant that we could not get a large sample size of users to experiment with. Also, our mini-lessons could be completed in under a minute, which we felt was too little content for people to pay for it.

Founder troubles

It was around this time that we were invited to interview with YCombinator. This was a big deal because getting in meant receiving $500,000 USD of investment and connections to a network of experienced founders and mentors. We didn’t expect to get this far since our product hadn’t demonstrated traction yet; I guess it was our founder credentials that landed us the interview. We spent a day or two rehearsing our pitch and practicing answering questions for the rapid-fire 10-minute interview.

The next day, we received a rejection letter from YCombinator, basically saying that many great founders have tried to break into the language learning apps market, but it is hard to build a business in this space. Language learning (especially for non-English languages) was a hobby for most people with no urgent need, and they urged us to consider solving a must-have rather than a nice-to-have problem.

Our team morale took a hit after the rejection, and truth be told, we had gotten lots of hints at this point that language learning was not a great business direction. However, language learning was what we had been doing from the beginning, and we had no idea what else we could pivot to. Jonny suggested switching to making software for language teachers, because this seemed like a less competitive space than language learning apps. But I wasn’t thrilled with this direction: neither of us had connections or experience in education, so we had no good understanding of the problems that teachers faced and the realities of their daily working environment.

We started to fight more and more over product and business decisions: what level of language learners to aim at, how much to rely on automation versus manual content curation, how much we should care about monetization, etc. Neither of us were more experienced than the other in business, and neither of us trusted the other to take the lead on it. About two weeks after YCombinator, Jonny decided to leave LevelText and start his own company to make tools for language teachers. To be fair, he agreed to leave me with full ownership of the company and all its intellectual property (i.e., the code that we wrote), but I was on my own.

Iteration 4: Personalized courses from native content

Up to this point, we had talked to many users and ran some experiments, but we had never actually tried to charge money for our service. In this iteration, my plan was to take what I had learned thus far and build a product that at least had a chance of succeeding. By “success”, I mean making at least minimum wage, or roughly $3000 CAD/month.

Occasionally, you hear of startups raising investment from the founders’ credentials alone, without any revenue, but we talked to some investors and found this to be mostly not the case. Only incubators would invest on the order of $100k in early stage startups with no revenue, but we thought this was too little to unlock any options that we can’t already do, like hiring engineers. All the bigger investors expected to see monthly recurring revenue of at least $5-10k/month before our request for a $500k seed round would be taken seriously. So investment or no investment, I would need to start making that much revenue in the near future.

My business model was to bundle lessons into courses, which students could buy at a fixed cost. Several successful companies sold educational content bundled into courses, such as LinkedIn Learning and Chessable: the idea being that if most users drop off after a few weeks, we make more money by charging a one-time course purchase cost at the beginning than from a subscription (which they would quickly cancel). Another advantage of the course model was that I only needed to produce a fixed amount of content, whereas if I sold subscriptions, I would need to continuously generate new content for as long as they are subscribed, an unattractive proposition given that the app was not fully automated and a lot of work still needed to be done manually.

With a price point of $20 per course, I would need to sell just over 100 courses per month to make minimum wage: an optimistic, but not an impossible quantity. I built a landing page and a form where users input which topics they wanted to read about. The student then receives one trial lesson for free, at which point they would need to buy a course to receive more lessons.

To prepare a lesson, I would pick a French article of around B1 difficulty on your desired topic, and run an algorithm to guess the most difficult French vocabulary in the article and provide English translations. Additionally, the lesson included a computer text-to-speech reading and an English translation of the full article. For $20, I would prepare 7 daily French lessons matching your interests.

I launched the platform, bought some traffic from Google Adwords, but to my disbelief, nobody bought the courses. To figure out what was wrong, I conducted another round of user interviews. Again, the responses revealed that we were far from product market fit.

Everybody complained that the pricing was too expensive for what we provided: other digital services like Spotify and Netflix cost a similar amount or lower, and you got a lot more. People also found the pricing model confusing and assumed it was $20/month for a subscription. They also complained that our app only helped you practice reading, with no exercises or anything to help you with speaking, listening or writing. Worst of all, users didn’t find much value in personalized lessons, which was supposed to be our unique selling point – they rated lessons catered to their personal interests about equally interesting as lessons about the default topics.

Reflecting on this feedback, I think what happened makes a lot of sense. Most users, when asked to provide a list of their interests, gave fairly generic responses like “sports” or “politics”: not very useful for personalization. I wanted to make at least minimum wage to justify all the effort I was putting into this project, so I worked backwards from the $3000/month revenue target and designed the pricing scheme to achieve this goal. But my users don’t care about me or how much money I wanted to make, they only saw that I didn’t offer much for $20, so of course they didn’t buy.

I gathered the user feedback and sketched out some ideas of how to improve the product for the next iteration. But honestly, I was running out of steam – after countless iterations and little to show for my effort, it became difficult to gather the focus and willpower to implement more features. Being a solo founder is not much fun: nobody else understood my problems or cared about my language learning startup. So I decided it was a good time to quit.

Why language learning is a poor business

Despite my passion for it, language learning is ultimately a weak business model, especially software-based apps. For English speakers, learning a language is mainly something that is done for fun; it is seldom a must-have. Learners typically get bored and quit after only a few weeks, when their initial motivation runs out and it becomes a chore. This makes it difficult to make a subscription-based model work (many language learning apps get around this by asking for payment of several months or a whole year upfront).

Most language learners don’t spend much money on their hobby – out of all the people I interviewed, hardly anybody had ever paid for an app. It’s hard to blame them – there are tons of free language learning resources on the internet, so there is rarely a compelling reason to justify taking out your credit card. Even when people do pay to learn a language, it is usually for human tutoring or physical books (which people expect to pay for), not software products.

Finally, the market is saturated with hundreds of language learning products offering similar features. There is no good way to measure results of language learning, so you have no way to prove that your method is better than the others in terms of educational efficacy. With no better alternative, most companies optimize for user retention, inevitably leading to an app filled with gamification features.

Was founding a startup the wrong choice for me?

There are several reasons why being a founder in this space was probably not the right choice for me personally. First of all, my main expertise is in AI: while useful, it is far from the biggest priority in the early stages of most consumer products. Language learning apps needed to be engaging and fun to use, and AI was at most a minor component. Before product-market fit, the main tasks in a startup are developing quick prototypes of web or mobile apps, networking on social media to find users, talking to those users, and aligning the product with their needs. As for AI, it is usually best to throw something together quickly with GPT-3 and improve it later if there is demand for it, because any more sophisticated machine learning would take too long relative to the benefits.

While I could probably learn web development, sales, and product management given enough time, I had no experience in any of these skills, meaning I have no competitive advantage over the thousands of other aspiring entrepreneurs, and would be at a disadvantage compared to founders with experience in these skills.

Another factor is personal risk tolerance: how long are you willing to work full-time on your startup without any revenue or investment? For me, I was willing to put six months. This short timeline drove me to only consider business models that would generate a couple thousand dollars a month of revenue right away, and kill the idea if this goal could not be reached. But in reality, it is common to iterate for several years before achieving this goal (eg: Readlang took about 3 years to reach $1000/month in revenue). Since I wasn’t prepared to risk several years, potentially with zero payout at the end, it would’ve probably been better to work on it on the side while working a regular job somewhere, instead of going all-in full-time with an overly optimistic timeline.

There is a common belief in the startup community that persistence in your startup is “good” and giving up is “bad”. Part of this is observation bias: you only read about successful companies that raise millions and get acquired, after a lot of failure and persistence, and never hear about the majority of startups that fail. This has led many talented young people to pursue the startup dream, often willing to fail at it for years. I don’t think this is a particularly noble act: there is nothing magical that happens when you incorporate a company and declare yourself to be a startup founder. Your “company” is only worth something if you can provide something of sufficient value to customers that they pay for it; if you have no revenue then you are basically no different from being unemployed (even if you write a lot of code and get a few people to use it).

Even though I didn’t succeed in my venture, this was a valuable learning experience. You learn much more by actually attempting a startup than just reading about them; the life of a founder is not as glamorous as portrayed in the media. Now I know what I don’t know, and have a better idea of what skills I’d need to have a better chance next time. That’s the end of my rather long post, I hope it will be useful to readers thinking about building a startup or a language learning app.

Real World Applications of Automaton Theory

This term, I was teaching a course on intro to theory of computation, and one of the topics was finite automatons — DFAs, NFAs, and the like. I began by writing down the definition of a DFA:

A deterministic finite automaton (DFA) is a 5-tuple (Q, \Sigma, \delta, q_0, F) where: Q is a finite set of states…

I could practically feel my students falling asleep in their seats. Inevitably, a student asked the one question you should never ask a theorist:

“So… how is this useful in real life?”

DFAs as a model of computation

I’ve done some theoretical research on formal language theory and DFAs, so my immediate response was why DFAs are important to theorists.

1.png

Above: A DFA requires O(1) memory, regardless of the length of the input.

You might have heard of Turing machines, which abstracts the idea of a “computer”. In a similar vein, regular languages describe what is possible to do with a computer with very little memory. No matter how long the input is, a DFA only keeps track of what state it’s currently in, so it only requires a constant amount of memory.

By studying properties of regular languages, we gain a better understanding of what is and what isn’t possible with computers with very little memory.

This explains why theorists care about regular languages — but what are some real world applications?

DFAs and regular expressions

Regular expressions are a useful tool that every programmer should know. If you wanted to check if a string is a valid email address, you might write something like:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Behind the scenes, this regular expression gets converted into an NFA, which can be quickly evaluated to produce an answer.

You don’t need to understand the internals of this in order to use regular expressions, but it’s useful to know some theory so you understand its limitations. Some programmers may try to use regular expressions to parse HTML, but if you’ve seen the Pumping Lemma, you will understand why this is fundamentally impossible.

DFAs in compilers

In every programming language, the first step in the compiler or interpreter is the lexer. The lexer reads in a file of your favorite programming language, and produces a sequence of tokens. For example, if you have this line in C++:

cout << "Hello World" << endl;

The lexer generates something like this:

IDENTIFIER  cout
LSHIFT      <<
STRING      "Hello World"
LSHIFT      <<
IDENTIFIER  endl
SEMICOLON   ;

The lexer uses a DFA to go through the source file, one character at a time, and emit tokens. If you ever design your own programming language, this will be one of the first things you will write.

2.gif

Above: Lexer description for JSON numbers, like -3.05

DFAs for artificial intelligence

Another application of finite automata is programming simple agents to respond to inputs and produce actions in some way. You can write a full program, but a DFA is often enough to do the job. DFAs are also easier to reason about and easier to implement.

The AI for Pac-Man uses a four-state automaton:

3.png

Typically this type of automaton is called a Finite State Machine (FSM) rather than a DFA. The difference is that in a FSM, we do an action depending on the state, whereas in a DFA, we care about accepting or rejecting a string — but they’re the same concept.

DFAs in probability

What if we took a DFA, but instead of fixed transition rules, the transitions were probabilistic? This is called a Markov Chain!

4.jpg

Above: 3 state Markov chain to model the weather

Markov chains are frequently used in probability and statistics, and have lots of applications in finance and computer science. Google’s PageRank algorithm uses a giant Markov chain to determine the relative importance of web pages!

You can calculate things like the probability of being in a state after a certain number of time steps, or the expected number of steps to reach a certain state.


In summary, DFAs are powerful and flexible tools with myriad real-world applications. Research in formal language theory is valuable, as it helps us better understand DFAs and what they can do.

Learning R as a Computer Scientist

If you’re into statistics and data science, you’ve probably heard of the R programming language. It’s a statistical programming language that has gained much popularity lately. It comes with an environment specifically designed to be good at exploring data, plotting visualizations, and fitting models.

R is not like most programming languages. It’s quite different from any other language I’ve worked with: it’s developed by statisticians, who think differently from programmers. In this blog post, I describe some of the pitfalls that I ran into learning R with a computer science background. I used R extensively in two stats courses in university, and afterwards for a bunch of data analysis projects, and now I’m just starting to be comfortable and efficient with it.

Why a statistical programming language?

When I encountered R for the first time, my first reaction was: “why do we need a new language to do stats? Can’t we just use Python and import some statistical libraries?”

Sure, you can, but R is very streamlined for it. In Python, you would need something like scipy for fitting models, and something like matplotlib to display things on screen. With R, you get RStudio, a complete environment, and it’s very much batteries-included. In RStudio, you can parse the data, run statistics on it, and visualize results with very few lines of code.

Aside: RStudio is an IDE for R. Although it’s possible to run R standalone from the command line, in practice almost everyone uses RStudio.

I’ll do a quick demo of fitting a linear regression on a dataset to demonstrate how easy it is to do in R. First, let’s load the CSV file:

df <- read.csv("fossum.csv")

This reads a dataset containing body length measurements for a bunch of possums. Don’t ask why, it was used in a stats course I took. R parses the CSV file into a data frame and automatically figures out the dimensions and variable names and types.

Next, we fit a linear regression model of the total length of the possum versus the head length:

model <- lm(totlngth ~ hdlngth, df)

It’s one line of code with the lm function. What’s more, fitting linear models is so common in R that the syntax is baked into the language.

Aside: Here, we did totlngth ~ hdlngth to perform a single variable linear regression, but the notation allows fancier stuff. For example, if we did lm(totlngth ~ (hdlngth + age)^2), then we would get a model including two variables and the second order interaction effects. This is called Wilkinson-Rogers notation, if you want to read more about it.

We want to know how the model is doing, so we run the summary command:

> summary(model)

Call:
lm(formula = totlngth ~ hdlngth, data = df)

Residuals:
   Min     1Q Median     3Q    Max
-7.275 -1.611  0.136  1.882  5.250 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -28.722     14.655  -1.960   0.0568 .
hdlngth        1.266      0.159   7.961  7.5e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.653 on 41 degrees of freedom
Multiple R-squared:  0.6072,	Adjusted R-squared:  0.5976
F-statistic: 63.38 on 1 and 41 DF,  p-value: 7.501e-10

Don’t worry if this doesn’t mean anything to you, it’s just dumping the parameters of the models it fit, and ran a bunch of tests to determine how significant the model is.

Lastly, let’s visualize the regression with a scatterplot:

plot(df$hdlngth, df$totlngth)
abline(model)

And R gives us a nice plot:

1.png

All of this only took 4 lines of R code! Hopefully I’ve piqued your interest by now — R is great for quickly trying out a lot of different models on your data without too much effort.

That being said, R has a somewhat steep learning curve as a lot of things don’t work the way you’d expect. Next, I’ll mention some pitfalls I came across.

Don’t worry about the type system

As computer scientists, we’re used to thinking about type systems, type casting rules, variable scoping rules, closures, stuff like that. These details form the backbone of any programming language, or so I thought. Not the case with R.

R is designed by statisticians, and statisticians are more interested in doing statistics than worrying about intricacies of their programming language. Types do exist, but it’s not worth your time to worry about the difference between a list and a vector; most likely, your code will just work on both.

The most fundamental object in R is the data frame, which stores rows of data. Data frames are as ubiquitous in R as objects are in Java. They also don’t have a close equivalent in most programming languages; it’s similar to a SQL table or an Excel spreadsheet.

Use dplyr for data wrangling

The base library in R is not the most well-designed package in the world. There are many inconsistencies, arbitrary design decisions, and common operations are needlessly unintuitive. Fortunately, R has an excellent ecosystem of packages that make up for the shortcomings of the base system.

In particular, I highly recommend using the packages dplyr and tidyr instead of the base package for data wrangling tasks. I’m talking about operations you do to data to get it to be a certain form, like sorting by a variable, grouping by a set of variables and computing the aggregate sum over each group, etc. Dplyr and tidyr provide a consistent set of functions that make this easy. I won’t go into too much detail, but you can see this page for a comparison between dplyr and base R for some common data wrangling tasks.

Use ggplot2 for plotting

Plotting is another domain where the base package falls short. The functions are inconsistent and worse, you’re often forced to hardcode arbitrary constants in your code. Stupid things like plot(..., pch=19) where 19 is the constant for “solid circle” and 17 means “solid triangle”.

There’s no reason to learn the base plotting system — ggplot2 is a much better alternative. Its functions allow you to build graphs piece by piece in a consistent manner (and they look nicer by default). I won’t go into the comparison in detail, but here’s a blog post that describes the advantages of ggplot2 over base graphics.

It’s unfortunate that R’s base package falls short in these two areas. But with the package manager, it’s super easy to install better alternatives. Both ggplot2 and dplyr are widely used (currently, both are in the top 5 most downloaded R packages).

How to self-study R

First off, check out Swirl. It’s a package for teaching beginners the basics of R, interactively within RStudio itself. It guides you through its courses on topics like regression modelling and dplyr, and only takes a few hours to complete.

At some point, read through the tidyverse style guide to get up to speed on the best practices on naming files and variables and stuff like that.

Now go and analyze data! One major difference between R and other languages is that you need a dataset to do anything interesting. There are many public datasets out there; Kaggle provides a sizable repository.

For me, it’s a lot more motivating to analyze data I care about. Analyze your bank statement history, or data on your phone’s pedometer app, or your university’s enrollment statistics data to find which electives have the most girls. Turn it into a mini data-analysis project. Fit some regression models and draw a few graphs with R, this is a great way to learn.

The best thing about R is the number of packages out there. If you read about a statistical model, chances are that someone’s written an R package for it. You can download it and be up and running in minutes.

It takes a while to get used to, but learning R is definitely a worthwhile investment for any aspiring data scientist.

AI Project: Harmonizing Pop Melodies using Hidden Markov Models

It is often said that all pop music “sound the same”. If this is the case, then a computer program should be able to compose pop music, right?

This is the problem we studied for the AI final project (CS486) — kind of. We restrict to the simpler problem of harmonizing an existing melody: in other words, given a fixed melody, find a sequence of chord progressions that match it.

We formulated the task as a probabilistic optimization problem, and devised an algorithm to solve it using Hidden Markov Models (HMMs). Then, we used the Music21 library to generate MIDI files from the output so that we can listen to it.

In the experiment, we chose the melody from If Only by JJ Lin. Here’s a demo:

Music Theory Model

In our model, pop music consists of two parts: a melody line consisting of a sequence of individual notes, and a harmony line of a sequence of chords. Generally, the melody line is the notes sung by the lead vocalist, and the harmony line is played by accompanying instruments (piano, guitar, digital synthesizers). Conventionally, the chord progression is written as a sequence of chord names above the melody; the exact notes in a chord progression is up to the performer’s discretion.

1.pngAbove: Example of two-part song with melody and chord progression

It is hard to quantify exactly what make a chord progression “good”, since music is inherently subjective and depends on an individual’s musical tastes. However, we capture the notion of “pleasant” using music theory, by assigning a penalty to musical dissonance between a note in the melody and a note in the chord.

According to music theory, the minor second, major second, and tritone (augmented fourth) intervals are dissonant (including the minor and major seventh inversions). Therefore, our program tries to avoid forming these intervals by assigning a penalty.

2.pngAbove: List of dissonant intervals. All other intervals are consonant.

We assume that all notes in the melody lie on either a major or minor scale in some fixed key. The set of permissible chords for a scale are the major and minor chords where all of its constituent notes are in the scale. For example, in the scale of G major, the permissible chords are {G, Am, Bm, C, D, Em}.

This is a vastly simplified model that does not capture more nuanced chords that appear in pop music. However, even the four-chord progression G – D – Em – C is sufficient for many top-40 pop songs.

First attempt: Harmonize each bar independently

The first thing we tried was to looking at each bar by itself, and searching through all of the permissible chords to find the chord that harmonizes best with the bar.

In other words, we define the penalty of the chord for a melody bar is the number of times a melody note forms a dissonant interval with a chord note, weighted by the duration of the note. Then, we assign to each bar the chord with the lowest penalty.

Here’s what we get:

m2.pngAbove: Naive harmonization. Notice that the Am chord is repeated 3 times.

This assigns a reasonable-sounding chord for each bar on its own, but doesn’t account for the transition probabilities between chords. We end up with the A minor chord repeated 3 times consecutively, which we’d like to avoid.

How to account for chord transitions, while still harmonizing with the melody in each bar? This is where we bring in Hidden Markov Models.

Second attempt: HMMs and the Viterbi Algorithm

Hidden Markov Models are a type of probabilistic graphical model that assumes that there is an unobservable hidden state that influences the output variable. In each timestep, the hidden state transitions to a different state, with probabilities given by a transition matrix.

A standard problem with HMMs is: given a sequence of observations, find the most likely sequence of hidden states that produced it. In our problem, the hidden states are the unknown sequence of chords, and the observed sequence is the bars of our melody. Now, solving the most likely sequence problem yields the chord progression that best harmonizes with the melody.

4.png

To encode the rule that we don’t like having the same chord consecutively for multiple bars, we assign the transitions so that the probability of transition from any chord to itself is low. The transition probability to every other chord is equal.

3.pngAbove: Transition probabilities for chords (only 3 shown here, but this generalizes easily).  A chord is equally likely to transition to any other chord, but unlikely to transition to itself.

The only thing left to do is solve the most likely sequence problem. We used a modified version of the Viterbi algorithm to do this — it solves the problem quickly using dynamic programming.

Running the Viterbi algorithm on our test melody:

m3.pngAbove: Using HMMs to eliminate repeated chords

And there we go — no more repeated chords! It doesn’t produce the same chord sequence as my manual harmonization, but sounds quite natural (and doesn’t violate any musical constraints).

Results and Future Directions

Using HMMs, we were able to harmonize a short melody segment. By looking at the entire melody as a whole, the HMM algorithm is able to produce a better chord progression than by optimizing every bar locally. Our implementation uses a greatly simplified musical model: it assumes the melody is in G major, and only considers 6 basic major and minor chords, but it successfully demonstrates a proof of concept.

When implementing this project, we thought this was new research (existing literature on harmonization focused mostly on classical music, not pop music). Alas, after some more literature review, it turns out that a Microsoft research team developed the same idea of using HMMs for melody harmonization, and published a series of papers in 2008. Oh well. Not quite publishable research but still a cool undergraduate class project.

The source code for this project is available on Github. Thanks to Michael Tu for collaborating with me!

If you enjoyed this post, check out my previous work on algorithmic counterpoint!

CS488 Final Project: OpenGL Boat Game

Here’s something I’ve been working on for the past few weeks for one of my courses, CS488 – Intro to Computer Graphics. For the final project, you’re allowed to do any OpenGL or raytracing project, as long as it has 10 reasonable graphics related objectives. Here’s a video of mine:

A screenshot:

It’s a simple game where you control a boat and go around a lake collecting coins. When you collect a coin, there’s a bomb that spawns and follows you around. You die when you hit a bomb. Also if two bombs collide then they both explode (although you can’t see that in the video).

Everything is implemented in bare-metal OpenGL, so none of those modern game engines or physics engines. It’s around 1000-ish lines of C++ (difficult to count because there’s a lot of donated code).

Edit (8/10/2016) – I received an Honorable Mention for this project!

CS488 – Introduction to Computer Graphics

For those that haven’t heard about CS488, it’s one of the “big three” — fourth year CS courses with the heaviest workload and with large projects (the other two being Real-time and Compilers). It’s one of the hardest courses at Waterloo, but also probably the most rewarding and satisfying course I’ve taken.

There are four assignments, each walking you step by step through graphics techniques, like drawing a cube with OpenGL, or building a puppet with hierarchical modelling, or writing a simple ray tracer. Then there’s the final project, where you can choose to make something with OpenGL or extend your ray tracer. The class is split 50/50, about half the class did OpenGL and the other half did a ray tracer. I personally feel that OpenGL gives you more room to be creative and create something unique whereas ray tracing projects end up implementing a mix of different algorithms.

The first two assignments weren’t too bad (I estimate it took me about 10 hours each), but some time during assignment 3 I realized I was spending a lot of time in the lab, so I got an hours tracking app on my phone to track exactly how much time I was spending working on this course. Assignments 3 and 4 each took me 15 hours. I spent 35 hours on my final project, over a period of 3 weeks. I chose relatively easy objectives that I was confident I could do well, which left time to polish the game and do a few extra objectives. I’m not sure what the average is for time spent on the final project, but it’s common to spend 50-100 hours. Bottom line: you can put in potentially unbounded amounts of time to try to get the gold medal, but the effort actually required to get a good grade is quite reasonable.

Now the bad part about this course (obviously not the instructor’s fault) is OpenGL is so incredibly difficult to work with. Even to draw a line on the screen, you have to deal with a lot of low level concepts like vertex array objects, vertex buffer objects, uniform attributes to pass to shaders, stuff like that. It doesn’t help that when something goes wrong in a shader (which runs on the GPU), there’s no way to pass an error message back to the CPU so you can print out variables and debug it. It also doesn’t help that there’s a lot of incompatible OpenGL versions, and code you find in an online tutorial could be subtly broken for the version you’re using. On the other hand, working with OpenGL really makes you appreciate modern game engines like Unity which takes care of all the low level stuff for you.

Achievement Unlocked: publish app on iOS App Store without testing it on a device

This week marks the end of a hobby project I’ve been working on for the last few months. It’s called WATisRain, and here’s a link to github. Initially an Android app, I ported the app to iOS over a period of two months. Yesterday, the app was approved on the app store, you can download it here.

Some background

I was never an Apple person. I do not own a Macbook, iPod, iPhone, or any Apple device.

My first mobile platform is Android. In this post I talk about my first impressions as an Android developer. That’s a story for another day.

Last year I started work on an app to navigate the tunnels between buildings of my university campus. The network of buildings, tunnels, and bridges were not very well known, even among upper years, so I figured it would be a cool idea for an Android app.

I worked on this idea on and off for a few months, then released version 1.0 to Google Play. The app quickly got about 2000 downloads and a couple dozen positive reviews. I was pretty happy.

An obvious next step to take is port this to iOS. The campus population is split between Android and iOS, so an Android-only app locks out a significant fraction of the user base. Unfortunately, I didn’t build my app with any of the cross-platform technologies, so this involves porting the entire codebase (2k lines of Java) into objective-C. I also didn’t have a Macbook or iPhone, both of which happens to be pretty crucial for iOS development.

Few months later, I landed an internship at Minted, in San Francisco. My company lent me a Macbook Pro, so I finally have the hardware to work on an iOS port. I still didn’t have an iPhone, but no matter. Surely the simulator is sufficient, right?

Motivated by a hard deadline (I had to return my Macbook at the end of my internship), I worked evenings and weekends to finish the iOS port. I ignored all coding conventions and translated my Java code, literally line by line, into objective-C.

It only took a few weeks to port over all the features and get it on the app store. I called a few of my friends who had iPhones, and asked them to download my app. They confirmed the app works. Mission accomplished.

Impressions on iOS development

My overall impression on iOS development so far is mixed.

I’m impressed with the technical aspects of Apple’s products, from the iPhone devices themselves to the IDE, Xcode, that Apple provides for developers. Compared to Android development, I was faced with far fewer random IDE glitches, inconsistencies between devices, and the like. Developing on the simulator worked amazingly well — enough to get me to the app store. For comparison, it would be unimaginable to develop the same Android app entirely on the emulator.

What I really disliked is the closed and proprietary approach Apple takes for its products. First of all, you need a Mac of some sort to develop for iOS, period. I can happily develop Android on any platform I want, but I cannot run Xcode on Windows.

Next, you need to enroll in Apple’s developer program, at a cost of $119 per year. At the end of the year, if you don’t renew your membership, your app is removed from the store. Even if you just want to develop for fun, without submitting to the app store, you still need this license to push your app to your device. In contrast, Google Play charges a $25 lifetime fee for the same thing.

One last thing I have to mention is the app review process takes 1-2 weeks. This is incredibly frustrating, since any bugfix will take a week to push to users.

In practice all these factors combined leads to a high barrier of entry for a hobbyist like me. Let’s calculate. If you spend $2500 on a Macbook Pro, $500 for some sort of iPhone, $119 for the developer program, that’s already over 3000 dollars before you can even start coding.

Can you really develop without a device?

All across internet forums, people advocate that you should test your app thoroughly across many devices before submitting to the app store. It seems that trying to develop without a device is an edge case, often the instructions for a task assumes you have a device, and you have to find a workaround if you don’t.

In my case, it was successful, in the sense that I produced an app that didn’t crash and got past app review. But I don’t know if I got lucky, because things could have turned out badly.

Throughout the whole process, I was worried that running the app on a real device would exhibit bugs that aren’t producible in the simulator. In that case, I’d have no way to debug the problem, and the project would be done. My app uses nothing but the most basic functionality, so I had a good chance of dodging this bullet. But still, the possibility loomed over me, threatening to kill the project just as it crosses the finish line.

A second problem is by copying my original Android app feature by feature, the resulting iOS app looks and feels like an Android app. A friend pointed this out when I sent him the app. I hadn’t noticed it, but after looking at some other iOS apps, I have to agree with him. Actually in hindsight it shouldn’t surprise anyone that without seeing other iOS apps, I don’t really know how an iOS app should behave. But it just never occurred to me that the natural way to do things for me might be unnatural for iOS users.

Finally, this is subjective, but for me it wasn’t very fun to develop for a simulator. Without the tactile sensation of your creation running on an actual phone, the whole experience feels detached from reality. You feel like an unwelcome foreigner in a country where the customs are different, and you begin to question yourself, why am I doing this iOS port anyway?

Part of what kept me going was sunk cost fallacy. I paid $119 to be an iOS developer, so I’d better get at least something on the app store, have something to show for it.

Now that the app is finished, I think I’m done with iOS development. Perhaps the app store is fertile ground for developers and startups looking to make a profit, but the cost of entry is unreasonable for someone making a few open source apps for fun.

What’s the hardest bug you’ve ever debugged?

In a recent interview, I was asked this question: “what’s the most difficult bug you’ve encountered, and how did you fix it?” I thought this was an interesting question because there are so many answers you could give to this question, and the sort of answer you give demonstrated your level of experience with developing software.

I thought for a moment, recalling all the countless bugs I had seen and fixed. Which one was the most difficult and interesting? In this article I’m going to describe my most difficult bug to date.

It was an iOS app. I was working as a four-month intern at the time. “We’ve been seeing reports from our users that the app randomly display a black screen,” my boss explained one afternoon. “No error message, no crash log, nothing. The app is simply stuck at a black screen state until you kill it.”

“Fair enough. How do I reproduce it?”

He shrugged. “I don’t know. Users are reporting it happens randomly. Here’s what you gotta do: grab an iPad, download the game off the app store. Create an account and play the game until you hit the bug.”

So I did. I was reduced to one of these typewriter monkeys, banging away mindlessly at the keyboard until I stumble upon the sequence of button presses to trigger the undiscovered bug by sheer coincidence.

For an afternoon I monkeyed away, but no matter what buttons I pressed, the mythical black screen would not appear. I left the office, defeated and mentally exhausted.

The next morning I checked into the office, picked up the iPad, and resumed my monkeying. But this time my fortune was different: within 15 minutes, lo and behold, the screen flashed white, followed by an unrepentant screen of black.

What did I do to trigger this? I retraced my steps, trying to repeat the miracle. It happened again. Methodically I searched for a deterministic sequence of actions that brought our app to its knees. Go to the profile page. Hit button X. Go to page Y and back to the profile page. Hit button Z. The screen flickered for a millisecond, the black. Ten times out of ten.

With a sigh of relief, I jotted down this strange choreography and went for a walk. Returning with a fresh mind ready to tackle the next stage of the problem, I executed the sequence one more time, just to make sure. But the bug was nowhere to be seen.

I racked my brain for an explanation. The same sequence of actions now produced different results, I reasoned. Which meant something must have changed. But what?

It occurred to me that the page looked a little different now from when I was able to reproduce the bug. In the morning, when I came in, there was a little countdown timer in the corner of the screen that indicated the time until an upcoming event. The timer was not there anymore. Could it be the culprit…

To test this hypothesis, I produced a build that pointed the game to the dev server, and fired up a system event. The timer appeared. I executed my sequence — profile, tap, home screen, back to profile, tap, and sure enough, with a flicker the black screen appeared. I turned off the timer, repeated the sequence — profile, tap, home, profile, tap — no black screen. I had finally discovered the heart of the matter. There was some strange interaction going on between the timer and other things on the page.

At this point, with 100% reproducibility, the worst was over. It took a few more hours for me to investigate the issue and come up with a fix. My patch was quickly rolled out to production, and users stopped complaining about random black screens. Then my team went out for some celebratory beer.

I will now describe exactly what happened — and why did a timer cause such an insidious bug.

The timer widget was implemented using an NSTimer which made a callback every second. To do this, the timer holds a reference to the parent view which contains it. This is not too unusual, and is generally innocent and harmless — until you combine it with Objective C’s garbage collection system.

Objective C’s garbage collection system uses a reference counting algorithm. I’ll remind you what this means. The garbage collector maintains, for each object, a count of how many references lead to it. When this reference count reaches zero, it means your object is dead, since there is no way to reach it from anywhere in the system. Thus the garbage collector is free to delete it.

This doesn’t work for NSTimer, though. When two objects hold references to each other, their reference count remain at least 1, which means they can never get garbage collected. In our app, this meant that whenever the view with the timer goes out of view, it doesn’t get disposed, but remains in the background forever. A memory leak.

A memory leak, by itself, can go unnoticed for a long time with no impact. The last part of the puzzle that brought everything crashing down had to do with the way a certain button was implemented. This button, when pressed, broadcasted a message, which would then be received by the profile view.

When the timer is active, it is possible to get the system into a state with two profile views — a real one and a zombie one kept alive by a reference cycle with the timer.

Then when the message is broadcasted, both the real and zombie views receive the message in parallel. The button logic is executed twice in rapid succession, which understandably causes the whole system to give in.

With this mechanism in mind, the fix was easy. Just invalidate the timer when the view goes out of view. Without the reference cycle, the profile view is disposed of correctly and all is well again.

I think this story demonstrates a fundamental truth about debugging: in order to debug effectively, you need to have a deep understanding of your technology stack. This is not always true of programming in general — quite often you can write code that works yet not really understand what it’s doing. When developing a feature in an unfamiliar technology, the typical workflow is, if you don’t know how to do something, copy something similar from StackOverflow or a different part of your code base, make some changes until it works. And that’s a fine way to do things.

But debugging requires a more structured methodology. When many things are breaking in haphazard ways, you need to narrow down the problem to its very core, to identify precisely which component is broken. In this case it was a reference cycle that wouldn’t get released. The core of the problem may be buried within layers upon layers of an API, even an API you believe to be bulletproof. It might require digging into assembly code, even hardware.

To find that core requires an understanding of a mind-boggling stack of technologies that software today sits upon. That’s what it takes to become a master debugger.

So, what’s the hardest bug you’ve ever debugged?

Algorithmic Trading Hackathon

The name of the hackathon was Code B: UW Algorithmic Trading Competition. It was hosted by Bloomberg and various UW student groups. It’s a 17 hour hackathon where you “create the best trading platform completely from scratch”. As far as I know, this is the first time the hackathon has been run, and in this article I’m going to write about my experience.

11053550_361293904063033_5062748457639600040_o

We were allowed teams of up to three, but my roommate Andrei and I signed up as a team of two. Like myself, Andrei is also a CS major. Neither of us had any experience with trading stocks, or anything finance related, for that matter. When asked to choose a team name, we named ourselves team /dev/rand (implying that we were so bad that we’d be no better than a random number generator)

The hackathon was scheduled to start Friday evening, running through the night until noon the next day. The goal was to write a program to autonomously trade stocks over a 20 minute period, battling other programs to earn as much money as possible. The programs communicated by connecting to a central server on Bloomberg’s side, so we could use any programming language we wanted. It was decided that Andrei would come up with strategies, and I would implement them in Python.

Rules of the Game

The specifics of the API and mechanics of the game were not revealed until the official start of the hackathon. The 50-60 teams packed into an auditorium as the organizers started to explain the technical details.

The rules turned out to be fairly simple. The only actions allowed were to bid (attempt to purchase) on a stock for some price, or ask (attempt to sell) a stock for some price. If at any point someone’s bid is higher than someone else’s ask, the deal goes through and the stock changes hands.

Now all of this was fairly standard, but after this part, the rules diverged from real life. In order to encourage people to buy stocks (and not just hoard the initial money), each share of a stock paid dividends to its owner every second. And to prevent simply buying one stock and holding it for the entire duration, the dividends given out quickly diminishes the longer you own the stock.

This quirky dividends system turned out to be central to our strategy. Additionally, the differences from real stock markets meant that any previous experience with finance and stock trading was less useful — definitely a good thing for us because many of our competitors were seriously studying finance and we had no experience anyway.

And it begins!

After the rules presentation, the hackathon kicked off. It was slightly past 7pm, and very quickly you could see teams buying and selling stocks. We decided to take it slow, discussing strategies over dinner.

We started work around 8pm. I began writing code to parse the input, while Andrei worked on deciphering the rather cryptic specifications document. Although API specs were clear enough, they were (intentionally) vague about how the system behaved behind the scenes. There were many formulas with lots of variables, many of which we had no idea what they meant.

So we took an experimental approach. Tentatively we put in a bid for a few shares of Google stock — and our net worth immediately took a nosedive. But the stock rapidly generated dividends, and before long, our net worth recovered to what it was initially, and it kept on going up! The success was short lasted, however, as quickly the dampening effects of the dividends started to kick in, and our rate of return quickly diminished to near zero.

We tried again, buying a few shares of the Twitter stock. The same thing happened — our value went down, quickly recovered, then gradually leveled to 50 dollars more than we started with.

With this information, we formulated a rough strategy. We didn’t know how to predict which stocks will go up; neither did we have a plan for buying and selling stocks at a favorable rate. Instead, we would take advantage of a stock’s “golden period”, where the stock initially pays massive dividends. It was crucial to buy as quickly as possible, since the clock started ticking as soon as you own one share of the stock. So we use all our money to buy as many shares of the stock as possible, doesn’t matter what price. Now we wait as the golden period payout multiplied by our entire bankroll makes us rich. Then, a few minutes later, when the golden period runs out, we would slowly sell, iteratively lowering our asking price until we found a buyer.

Once we sold the last share of a stock, the dividend clock doesn’t immediately reset, it slowly regenerates. So if we wait a while, say 5 minutes, then buy back the stock, we get another brief golden period. Taking this one step further, we decided on a strategy that cycled through the 10 stocks: at any given point, we would hold at most 4 of them, while the other 6 were left to “recharge”.

I proceeded to code the algorithm, while Andrei analyzed the spec document and brainstormed ways to improve the strategy. From the equations in the spec, he came up with a formula to determine what stock generated the highest dividends. Every half hour, the scoreboard would reset, and by 3am, I was basically done, and our algorithm consistently came either first or second by the end of each round. Our algorithm worked beautifully, simultaneously juggling a bunch of different stocks, some buying, others slowly selling. We watched the scoreboard as we earned hundreds of dollars every minute, ending with a ridiculous amount of money by the time it reset.

It seemed at this point that a lot of the teams were having implementation issues, like connecting to the network and parsing input, and only a handful were making any money at all, so I was pretty happy with our results.

But at 4am, disaster struck. A new round started, and our algorithm instantly plummeted to the bottom of the leaderboard. Every time we bought or sold anything, we lost money, and none of it was coming back through dividends. What happened?? It turns out that the parameters were changed, so that a very low amount of dividends were paid for owning a stock, and the only profits were made by buying low and selling high. This meant that our whole strategy, which centered around maximizing dividends, was rendered useless.

What’s worse — I discovered a bug in my implementation where our stocks were not being cycled properly: it would sell a certain stock, then instantly buy back the same stock, which didn’t allow the dividends clock to reset, meaning no dividends. Also, by this point a lot of teams were flooding the network with requests, making any network call have a small chance of throwing an exception and crashing the whole thing entirely.

The network problem was easy to fix, but at 5am, I was really tired and had difficulty tracking down the bug that was causing it to buy back the same stock. Andrei suggested a new set of strategies for the “low dividends” scenario, but by now, I was too tired to implement another set of strategies. Instead, we tweaked various constants in our program to make it play more patiently and more predictably, so even in the worst case it would make marginal gains instead of finishing dead last. After 2 hours of debugging, we managed to track down the cycling bug.

It was 7am and I could hardly keep my eyes open so we found a couch and napped for two hours, until the mock competition began.

Mock Competition and final tweaking

At 9am, a few hours before the final competition, there was a mock competition which was meant to be identical to the final competition. There were three rounds: a high dividends scenario, a low dividends, and one in the middle.

We won the high dividends round hands down, unsurprisingly as our entire strategy was designed around this set of parameters in mind. In the low dividends round, we didn’t do as well, but thanks to careful tweaking, we still made a modest amount of money, coming in fifth. In the medium round, we got second place. This was enough to win the mock competition.

Now, let me give you a summary of our competition. Most of the teams increased gradually in net worth, with their score slowly increasing as they slowly accumulated dividends. We were confident that we could play the dividends game, so it didn’t trouble us too much. What was really troubling was a team called “vlad” (I don’t quite remember what their name was, but it ended with vlad). Instead of gradually gaining money a few dollars at a time, “vlad” remained at a constant net worth for a long time, then suddenly gain hundreds of dollars instantly. This meant that their algorithm operated completely differently from ours, and we had absolutely no explanation of what was going on.

It didn’t help that the formula for net worth was complicated and we didn’t fully understand it. Our net worth clearly increased when we did well, but it fluctuated wildly, sometimes dipping by hundreds of points when we made a large transaction, only to bounce back when dividends started rolling in.

The next few hours were fairly unproductive, since we had no more ideas on how to improve our algorithm. Although Andrei had some ideas on strategies for the low dividends game, after pulling an all-nighter I was in no shape to try implementing them.

The Final Game

It was soon time for the final competition, the cumulation of all our efforts. Having carefully noted down the parameters for the mock competition, we were ready to use this information to get every edge we could for the finals.

Round 1 was high dividends. We played with a highly aggressive set of parameters, dumping our bad stocks for very cheap in pursuit of the dividends regeneration. The early game was contentious, but by the 10 minute mark, we gained a solid lead over the competition and maintained the lead until the end. We won round 1, with “vlad” coming in third place.

Round 2 was low dividends. We deployed the patient strategy, which was less eager to dump anything, holding onto bad stocks until we get a good price for them, since there were little dividends to fight over anyway. We came fifth place, with “vlad” coming in fourth.

Round 3 was medium dividends. We started off uncertain — at the halfway mark we were still in the middle of the pack — but slowly we gained ground, and five minutes before the end, we were in third position. “vlad” was in first place, with a big enough lead that neither we nor the second place team were going to overtake him. But at this point, we knew that from our points in the first round, we only needed second place to beat “vlad” and win the competition — and with 3 minutes left on the clock, we overtook the second place team. We were going to win it!

Then, the whole scoreboard goes black.

It didn’t crash, no, it was the contest organizers’ tactic to increase suspense so the final winners are not known until the winners are announced. We waited anxiously as the final seconds ticked down, the organizers announcing fourth place, third place, UI award. We just needed second place in this round to win, if we get third place in this round, “vlad” beats us by a hair.

And the second place goes to… team /dev/rand. What? We stared in disbelief as we realize we lost to “vlad”.

Going home

Turns out that in the last 2 minutes of competition, we got overtaken by not one, but two teams. So we actually finished round 3 in fourth place.

Our prize for winning second place? A playstation 4 (worth ~450) and a parrot drone (worth ~100), and most importantly, the satisfaction of winning a finance competition without knowing the first thing about finance. Team “vlad” got two playstations and a drone (well, they could have taken all 3 playstations but they were nice enough to leave us one)

Big thanks to all the organizers and volunteers for keeping everything running smoothly!

If you’re interested, our source code is in a git repo here. It’s 400 lines of hackathon-level-bad python code.

What about real algo trading?

A natural question to ask is, can we get rich IRL with this algorithm? Answer is clearly no — we essentially gamed the system by greedily grabbing the golden period of dividends, a mechanic designed to encourage people to buy and sell stocks. Of course, in the real world, dividends don’t work like that.

Then other than this mechanic, how else is this competition different from real world algo trading? Unfortunately, I don’t know enough about this topic to answer that question.

Philosophically, I still don’t understand how it’s possible that they basically pull money out of thin air. I mean, a stock trader doesn’t intrinsically create value for society, but they get rich doing it? I don’t know.

Visualizing Quaternions with Unity

How do you model the position and orientation of an airplane?

Position is easy, just represent it with a point in 3D space. But how do you specify its orientation — which direction it’s pointing?

At first glance, it seems a vector will do. After all, a vector points in some direction, right? If the plane is pointing east, represent its orientation by a unit vector pointing east.

Unfortunately, we quickly run into trouble when we try to roll. If we’re facing east, and we roll 90 degrees, we’re still facing east. Clearly we’re missing something.

Euler Angles

When real pilots talk about their orientation, they talk about roll, yaw, pitch. Pitch is going up or down, yaw is going left or right, roll is, well, roll.

Any change in orientation can be described by some combination of roll, yaw, pitch. This is the basis for Euler Angles. We use three angles to represent the airplane’s orientation.

This is all fine and dandy if we want to represent the orientation of a static object in space. But when we try to adjust our orientation, we start to run into problems.

You’re thinking, this should be simple! When we turn left or right, we just increment the yaw variable, right? Yes, it seems to work, at least initially. You can turn left and right, up and down, and roll around.

Implement it in Unity and play around a bit, however, and you begin to notice that things don’t quite behave the way you expect.

In this animation, I’m holding down the right button:

The plane does rotate to the right, but it’s not rotating relative to itself. Instead it’s rotating around some invisible y-axis. If it was rotating relative to itself, the green arrow shouldn’t be moving.

The problem becomes more and more severe when the pitch of the plane becomes higher and higher. The worst case is when the airplane is pointing straight up: then roll and yaw become the same thing! This is called gimbal lock: we have lost a degree of freedom and we can only rotate in 2 dimensions! Definitely not something desirable if we’re controlling a plane or spaceship.

It turns out that no matter what we do, we will suffer from some form of gimbal lock. As long as we use Euler Angles, there is one direction where if we turn too far, everything starts to screw up.

Practical Introduction to Quaternions

All is not lost, however. There is a way to represent orientation that represents all axes equally and does not suffer from gimbal lock. This mythical structure is called the quaternion. Unlike Euler Angles which describe your orientation relative to a fixed set of axes, quaternions do not rely on any fixed axis.

The drawback is that quaternions are unintuitive to understand for humans. There is no way to “look” at a quaternion and be able to visualize what rotation it represents. Fortunately for us, it’s not that difficult to make use of quaternions, even if we can’t visualize quaternions.

There is a lot of theory behind how quaternions work, but in this article, I will gloss over the theory and give a quick primer to quaternions, just the most common facts you need to use them. At the same time, I will implement the operations I describe in C#, so I can integrate them with Unity. If you don’t know C#, you can freely ignore the code.

Definition

A quaternion is an ordered pair of 4 real numbers (w,x,y,z). We write this as

w+xi+yj+zk

The letters i,j,k are not variables. Rather, they are independent axes. If you like, you can think of the quaternions as a 4 dimensional vector space.

The defining property of quaternions is:

i^2 = j^2 = k^2 = ijk = -1

Play around with it a bit and you can derive 6 more identites:

ij = k

jk = i

ki = j

ji = -k

kj = -i

ik = -j

If you’ve worked with complex numbers, this should seem familiar. Instead of 2 parts of a complex number (the real and imaginary parts), we have 4 parts for a quaternion.

The similarity doesn’t end here. Multiplying complex numbers represents a rotation in 2 dimensions. Similarly, multiplying by a quaternion represents a rotation in 3D.

One curious thing to note: we have ij=k and ji=-k. We switched around the terms and the product changed. This means that multiplying quaternions is kind of like multiplying matrices — the order matters. So multiplication is not commutative.

Here’s a framework for a quaternion in C#:

public class Quat{
	// Represents w + xi + yj + zk
	public float w, x, y, z;
	public Quat(float w, float x, float y, float z){
		this.w = w;
		this.x = x;
		this.y = y;
		this.z = z;
	}
}

Normalizing Quaternions

The norm of a quaternion is

N(\mathbf{q}) = \sqrt{w^2+x^2+y^2+z^2}

When we use quaternions to represent rotations, we typically want unit quaternions: quaternions with norm 1. This is straightforward: to normalize a quaternion, divide each component by the norm.

In C#:

public float Norm(){
  return Mathf.Sqrt (w * w + x * x + y * y + z * z);
}

public Quat Normalize(){
  float m = Norm ();
  return new Quat (w / m, x / m, y / m, z / m);
}

Multiplying Quaternions

Multiplying is simple, just a little tedious. If we have two quaternions:

(w_1 + x_1i + y_1j + z_1k) (w_2+x_2i+y_2j+z_2k)

Then their product is this ugly mess:

\begin{array}{l} w_1w_2-x_1x_2-y_1y_2-z_1z_2 \\ + (w_1x_2+x_1w_2+y_1z_2-z_1y_2)i \\ + (w_1y_2+y_1w_2-x_1z_2+z_1x_2) j \\ + (w_1z_2+z_1w_2+x_1y_2-y_1x_2) k \end{array}

In C#:

// Returns a*b
public static Quat Multiply(Quat a, Quat b){
  float w = a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z;
  float x = a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y;
  float y = a.w * b.y + a.y * b.w - a.x * b.z + a.z * b.x;
  float z = a.w * b.z + a.z * b.w + a.x * b.y - a.y * b.x;
  return new Quat (w,x,y,z).Normalize();
}

Since multiplication is not commutative, I made this function static to avoid confusing left and right multiplication. Also, I normalize the product so that floating point errors don’t accumulate.

Constructing Rotation Quaternions

Every rotation operation can be written as a rotation of some angle, \theta, around some vector (u_x, u_y, u_z):

The following formula gives a quaternion that represents this rotation:

\mathbf{q} = \cos \frac{\theta}{2} + (u_x i + u_y j + u_z k) \sin \frac{\theta}{2}

For our purposes, \theta is a very small number, say 0.01, and we use one of the three basis vectors to rotate around. For example, if we are rotating around (1,0,0) then our quaternion is

\cos \frac{0.01}{2} + \sin \frac{0.01}{2}i

That’s it: given any quaternion, multiplying on the left by our quaternion rotates it slightly around the x axis.

In C#, our code might look like this:

Quat qx = new Quat (Mathf.Cos (0.01 / 2), 0, 0, Mathf.Sin (0.01 / 2));
Quat qy = new Quat (Mathf.Cos (0.01 / 2), 0, Mathf.Sin (0.01 / 2), 0);
Quat qz = new Quat (Mathf.Cos (0.01 / 2), Mathf.Sin (0.01 / 2), 0, 0);

Putting it together

That’s all we need to do interesting things with quaternions. Let’s combine everything we have. Here’s our quaternion class thus far:

public class Quat{
	// Represents w + xi + yj + zk
	public float w, x, y, z;
	public Quat(float w, float x, float y, float z){
		this.w = w;
		this.x = x;
		this.y = y;
		this.z = z;
	}

	public float Norm(){
		return Mathf.Sqrt (w * w + x * x + y * y + z * z);
	}

	public Quat Normalize(){
		float m = Norm ();
		return new Quat (w / m, x / m, y / m, z / m);
	}

	// Returns a*b
	public static Quat Multiply(Quat a, Quat b){
		float w = a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z;
		float x = a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y;
		float y = a.w * b.y + a.y * b.w - a.x * b.z + a.z * b.x;
		float z = a.w * b.z + a.z * b.w + a.x * b.y - a.y * b.x;
		return new Quat (w,x,y,z).Normalize();
	}

	public Quaternion ToUnityQuaternion(){
		return new Quaternion (w, x, y, z);
	}
}

Now we just need to read the input, perform our calculations, and output the rotation quaternion to Unity:

public class Airplane : MonoBehaviour {
  public GameObject airplane;
  public Quat quat = new Quat (0, 0, 0, -1);
  public float speed = 0.01f;

  void FixedUpdate(){
    float inputX = Input.GetAxis("UpDown");
    float inputY = Input.GetAxis("LeftRight");
    float inputZ = Input.GetAxis("Roll");

    Quat qx = new Quat (Mathf.Cos (speed * inputX / 2), 0, 0, Mathf.Sin (speed * inputX / 2));
    Quat qy = new Quat (Mathf.Cos (speed * inputY / 2), 0, Mathf.Sin (speed * inputY / 2), 0);
    Quat qz = new Quat (Mathf.Cos (speed * inputZ / 2), Mathf.Sin (speed * inputZ / 2), 0, 0);

    quat = Quat.Multiply (qx, quat);
    quat = Quat.Multiply (qy, quat);
    quat = Quat.Multiply (qz, quat);

    airplane.transform.rotation = quat.ToUnityQuaternion ();
  }
}

In Unity, the input is not given to us as a single true/false value, but a float between -1 and 1. So holding right increases the LeftRight input gradually until it reaches 1, avoiding a sudden jump in movement.

What’s ToUnityQuaternion? Well, it turns out that Unity already has a Quaternion class that does everything here and much more, so all this could have literally been implemented in one line if we wanted.

Anyways, let’s see the result.

As you can see, holding right turns the plane relative to itself now, and the green arrow stays still. Hooray!

Beginner’s comparison of Computer Algebra Systems (Mathematica / Maxima / Maple)

I’ve never been very good at doing manual computations, and whenever I need to do a tedious computation for an assignment, I like to automate it by writing a computer program. Usually I implemented an ad-hoc solution using Haskell, either using a simple library or rolling my own implementation if the library didn’t have it. But I found this solution to be unsatisfactory: my Haskell programs worked with integers and floating numbers and I couldn’t easily generalize it to work with symbolic expressions. So I looked to learn a CAS (computer algebra system), so in the future I won’t have to hack together buggy code for common math operations.

I have no experience with symbolic computing, so it wasn’t clear to me where to begin. To start off, there are many different competing computer algebra systems, all incompatible with each other, and it’s far from clear which one is best for my needs. I began to experiment with several systems, but after a few days I still couldn’t decide which one was the winner.

I narrowed it down to 3 platforms. Here’s my setup (all running on Windows 7):

  • Mathematica 8.0
  • Maxima 5.32 with wxMaxima 13.04
  • Maple 18.00

So I came up with a trial — I had a short (but nontrivial) problem representative of the type of problem I’d be looking at, and I would try to solve it in all 3 languages, to determine which one was easiest to work with.

The Problem

This problem came up as a part of a recent linear algebra assignment.

Let the field be \mathbb{Z}_5 (so all operations are taken modulo 5). Find all 2×2 matrices P such that

P^T \left( \begin{array}{cc} 2 & 0 \\ 0 & 3 \end{array} \right) P = I

We can break this problem into several steps:

  • Enumerate all lists of length 4 of values between 0 to 4, that is, [[0,0,0,0],[0,0,0,1],…,[4,4,4,4]]. We will probably do this with a cartesian product or list comprehension.
  • Figure out how to convert a list into a 2×2 matrix form that the system can perform matrix operations on. For example, [1,2,3,4] might become matrix([1,2],[3,4])
  • Figure out how to do control flow, either by looping over a list (procedural) or with a map and filter (functional)
  • Finally, multiply the matrices modulo 5 and check if it equals the identity matrix, and output.

This problem encompasses a lot of the challenges I have with CAS software, that is, utilize mathematical functions (in this case, we only use matrix multiplication and transpose), yet at the same time express a nontrivial control flow. There are 5^4=625 matrices to check, so performance is not a concern; I am focusing on ease of use.

For reference, here is the answer to this problem:

These are the 8 matrices that satisfy the desired property.

I have no prior experience in programming in any of the 3 languages, and I will try to solve this problem with the most straightforward way possible with each of the languages. I realize that my solutions will probably be redundant and inefficient because of my inexperience, but it will balance out in the end because I’m equally inexperienced in all of the languages.

Mathematica

I started with Mathematica, a proprietary system by Wolfram Research and the engine behind Wolfram Alpha. Mathematica is probably the most powerful out of the three, with capabilities with working with data well beyond what I’d expect from a CAS.

What I found most jarring about Mathematica is its syntax. I’ve worked with multiple procedural and functional languages before, and there are certain things that Mathematica simply does differently from everybody else. Here are a few I ran across:

  • To use a pure function (equivalent of a lambda expression), you refer to the argument as #, and the function must end with the & character
  • The preferred shorthand for Map is /@ (although you can write the longhand Map)
  • To create a cartesian product of a list with itself n times, the function is called Tuples, which I found pretty counterintuitive

Initially I wanted to convert my flat list into a nested list by pattern matching Haskell style, ie f [a,b,c,d] = [[a,b],[c,d]], but I wasn’t sure how to do that, or if the language supports pattern matching on lists. However I ran across Partition[xs,2] which does the job, so I went with that.

Despite the language oddities, the functions are very well documented, so I was able to complete the task fairly quickly. The UI is fairly streamlined and intuitive, so I’m happy with that. I still can’t wrap my head around the syntax — I would like it more if it behaved more like traditional languages — but I suppose I’ll get the hang of it after a while.

Here’s the program I came up with:

SearchSpaceLists := Tuples[Range[0, 4], 4]
SearchSpaceMatrices := 
 Map[Function[xs, Partition[xs, 2]], SearchSpaceLists]
Middle := {{2, 0}, {0, 3}}
FilteredMatrices := 
 Select[SearchSpaceMatrices, 
  Mod[Transpose[#].Middle.#, 5] == IdentityMatrix[2] &]
MatrixForm[#] & /@ FilteredMatrices

Maxima

Maxima is a lightweight, open source alternative to Mathematica; I’ve had friends recommend it as being small and easy to use.

The syntax for Maxima is more natural, with things like lists and loops and lambda functions working more or less the way I expect. However, whenever I tried to do something with a function that isn’t the most common use case, I found the documentation lacking and often ended up combing through old forum posts.

Initially I tried to generate a list with a cartesian product like my Mathematica version, but I couldn’t figure out how to do that, eventually I gave up and used 4 nested for loops because that was better documented.

Another thing I had difficulty with was transforming a nested list into a matrix using the matrix command. Normally you would create a matrix with matrix([1,2],[3,4]), so by passing in two parameters. The function doesn’t handle passing in matrix([[1,2],[3,4]]), so to get around that you need to invoke a macro: funmake(‘matrix,[[1,2],[3,4]]).

Overall I found that the lack of documentation made the system frustrating to work with. I would however use it for simpler computations that fall under the common use cases — these are usually intuitive in Maxima.

Here’s the program I came up with:

Middle:matrix([2,0],[0,3]);
Ident:identfor(Middle);
for a:0 thru 4 do
  for b:0 thru 4 do
    for c:0 thru 4 do
      for d:0 thru 4 do
        (P:funmake('matrix,[[a,b],[c,d]]),
         P2:transpose(P).Middle.P,
         if matrixmap(lambda([x],mod(x,5)),P2) = Ident then
             print(P));

Shortly after writing this I realized I didn’t actually need the funmake macro, since there’s no need to generate a nested list in the first place, I could simply do matrix([a,b],[c,d]). Oh well, the point still stands.

Maple

Maple is a proprietary system developed by Maplesoft, a company based in Waterloo. Being a Waterloo student, I’ve had some contact with Maple: professors used it for demonstrations, some classes used it for grading. Hence I felt compelled to give Maple a shot.

At first I was pleasantly surprised that matrix multiplication in a finite field was easy — the code to calculate A*B in \mathbb{Z}_5 is simply A.B mod 5. But everything went downhill after that.

The UI for Maple feels very clunky. Some problems I encountered:

  • It’s not clear how to halt a computation that’s in a an infinite loop. It doesn’t seem to be possible within the UI, and the documentation suggests it’s not possible in all cases (it recommends manually terminating the process). Of course, this loses all unsaved work, so I quickly learned to save before every computation.
  • I can’t figure out how to delete a cell without googling it. It turns out you have to select your cell and a portion of the previous cell, then hit Del.
  • Copy and pasting doesn’t work as expected. When I tried to copy code written inside Maple to a text file, all the internal formatting and syntax highlighting information came with it.
  • Not an UI issue, but error reporting is poor. For example, the = operator works for integers, but when applied to matrices, it silently returns false. You have to use Equals(a,b) to compare matrices (this is kind of like java).

In the end, I managed to complete the task but the poor UI made the whole process fairly unpleasant. I don’t really see myself using Maple in the future; if I had to, I would try the command line.

Here’s the program I came up with:

with(LinearAlgebra):
with(combinat, cartprod):
L := [seq(0..4)]:
T := cartprod([L, L, L, L]):
Middle := <2,0;0,3>:
while not T[finished] do
  pre_matrix := T[nextvalue]();
  matr := Matrix(2,2,pre_matrix);
  if Equal(Transpose(matr).Middle.matr mod 5, IdentityMatrix(2)) then
    print(matr);
  end if
end do:

Conclusion

After the brief trial, there is still no clear winner, but I have enough data to form some personal opinions:

  • Mathematica is powerful and complete, but has a quirky syntax. It has the most potential — definitely the one I would go with if I were to invest more time into learning a CAS.
  • Maxima is lightweight and fairly straightfoward, but because of lack of documentation, it might not be the best tool to do complicated things with. I would keep it for simpler calculations though.
  • Maple may or may not be powerful compared to the other two, I don’t know enough to compare it. But its UI is clearly worse and it would take a lot to compensate for that.

I’ve posted this discussion on some discussion boards for Mathematica and Maple, you might find them informative.