AI Project: Harmonizing Pop Melodies using Hidden Markov Models

It is often said that all pop music “sound the same”. If this is the case, then a computer program should be able to compose pop music, right?

This is the problem we studied for the AI final project (CS486) — kind of. We restrict to the simpler problem of harmonizing an existing melody: in other words, given a fixed melody, find a sequence of chord progressions that match it.

We formulated the task as a probabilistic optimization problem, and devised an algorithm to solve it using Hidden Markov Models (HMMs). Then, we used the Music21 library to generate MIDI files from the output so that we can listen to it.

In the experiment, we chose the melody from If Only by JJ Lin. Here’s a demo:

Music Theory Model

In our model, pop music consists of two parts: a melody line consisting of a sequence of individual notes, and a harmony line of a sequence of chords. Generally, the melody line is the notes sung by the lead vocalist, and the harmony line is played by accompanying instruments (piano, guitar, digital synthesizers). Conventionally, the chord progression is written as a sequence of chord names above the melody; the exact notes in a chord progression is up to the performer’s discretion.

1.pngAbove: Example of two-part song with melody and chord progression

It is hard to quantify exactly what make a chord progression “good”, since music is inherently subjective and depends on an individual’s musical tastes. However, we capture the notion of “pleasant” using music theory, by assigning a penalty to musical dissonance between a note in the melody and a note in the chord.

According to music theory, the minor second, major second, and tritone (augmented fourth) intervals are dissonant (including the minor and major seventh inversions). Therefore, our program tries to avoid forming these intervals by assigning a penalty.

2.pngAbove: List of dissonant intervals. All other intervals are consonant.

We assume that all notes in the melody lie on either a major or minor scale in some fixed key. The set of permissible chords for a scale are the major and minor chords where all of its constituent notes are in the scale. For example, in the scale of G major, the permissible chords are {G, Am, Bm, C, D, Em}.

This is a vastly simplified model that does not capture more nuanced chords that appear in pop music. However, even the four-chord progression G – D – Em – C is sufficient for many top-40 pop songs.

First attempt: Harmonize each bar independently

The first thing we tried was to looking at each bar by itself, and searching through all of the permissible chords to find the chord that harmonizes best with the bar.

In other words, we define the penalty of the chord for a melody bar is the number of times a melody note forms a dissonant interval with a chord note, weighted by the duration of the note. Then, we assign to each bar the chord with the lowest penalty.

Here’s what we get:

m2.pngAbove: Naive harmonization. Notice that the Am chord is repeated 3 times.

This assigns a reasonable-sounding chord for each bar on its own, but doesn’t account for the transition probabilities between chords. We end up with the A minor chord repeated 3 times consecutively, which we’d like to avoid.

How to account for chord transitions, while still harmonizing with the melody in each bar? This is where we bring in Hidden Markov Models.

Second attempt: HMMs and the Viterbi Algorithm

Hidden Markov Models are a type of probabilistic graphical model that assumes that there is an unobservable hidden state that influences the output variable. In each timestep, the hidden state transitions to a different state, with probabilities given by a transition matrix.

A standard problem with HMMs is: given a sequence of observations, find the most likely sequence of hidden states that produced it. In our problem, the hidden states are the unknown sequence of chords, and the observed sequence is the bars of our melody. Now, solving the most likely sequence problem yields the chord progression that best harmonizes with the melody.


To encode the rule that we don’t like having the same chord consecutively for multiple bars, we assign the transitions so that the probability of transition from any chord to itself is low. The transition probability to every other chord is equal.

3.pngAbove: Transition probabilities for chords (only 3 shown here, but this generalizes easily).  A chord is equally likely to transition to any other chord, but unlikely to transition to itself.

The only thing left to do is solve the most likely sequence problem. We used a modified version of the Viterbi algorithm to do this — it solves the problem quickly using dynamic programming.

Running the Viterbi algorithm on our test melody:

m3.pngAbove: Using HMMs to eliminate repeated chords

And there we go — no more repeated chords! It doesn’t produce the same chord sequence as my manual harmonization, but sounds quite natural (and doesn’t violate any musical constraints).

Results and Future Directions

Using HMMs, we were able to harmonize a short melody segment. By looking at the entire melody as a whole, the HMM algorithm is able to produce a better chord progression than by optimizing every bar locally. Our implementation uses a greatly simplified musical model: it assumes the melody is in G major, and only considers 6 basic major and minor chords, but it successfully demonstrates a proof of concept.

When implementing this project, we thought this was new research (existing literature on harmonization focused mostly on classical music, not pop music). Alas, after some more literature review, it turns out that a Microsoft research team developed the same idea of using HMMs for melody harmonization, and published a series of papers in 2008. Oh well. Not quite publishable research but still a cool undergraduate class project.

The source code for this project is available on Github. Thanks to Michael Tu for collaborating with me!

If you enjoyed this post, check out my previous work on algorithmic counterpoint!

A Simple Shorthand Musical Notation

Anyone who’s played piano, or any other musical instrument, would be familiar with the “standard” musical notation. It’s clear, unambiguous, accepted worldwide, and has been basically unchanged since Bach. It looks like this:

Now there’s a reason this notation has survived this long — it’s good. It’s easy to read, and allows a musician to read and play a piece he’s never heard before.

But when you try to write music, you find that the notation is actually quite cumbersome to write. The notes are positioned on groups of 5 lines, so you’d better either have sheets of these lines printed, or be prepared to tediously draw these lines with a ruler. The timing of notes is very precise, so if you slightly exceed the allowed time for a bar, sorry, your notation is not valid anymore.

Principles of Shorthand Notation

To solve these frustrations, I created an alternate system of recording music, with the primary goal of being easy to write. It’s possible to jot down a melody in 30 seconds, with just a pencil and normal (not printed sheet) paper.

I do not claim my notation to be better than the standard notation. Rather, I achieve a different goal, sacrificing information for the ease of writing.

Standard notation is good for recording a song so that a musician can play it without having heard it before.

My notation is good for reminding a musician how to play a song he has heard before.

A common use case would be reminding yourself the notes of a song you’re playing, or accompanying a recording of the song. In a way, its purpose is similar to that of guitar tablature.

Here’s my justification for doing this. Most people can produce rhythm intuitively — that is, after hearing a passage a few times, he can clap back the rhythm. It’s much more difficult to find the correct notes after hearing the passage — I stumble upon it by trial and error.

So if you write down the notes but leave out the rhythm, it would often be enough information to play the song.

The tradeoff should become clear if you compare the same passage written side by side (from Bach’s Minuet in G Major):

Rules of Writing Shorthand Notation

Start by writing the notes in a line, and separate bars with a vertical | line. Indicate the key signature at the beginning of the page, if needed. Feel free to liberally clump notes together or space them apart based on rhythm.

Next is the rule for jumps. When the melody goes upwards by a perfect fourth or more (like from C->F), write the jumped note on an elevated line.

Remain on the elevated line as long as the melody is still increasing or stays the same. But as soon as the melody descends, immediately drop back down to the neutral line.

Here’s an example:

As long as the melody consists of small intervals (like C->E->C), we stay on the neutral line. Only when the jump is large (C->F) do we go to the elevated line.

Typically in music, a large jump in one direction is followed by a small step backwards. This means that we spend most of our time on the neutral line. It’s very rare for a melody to have multiple jumps in the same direction.

Here’s another example (Twinkle twinkle little star):

The melody does a large jump on the third note (C->G), so the third note (G) is on the elevated line. On the seventh note, the melody descends one note from A->G, so we immediately drop back to the neutral line. It does not matter that the same G was on the elevated line before.

You do not always have to start on the neutral line. It might be useful to start on an elevated or depressed line. Here’s an example (Harry Potter):

Reasoning behind the Jump Rule

You might be wondering, why make this jump rule so complicated? Why have a jump rule at all?

Well, we need some way of indicating octaves. Otherwise, a interval like C->F would be ambiguous: are we going up a perfect fourth, or going down a perfect fifth?

On the other hand, if we decreased the jump threshold, say a major third (C->E) is a jump, then the melody would be littered with jumps up and down, which would be a nightmare to handle. Setting the threshold to the perfect fourth is a good balance.

The complexities of the jump rule ensures that when you’re shifting upwards, the melody is actually going upwards. It would be confusing to the reader if there was a situation where we return from the elevated line down to the neutral line, while the melody is going upwards!

Another distinct alternative to the jump rule is to divide all the notes into distinct octaves: for instance, put any notes between C4 (middle C) and C5 on the neutral line, everything between C5 and C6 on the elevated line, and so on. I experimented with this, but found it very awkward when the melody straddles on the boundary between two octaves.

And that’s how the jump rule was created. So please experiment with this system, see if you like it!

Algorithmic Counterpoint: How I Wrote a Computer Program to Generate Music at Mathcamp 2011

Hello all; I have just returned from Mathcamp, a five week math program for high school students. I’m not going to write in detail about life at Mathcamp, other than that it was amazing in every way.

This year at Mathcamp, I chose to do a computer programming project to ‘teach’ a computer to write ‘music’. The word ‘music’ is in quotes because what it’s expected to generate is more or less random and nothing like actual music by Bach or Mozart; on the other hand, the ‘music’ generated resembles actual music closely enough that it can still be called music, not a random string of notes. Anyways, here’s a short sample of what it can generate:

What can be considered ‘music’ in some sense here is really just a set of rules backed by a random number generator. Hence what it does is really the following:

  1. Generate a main melody using a bunch of rules: ie, what notes are allowed to go with each other, what rhythms are allowed to go with what notes, etc.
  2. Using the main melody as a sort of a baseline, generate one or more counterpoint melodies: a counterpoint melody is constrained by the same rules that the main melody must follow, but also has to harmonize with the existing melody, thus being constrained by another set of rules.
  3. Once all the notes are generated, the program simply uses some library — in our case JFugue — to play the notes through the speakers.

The bulk of our time was spent implementing, tweaking, and debugging the rules and the various other substeps in the above process.

The source code, mostly written by myself over about two weeks, is available as a Google Code project.