## Visualizing Quaternions with Unity

November 24, 2014

How do you model the position and orientation of an airplane?

Position is easy: just represent it with a point in 3D space. But how do you specify its orientation, that is, which direction it’s pointing?

At first glance, it seems a vector will do. After all, a vector points in some direction, right? If the plane is pointing east, represent its orientation by a unit vector pointing east.

Unfortunately, we quickly run into trouble when we try to roll. If we’re facing east, and we roll 90 degrees, we’re still facing east. Clearly we’re missing something.

### Euler Angles

When real pilots talk about their orientation, they talk about roll, yaw, pitch. Pitch is going up or down, yaw is going left or right, roll is, well, roll.

Any change in orientation can be described by some combination of roll, yaw, pitch. This is the basis for Euler Angles. We use three angles to represent the airplane’s orientation.

This is all fine and dandy if we want to represent the orientation of a static object in space. But when we try to adjust our orientation, we start to run into problems.

You’re thinking, this should be simple! When we turn left or right, we just increment the yaw variable, right? Yes, it seems to work, at least initially. You can turn left and right, up and down, and roll around.

Implement it in Unity and play around a bit, however, and you begin to notice that things don’t quite behave the way you expect.

In this animation, I’m holding down the right button:

The plane does rotate to the right, but it’s not rotating relative to itself. Instead it’s rotating around some invisible y-axis. If it were rotating relative to itself, the green arrow wouldn’t be moving.

The problem becomes more and more severe when the pitch of the plane becomes higher and higher. The worst case is when the airplane is pointing straight up: then roll and yaw become the same thing! This is called gimbal lock: we have lost a degree of freedom and we can only rotate in 2 dimensions! Definitely not something desirable if we’re controlling a plane or spaceship.

It turns out that no matter what we do, we will suffer from some form of gimbal lock. As long as we use Euler Angles, there is one direction where if we turn too far, everything starts to screw up.
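To make the straight-up case concrete, here is a minimal Python sketch (not the Unity code from this post). It assumes one particular Euler convention, R = Rz(yaw) · Ry(pitch) · Rx(roll), and the helper names are made up for illustration. With the pitch at 90 degrees, shifting yaw and roll by the same amount gives the same rotation matrix, so one degree of freedom is gone.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_to_matrix(yaw, pitch, roll):
    # Assumed convention: roll about x, then pitch about y, then yaw about z.
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))

straight_up = math.pi / 2

# Pointing straight up: yaw and roll are both shifted by 0.2, nothing changes.
locked_a = euler_to_matrix(0.3, straight_up, 0.5)
locked_b = euler_to_matrix(0.1, straight_up, 0.3)

# At a moderate pitch the same shift gives a genuinely different orientation.
free_a = euler_to_matrix(0.3, 0.2, 0.5)
free_b = euler_to_matrix(0.1, 0.2, 0.3)

print(close(locked_a, locked_b))  # True
print(close(free_a, free_b))      # False
```

Other Euler conventions lock up at a different orientation, but each of them has one.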

### Practical Introduction to Quaternions

All is not lost, however. There is a way to represent orientation that treats all axes equally and does not suffer from gimbal lock. This mythical structure is called the quaternion. Unlike Euler Angles, which describe your orientation relative to a fixed set of axes, quaternions do not rely on any fixed axis.

The drawback is that quaternions are unintuitive to understand for humans. There is no way to “look” at a quaternion and be able to visualize what rotation it represents. Fortunately for us, it’s not that difficult to make use of quaternions, even if we can’t visualize quaternions.

There is a lot of theory behind how quaternions work, but in this article, I will gloss over the theory and give a quick primer to quaternions, just the most common facts you need to use them. At the same time, I will implement the operations I describe in C#, so I can integrate them with Unity. If you don’t know C#, you can freely ignore the code.

### Definition

A quaternion is an ordered 4-tuple of real numbers (w,x,y,z). We write this as

$w+xi+yj+zk$

The letters i,j,k are not variables. Rather, they are independent axes. If you like, you can think of the quaternions as a 4 dimensional vector space.

The defining property of quaternions is:

$i^2 = j^2 = k^2 = ijk = -1$

Play around with it a bit and you can derive 6 more identities:

$ij = k$

$jk = i$

$ki = j$

$ji = -k$

$kj = -i$

$ik = -j$

If you’ve worked with complex numbers, this should seem familiar. Instead of 2 parts of a complex number (the real and imaginary parts), we have 4 parts for a quaternion.

The similarity doesn’t end here. Multiplying by a complex number of length 1 represents a rotation in 2 dimensions. Similarly, multiplying by a quaternion of norm 1 represents a rotation in 3D.

One curious thing to note: we have $ij=k$ and $ji=-k$. We switched around the terms and the product changed. This means that multiplying quaternions is kind of like multiplying matrices — the order matters. So multiplication is not commutative.

Here’s a framework for a quaternion in C#:

public class Quat{
// Represents w + xi + yj + zk
public float w, x, y, z;
public Quat(float w, float x, float y, float z){
this.w = w;
this.x = x;
this.y = y;
this.z = z;
}
}


### Normalizing Quaternions

The norm of a quaternion is

$N(\mathbf{q}) = \sqrt{w^2+x^2+y^2+z^2}$

When we use quaternions to represent rotations, we typically want unit quaternions: quaternions with norm 1. This is straightforward: to normalize a quaternion, divide each component by the norm.

In C#:

public float Norm(){
return Mathf.Sqrt (w * w + x * x + y * y + z * z);
}

public Quat Normalize(){
float m = Norm ();
return new Quat (w / m, x / m, y / m, z / m);
}


### Multiplying Quaternions

Multiplying is simple, just a little tedious. If we have two quaternions:

$(w_1 + x_1i + y_1j + z_1k) (w_2+x_2i+y_2j+z_2k)$

Then their product is this ugly mess:

$\begin{array}{l} w_1w_2-x_1x_2-y_1y_2-z_1z_2 \\ + (w_1x_2+x_1w_2+y_1z_2-z_1y_2)i \\ + (w_1y_2+y_1w_2-x_1z_2+z_1x_2) j \\ + (w_1z_2+z_1w_2+x_1y_2-y_1x_2) k \end{array}$

In C#:

// Returns a*b
public static Quat Multiply(Quat a, Quat b){
float w = a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z;
float x = a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y;
float y = a.w * b.y + a.y * b.w - a.x * b.z + a.z * b.x;
float z = a.w * b.z + a.z * b.w + a.x * b.y - a.y * b.x;
return new Quat (w,x,y,z).Normalize();
}


Since multiplication is not commutative, I made this function static to avoid confusing left and right multiplication. Also, I normalize the product so that floating point errors don’t accumulate.
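As a sanity check on the product formula, here is a small Python sketch (Python is used only so the check is easy to run outside Unity). It stores a quaternion as a plain (w, x, y, z) tuple, which is a choice made for this example, and unlike the C# version it does not normalize the result. It confirms that $ij=k$, $ji=-k$ and $ijk=-1$.

```python
def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 + y1 * w2 - x1 * z2 + z1 * x2,
        w1 * z2 + z1 * w2 + x1 * y2 - y1 * x2,
    )

def neg(q):
    return tuple(-c for c in q)

ONE = (1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

print(qmul(I, J) == K)                   # True: ij = k
print(qmul(J, I) == neg(K))              # True: ji = -k
print(qmul(qmul(I, J), K) == neg(ONE))   # True: ijk = -1
```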

### Constructing Rotation Quaternions

Every rotation operation can be written as a rotation of some angle, $\theta$, around some unit vector $(u_x, u_y, u_z)$.

The following formula gives a quaternion that represents this rotation:

$\mathbf{q} = \cos \frac{\theta}{2} + (u_x i + u_y j + u_z k) \sin \frac{\theta}{2}$

For our purposes, $\theta$ is a very small angle, say 0.01 radians, and we use one of the three basis vectors to rotate around. For example, if we are rotating around (1,0,0) then our quaternion is

$\cos \frac{0.01}{2} + \sin \frac{0.01}{2}i$

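That’s it: given any orientation quaternion, multiplying it on the left by our quaternion rotates it slightly around the fixed (world) x axis, while multiplying on the right rotates it slightly around its own x axis.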

In C#, our code might look like this:

// Quat takes (w, x, y, z), so the sine goes in the slot of the rotation axis
Quat qx = new Quat (Mathf.Cos (0.01f / 2), Mathf.Sin (0.01f / 2), 0, 0);
Quat qy = new Quat (Mathf.Cos (0.01f / 2), 0, Mathf.Sin (0.01f / 2), 0);
Quat qz = new Quat (Mathf.Cos (0.01f / 2), 0, 0, Mathf.Sin (0.01f / 2));
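To see the axis-angle formula do something visible, here is a Python sketch that rotates a vector by 90 degrees instead of 0.01 radians. It relies on the standard rule that a unit quaternion $\mathbf{q}$ rotates a vector $\mathbf{v}$ via $\mathbf{q}\mathbf{v}\mathbf{q}^{-1}$, which this post does not derive. The function names are hypothetical.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 + y1 * w2 - x1 * z2 + z1 * x2,
        w1 * z2 + z1 * w2 + x1 * y2 - y1 * x2,
    )

def from_axis_angle(axis, theta):
    """cos(theta/2) + (ux i + uy j + uz k) sin(theta/2); axis must be a unit vector."""
    ux, uy, uz = axis
    s = math.sin(theta / 2)
    return (math.cos(theta / 2), ux * s, uy * s, uz * s)

def rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q, computing q * v * q^-1."""
    w, x, y, z = q
    inverse = (w, -x, -y, -z)  # for a unit quaternion the inverse is the conjugate
    return qmul(qmul(q, (0.0, *v)), inverse)[1:]

quarter_turn_about_z = from_axis_angle((0, 0, 1), math.pi / 2)
rotated = rotate(quarter_turn_about_z, (1, 0, 0))
print(rotated)  # approximately (0, 1, 0): the x axis has turned onto the y axis
```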


### Putting it together

That’s all we need to do interesting things with quaternions. Let’s combine everything we have. Here’s our quaternion class thus far:

public class Quat{
// Represents w + xi + yj + zk
public float w, x, y, z;
public Quat(float w, float x, float y, float z){
this.w = w;
this.x = x;
this.y = y;
this.z = z;
}

public float Norm(){
return Mathf.Sqrt (w * w + x * x + y * y + z * z);
}

public Quat Normalize(){
float m = Norm ();
return new Quat (w / m, x / m, y / m, z / m);
}

// Returns a*b
public static Quat Multiply(Quat a, Quat b){
float w = a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z;
float x = a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y;
float y = a.w * b.y + a.y * b.w - a.x * b.z + a.z * b.x;
float z = a.w * b.z + a.z * b.w + a.x * b.y - a.y * b.x;
return new Quat (w,x,y,z).Normalize();
}

public Quaternion ToUnityQuaternion(){
// Unity's constructor takes its arguments in the order (x, y, z, w)
return new Quaternion (x, y, z, w);
}
}


Now we just need to read the input, perform our calculations, and output the rotation quaternion to Unity:

public class Airplane : MonoBehaviour {
public GameObject airplane;
// Start from the identity rotation, 1 + 0i + 0j + 0k
public Quat quat = new Quat (1, 0, 0, 0);
public float speed = 0.01f;

void FixedUpdate(){
float inputX = Input.GetAxis("UpDown");
float inputY = Input.GetAxis("LeftRight");
float inputZ = Input.GetAxis("Roll");

// Small rotations around the x, y and z axes: cos(t/2) + sin(t/2) * axis
Quat qx = new Quat (Mathf.Cos (speed * inputX / 2), Mathf.Sin (speed * inputX / 2), 0, 0);
Quat qy = new Quat (Mathf.Cos (speed * inputY / 2), 0, Mathf.Sin (speed * inputY / 2), 0);
Quat qz = new Quat (Mathf.Cos (speed * inputZ / 2), 0, 0, Mathf.Sin (speed * inputZ / 2));

// Multiplying on the right applies each step about the plane's own axes
quat = Quat.Multiply (quat, qx);
quat = Quat.Multiply (quat, qy);
quat = Quat.Multiply (quat, qz);

airplane.transform.rotation = quat.ToUnityQuaternion ();
}
}


In Unity, the input is not given to us as a single true/false value, but a float between -1 and 1. So holding right increases the LeftRight input gradually until it reaches 1, avoiding a sudden jump in movement.

What’s ToUnityQuaternion? Well, it turns out that Unity already has a Quaternion class that does everything here and much more, so all this could have literally been implemented in one line if we wanted.

Anyways, let’s see the result.

As you can see, holding right turns the plane relative to itself now, and the green arrow stays still. Hooray!
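The same behaviour can be checked numerically outside Unity. The Python sketch below uses the standard (w, x, y, z) convention and the rule that a unit quaternion rotates a vector by $\mathbf{q}\mathbf{v}\mathbf{q}^{-1}$. It assumes the green arrow is the plane's own up (y) axis. Starting from a pitched orientation, a small yaw composed on the right leaves that axis where it was, while the same yaw composed on the left swings it around the world y axis.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 + y1 * w2 - x1 * z2 + z1 * x2,
        w1 * z2 + z1 * w2 + x1 * y2 - y1 * x2,
    )

def from_axis_angle(axis, theta):
    ux, uy, uz = axis  # assumed to be a unit vector
    s = math.sin(theta / 2)
    return (math.cos(theta / 2), ux * s, uy * s, uz * s)

def rotate(q, v):
    w, x, y, z = q
    return qmul(qmul(q, (0.0, *v)), (w, -x, -y, -z))[1:]

def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))

UP = (0.0, 1.0, 0.0)  # the plane's up axis in its own frame
orientation = from_axis_angle((1, 0, 0), math.radians(40))  # already pitched
yaw_step = from_axis_angle((0, 1, 0), math.radians(10))

up_before = rotate(orientation, UP)
up_local = rotate(qmul(orientation, yaw_step), UP)  # yaw about the plane's own axis
up_world = rotate(qmul(yaw_step, orientation), UP)  # yaw about the world axis

print(close(up_local, up_before))  # True: the arrow stays still
print(close(up_world, up_before))  # False: the arrow moves
```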

## Beginner’s comparison of Computer Algebra Systems (Mathematica / Maxima / Maple)

August 11, 2014

I’ve never been very good at doing manual computations, and whenever I need to do a tedious computation for an assignment, I like to automate it by writing a computer program. Usually I implemented an ad-hoc solution using Haskell, either using a simple library or rolling my own implementation if the library didn’t have it. But I found this solution to be unsatisfactory: my Haskell programs worked with integers and floating-point numbers and I couldn’t easily generalize them to work with symbolic expressions. So I looked to learn a CAS (computer algebra system), so in the future I won’t have to hack together buggy code for common math operations.

I have no experience with symbolic computing, so it wasn’t clear to me where to begin. To start off, there are many different competing computer algebra systems, all incompatible with each other, and it’s far from clear which one is best for my needs. I began to experiment with several systems, but after a few days I still couldn’t decide which one was the winner.

I narrowed it down to 3 platforms. Here’s my setup (all running on Windows 7):

• Mathematica 8.0
• Maxima 5.32 with wxMaxima 13.04
• Maple 18.00

So I came up with a trial — I had a short (but nontrivial) problem representative of the type of problem I’d be looking at, and I would try to solve it in all 3 languages, to determine which one was easiest to work with.

### The Problem

This problem came up as a part of a recent linear algebra assignment.

Let the field be $\mathbb{Z}_5$ (so all operations are taken modulo 5). Find all 2×2 matrices $P$ such that

$P^T \left( \begin{array}{cc} 2 & 0 \\ 0 & 3 \end{array} \right) P = I$

We can break this problem into several steps:

• Enumerate all lists of length 4 of values from 0 to 4, that is, [[0,0,0,0],[0,0,0,1],…,[4,4,4,4]]. We will probably do this with a cartesian product or list comprehension.
• Figure out how to convert a list into a 2×2 matrix form that the system can perform matrix operations on. For example, [1,2,3,4] might become matrix([1,2],[3,4])
• Figure out how to do control flow, either by looping over a list (procedural) or with a map and filter (functional)
• Finally, multiply the matrices modulo 5 and check if it equals the identity matrix, and output.

This problem encompasses a lot of the challenges I have with CAS software: it has to use mathematical functions (in this case, only matrix multiplication and transpose), yet at the same time express a nontrivial control flow. There are 5^4=625 matrices to check, so performance is not a concern; I am focusing on ease of use.

For reference, here is the answer to this problem:

These are the 8 matrices that satisfy the desired property.
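For comparison with the CAS solutions, here is a plain Python sketch of the same four steps (enumerate, reshape, loop, test modulo 5). It is not one of the three systems under review, and the names are made up for this example. It is only an independent way to confirm the count of 8.

```python
from itertools import product

MOD = 5
MIDDLE = ((2, 0), (0, 3))
IDENTITY = ((1, 0), (0, 1))

def transpose(m):
    return tuple(zip(*m))

def matmul_mod(a, b, mod=MOD):
    """Product of two 2x2 matrices with every entry reduced modulo mod."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) % mod for j in range(2))
        for i in range(2)
    )

def solutions():
    found = []
    # All 5^4 = 625 flat lists [a, b, c, d], each read as the matrix [[a, b], [c, d]].
    for a, b, c, d in product(range(MOD), repeat=4):
        p = ((a, b), (c, d))
        if matmul_mod(matmul_mod(transpose(p), MIDDLE), p) == IDENTITY:
            found.append(p)
    return found

matches = solutions()
print(len(matches))  # 8
for p in matches:
    print(p)
```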

I have no prior experience in programming in any of the 3 languages, and I will try to solve this problem with the most straightforward way possible with each of the languages. I realize that my solutions will probably be redundant and inefficient because of my inexperience, but it will balance out in the end because I’m equally inexperienced in all of the languages.

### Mathematica

I started with Mathematica, a proprietary system by Wolfram Research and the engine behind Wolfram Alpha. Mathematica is probably the most powerful of the three, with capabilities for working with data well beyond what I’d expect from a CAS.

What I found most jarring about Mathematica is its syntax. I’ve worked with multiple procedural and functional languages before, and there are certain things that Mathematica simply does differently from everybody else. Here are a few I ran across:

• To use a pure function (equivalent of a lambda expression), you refer to the argument as #, and the function must end with the & character
• The preferred shorthand for Map is /@ (although you can write the longhand Map)
• To create a cartesian product of a list with itself n times, the function is called Tuples, which I found pretty counterintuitive

Initially I wanted to convert my flat list into a nested list by pattern matching Haskell style, ie f [a,b,c,d] = [[a,b],[c,d]], but I wasn’t sure how to do that, or if the language supports pattern matching on lists. However I ran across Partition[xs,2] which does the job, so I went with that.

Despite the language oddities, the functions are very well documented, so I was able to complete the task fairly quickly. The UI is fairly streamlined and intuitive, so I’m happy with that. I still can’t wrap my head around the syntax — I would like it more if it behaved more like traditional languages — but I suppose I’ll get the hang of it after a while.

Here’s the program I came up with:

SearchSpaceLists := Tuples[Range[0, 4], 4]
SearchSpaceMatrices :=
Map[Function[xs, Partition[xs, 2]], SearchSpaceLists]
Middle := {{2, 0}, {0, 3}}
FilteredMatrices :=
Select[SearchSpaceMatrices,
Mod[Transpose[#].Middle.#, 5] == IdentityMatrix[2] &]
MatrixForm[#] & /@ FilteredMatrices


### Maxima

Maxima is a lightweight, open source alternative to Mathematica; I’ve had friends recommend it as being small and easy to use.

The syntax for Maxima is more natural, with things like lists and loops and lambda functions working more or less the way I expect. However, whenever I tried to do something with a function that isn’t the most common use case, I found the documentation lacking and often ended up combing through old forum posts.

Initially I tried to generate a list with a cartesian product like my Mathematica version, but I couldn’t figure out how to do that. Eventually I gave up and used 4 nested for loops because that was better documented.

Another thing I had difficulty with was transforming a nested list into a matrix using the matrix command. Normally you would create a matrix with matrix([1,2],[3,4]), that is, by passing in two parameters. The function doesn’t handle passing in matrix([[1,2],[3,4]]), so to get around that you need to construct the call with funmake: funmake('matrix,[[1,2],[3,4]]).

Overall I found that the lack of documentation made the system frustrating to work with. I would however use it for simpler computations that fall under the common use cases — these are usually intuitive in Maxima.

Here’s the program I came up with:

Middle:matrix([2,0],[0,3]);
Ident:identfor(Middle);
for a:0 thru 4 do
for b:0 thru 4 do
for c:0 thru 4 do
for d:0 thru 4 do
(P:funmake('matrix,[[a,b],[c,d]]),
P2:transpose(P).Middle.P,
if matrixmap(lambda([x],mod(x,5)),P2) = Ident then
print(P));


Shortly after writing this I realized I didn’t actually need funmake: since there’s no need to generate a nested list in the first place, I could simply do matrix([a,b],[c,d]). Oh well, the point still stands.

### Maple

Maple is a proprietary system developed by Maplesoft, a company based in Waterloo. Being a Waterloo student, I’ve had some contact with Maple: professors used it for demonstrations, some classes used it for grading. Hence I felt compelled to give Maple a shot.

At first I was pleasantly surprised that matrix multiplication in a finite field was easy — the code to calculate A*B in $\mathbb{Z}_5$ is simply A.B mod 5. But everything went downhill after that.

The UI for Maple feels very clunky. Some problems I encountered:

• It’s not clear how to halt a computation that’s in an infinite loop. It doesn’t seem to be possible within the UI, and the documentation suggests it’s not possible in all cases (it recommends manually terminating the process). Of course, this loses all unsaved work, so I quickly learned to save before every computation.
• I can’t figure out how to delete a cell without googling it. It turns out you have to select your cell and a portion of the previous cell, then hit Del.
• Copying and pasting doesn’t work as expected. When I tried to copy code written inside Maple to a text file, all the internal formatting and syntax highlighting information came with it.
• Not a UI issue, but error reporting is poor. For example, the = operator works for integers, but when applied to matrices, it silently returns false. You have to use Equal(a,b) to compare matrices (this is kind of like Java).

In the end, I managed to complete the task but the poor UI made the whole process fairly unpleasant. I don’t really see myself using Maple in the future; if I had to, I would try the command line.

Here’s the program I came up with:

with(LinearAlgebra):
with(combinat, cartprod):
L := [seq(0..4)]:
T := cartprod([L, L, L, L]):
Middle := <2,0;0,3>:
while not T[finished] do
pre_matrix := T[nextvalue]();
matr := Matrix(2,2,pre_matrix);
if Equal(Transpose(matr).Middle.matr mod 5, IdentityMatrix(2)) then
print(matr);
end if
end do:


### Conclusion

After the brief trial, there is still no clear winner, but I have enough data to form some personal opinions:

• Mathematica is powerful and complete, but has a quirky syntax. It has the most potential — definitely the one I would go with if I were to invest more time into learning a CAS.
• Maxima is lightweight and fairly straightforward, but because of lack of documentation, it might not be the best tool to do complicated things with. I would keep it for simpler calculations though.
• Maple may or may not be powerful compared to the other two, I don’t know enough to compare it. But its UI is clearly worse and it would take a lot to compensate for that.

## A retrospective on the BALL programming language

August 7, 2014

One of the courses I’m taking this term is CS241 (Foundations of Sequential Programs). This course begins with MIPS assembly, then moves on to lexing and parsing, and eventually culminates in writing a compiler for a subset of C down to MIPS assembly.

As I wrote my compiler, tediously coding one typechecking rule after another, my mind wandered. There used to be a time when things were simpler, the time when I tried to create my own programming language.

I was 14 back then, still in middle school, having just learned how to program in Java. Rather than going outside and kicking a ball like other kids my age, I, being a true nerd, stayed at home and tinkered with programming languages. The name of the language was BALL, short for “BaiSoft All-purpose List-oriented Language”. It was my first ever “major” programming project.

As you can imagine, my attempt was not quite the next GCC-killer. I knew nothing about compilers, none of the theory of using finite state automata to scan input into tokens and so on. I used the little I did know, but in the end I was pleased with my efforts.

### The BALL Language

One of the first oddities you notice is the GUI. Yes, a graphical user interface — I decided that running programs from the command line wasn’t very cool. To run a program, you would open ball.jar and paste your program into a textbox, then hit the Run button.

When you hit the Run button, your output would appear on a console window which conveniently pops up on the right:

The language itself was essentially a glorified form of assembly. A program consisted of a list of “instructions”, each of which was one line. My language supported two types of variables: string and integer. The only form of control flow was an unconditional jump and a conditional jump.

You are allowed 200 string variables and 300 integer variables. Whenever you use a variable, you have to tell the interpreter what type it is: you write #x if x is a number and &x if x is a string.

String literals were not enclosed by double quotations, rather, they are placed directly into the code. If you want a space character, you write *s.

Some other oddities (questionable design decisions?):

• A keyword to redefine other keywords. Done primarily to obfuscate code and confuse readers.
• A keyword to delay the program by n milliseconds. I still remember debugging a bug where the whole UI became unresponsive when a delay was used (you aren’t allowed to sleep on the UI thread in Java). That was my first taste of multithreaded programming.
• A keyword to emit a beep. I have no idea.

A typical program looks like this:

new number rep 0
write Input *s A *s Number. *n
input #rep
new number counter 0
hereis repeat
set #counter #counter + 1
write #counter *n
delay 30
if #counter < #rep repeat


This program asks the user for a number, then counts up to that number.

### Examples of BALL

Here is the original manual for BALL, written in 2008. It contains a number of example programs, here are a few:

Prime number generator:

Double buffered animation:

Surprisingly the original website itself is still up. I wonder how long it will remain so.

### Verdict

Just from running the executable, it seems that the program, although quirky, mostly works. Only when digging through the old source code do I realize what a mess the whole thing was.

Take the string syntax, for example. The first step in decoding an instruction was to tokenize it by the space character, so print “Hello World” would tokenize to [print,”Hello,World”]. Of course, this loses all the whitespace characters in the string literal. My solution? Use *s for space, so the tokenized list is [print,Hello,*s,World] and everything works out.
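The effect is easy to reproduce. The following Python sketch is not the original Java interpreter. It is a hypothetical reconstruction of the tokenize-by-space step, assuming *s stands for a space and *n for a newline, and it handles only literal tokens (no #x or &x variables).

```python
def tokenize(line):
    """Split an instruction on single spaces, as described above."""
    return line.split(" ")

# Assumed escape tokens: *s is documented above, *n is inferred from the sample program.
ESCAPES = {"*s": " ", "*n": "\n"}

def render_write(line):
    """Return the text a 'write' instruction made only of literals would print."""
    keyword, *args = tokenize(line)
    if keyword != "write":
        raise ValueError("this sketch only understands 'write'")
    # Tokens are glued together with nothing in between, so spaces must be spelled *s.
    return "".join(ESCAPES.get(token, token) for token in args)

print(tokenize('print "Hello World"'))  # the string literal is split in two
print(repr(render_write("write Input *s A *s Number. *n")))
```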

It’s often said that a programmer should always hate his old code, as that’s a sign that he’s improving. I still haven’t mastered programming, but I’ve definitely improved since I started back in eighth grade.

## Waterloo’s Jobmine process and my first co-op internship

May 5, 2014

I just finished my first internship — since it’s my first ever “real” full-time job, I feel it’s a rite of passage of some sort.

The internship, or co-op work term, lasted 4 months from January to April. My position was titled “Software Developer”, and the company I worked for was TutorJam, a small educational startup in Kitchener.

### The Jobmine Process

Like most students at Waterloo, I found my job through Jobmine. The process was intimidating at first: the whole slew of resumes, interviews, jobmine cycles, ranking systems, etc, were a lot to take in. But as I brushed up my resume and tentatively submitted a few cover letters, I began to relax a little.

In the end, I applied to 25 jobs (the limit is 50 applications). Most of these were in the Kitchener-Waterloo area, mainly because I leased a house here and didn’t want to relocate. Out of these 25 positions, 5 of them were cancelled before the interview stage. Out of the 20 jobs remaining, I got interviewed for 10 of them.

The interviews came and went, and in the end, 4 of the 10 companies that interviewed me gave me an offer. So I had the good fortune to take my pick between 4 jobs, any one of which I’d be happy working for. I ended up simply picking the job that looked the most interesting.

### The Internship

During the 4 months, I worked on a site called YuJa. It’s an “online video collaboration platform”, but I like to describe it to my friends as “kind of like D2L but with lots of videos”. Here’s a picture of the login page of the website:

The team was very small — there were 2 co-op students and 2 full time developers, so essentially we had 4 programmers and 1 manager working on the entire project. As a result, I was entrusted with developing whole features by myself, both the frontend and backend — something rather unusual for a first time co-op.

The project is built with the standard HTML/CSS/JavaScript/jQuery on the frontend, and uses WildFly on the backend (basically a Java-based server). When I started, I was proficient with the Java programming language, but had very little experience with web development (like HTML/CSS/JS). Initially the learning curve was quite steep, but I quickly picked up the skills I was missing.

In the first week, I fixed minor bugs and implemented small improvements, in order to “learn the ropes”. In the second week, I was assigned my first major feature. Essentially it allowed a professor to quickly send a group message to everyone in a class, and the students would receive it by email and SMS. Before the end of the month, my feature was complete.

Here’s a picture of my office (my computer is on the right, the guy on the left is Samson, another co-op student):

There were only the two of us physically present in this room in Kitchener — the company is spread out between several cities across North America. Thus all of our communications were done remotely, via Google talk. Another consequence of this was that in order to keep everyone in the same time zone, we were required to work from noon to 8pm.

### Conclusion

All in all, my first internship was a positive experience, as I learned a lot and worked with very smart people. I learned how to work my way around a large codebase, and also got a taste of what a startup is like. I suppose the only downside was that there was almost no social activity.

Hopefully I haven’t violated any company NDA by writing this post.

This sums up my co-op experience. Starting this week, I will be doing another 4 month study term (2B Computer Science) until August.

## Splitting utility costs between roommates is NP-Complete

April 5, 2014

Here’s an easy problem.

You live in a house with 4 people. For simplicity, I will call them Andrei, Bai, Darin, and Young. One person pays for electricity, another person pays for gas, another person pays for water, and the last person pays for internet. However, the utilities cost different amounts, and it is agreed that the total cost should be split equally.

It has come to the time to wrap up the bills. After tallying up the receipts, you find that Andrei has paid $650, Bai has paid $240, Darin has paid $190, and Young has paid $120. What transfers do you make to distribute the costs fairly?

Well that’s easy. Add up all the numbers and you find that the group paid $1200 in total. A quarter of that is $300, and that’s the amount each person should pay in the end. If you’ve already paid $240, then the difference, $60, is the amount you have to pay to compensate.

To see this even more clearly, let us define balance as the difference between what you’re supposed to pay and what you actually paid. From now on, I will use a negative balance to mean you paid more than you were supposed to and you are owed money; a positive balance means you owe money to others.

In this case, it’s obvious how to balance the bills. Since Andrei is the only person with a negative balance, everyone simply transfers the correct sum of money to Andrei, problem solved.
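The balance calculation for this example takes only a few lines of Python (the function name is made up for illustration):

```python
paid = {"Andrei": 650, "Bai": 240, "Darin": 190, "Young": 120}

def compute_balances(paid):
    """Balance = fair share minus amount paid; negative means you are owed money."""
    share = sum(paid.values()) / len(paid)
    return {name: share - amount for name, amount in paid.items()}

balances = compute_balances(paid)
for name, balance in balances.items():
    print(name, balance)
# Andrei -350.0, Bai 60.0, Darin 110.0, Young 180.0
```

Andrei is owed $350, which is exactly what the other three owe between them.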

### But in general…

Being a computer science major, this left me wondering: what if I lived with 20 people? And what if, throughout the term, we lend each other money, so that multiple people have a negative balance, and multiple people have a positive balance? How do we solve this problem then?

For simplicity, from now on I will assume the preliminary calculations have been done, and we will work solely with the balance column. I will also assume that all values are integers.

One immediate observation is the balances always add up to 0. So given a list of integers that add up to 0, how do we find an efficient set of transfers to balance the bill?

What do we mean by efficient? Well, let’s explore several possibilities.

### Roommate Problem, version 1

Given a list of balances that add up to 0, find the smallest number of transfers to balance the bill.

This seems at first glance to be the criterion we’re looking for. Writing cheques is a hassle, so we don’t want to write more than what is absolutely necessary.

But if you think about it, there’s a really cheap way to solve this problem:

Sort the list. Starting from the highest number, give all your money to the second highest number, repeat n-1 times.

Somehow this doesn’t feel very satisfying. If there are a lot of people, the people in the middle are going to be handling enormous amounts of money. Let’s try again.
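Here is a minimal Python sketch of that chain trick, reusing the four balances from the first example. Each transfer is written as a hypothetical (payer index, payee index, amount) tuple.

```python
def chain_transfers(balances):
    """Settle n balances with n-1 transfers by passing money down a sorted chain."""
    # People sorted from the largest balance (owes the most) to the smallest.
    order = sorted(range(len(balances)), key=lambda i: balances[i], reverse=True)
    transfers = []
    carried = 0
    for payer, payee in zip(order, order[1:]):
        carried += balances[payer]  # own debt plus everything received so far
        transfers.append((payer, payee, carried))
    return transfers

print(chain_transfers([-350, 60, 110, 180]))
# [(3, 2, 180), (2, 1, 290), (1, 0, 350)]
```

Person 1 owes only $60 but ends up forwarding $350, which is the unsatisfying part.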

### Roommate Problem, version 2

Given a list of balances that add up to 0, minimize the total money transferred to balance the bill.

Perhaps what we really want is to minimize the money transferred? Maybe the bank charges $0.01 for each $1 you transfer?

Unfortunately, this problem can also be solved in a cheap way:

We don’t care how many transfers we make, so let’s just transfer $1 at a time! As long as we always transfer from positive to negative, it doesn’t matter how we do it: we’re always going to transfer a fixed amount of money.

Let’s try again.

### Roommate Problem, version 3

Given a list of balances that add up to 0, find the smallest set of transfers to balance the bill, with the limitation that transfers are only allowed from a positive to a negative balance.

This captures our intuition that a person should either be transferring money or receiving money, not both.

Version 3 doesn’t fall immediately to a cheap trick like its two predecessors. Instances of this problem can get pretty tricky at times. Here are some examples of optimal solutions:

I couldn’t come up with an efficient algorithm to solve this problem. The best I could come up with was a greedy algorithm:

Assume the input is [-8,-4,5,7]. On each step, look for the number with the least absolute value (-4). Without loss of generality, assume this number is negative. Then ‘zero’ this number by cancelling it with the smallest number on the other side: transfer $4 from 5 to -4, giving us [-8,1,7]. Repeat this until all numbers are zero.

How bad is this algorithm? Let’s say there are M negative numbers and N positive numbers. Then this algorithm requires at most M+N-1 transfers, since each step zeroes at least one number, and the last step zeroes two numbers.

The optimal solution takes at least max(M,N) transfers, since every nonzero balance has to take part in at least one transfer. Because M+N-1 is less than 2·max(M,N), my greedy algorithm never takes more than 2 times the optimal number of transfers. Not too bad, but not great either.
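The greedy algorithm can be sketched in Python as follows. This is one reading of the description above, not a tuned implementation. Ties are broken by list position, and transfers are hypothetical (payer index, payee index, amount) tuples.

```python
def greedy_transfers(balances):
    """Repeatedly zero the balance of least absolute value against the other side."""
    if sum(balances) != 0:
        raise ValueError("balances must add up to 0")
    # Outstanding balances by person index; settled people are dropped.
    remaining = {i: b for i, b in enumerate(balances) if b != 0}
    transfers = []
    while remaining:
        i = min(remaining, key=lambda p: abs(remaining[p]))
        x = remaining[i]
        # Candidates on the other side, then the smallest of them.
        others = [p for p in remaining if (remaining[p] > 0) != (x > 0)]
        j = min(others, key=lambda p: abs(remaining[p]))
        # Money always flows from a positive balance to a negative one.
        payer, payee = (i, j) if x > 0 else (j, i)
        transfers.append((payer, payee, abs(x)))
        remaining[j] += x
        del remaining[i]
        if remaining[j] == 0:
            del remaining[j]
    return transfers

example = greedy_transfers([-8, -4, 5, 7])
print(example)  # 3 transfers, which is M+N-1 for M = N = 2
```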

Unable to progress any further, I asked around in the TopCoder forums. Surprisingly, I got an answer that hinted the problem was impossible to solve efficiently — it is NP-Complete!

### NP-Complete by Reduction from SUBSET-SUM

To prove a problem can be solved efficiently, you simply describe an algorithm that solves the problem, then prove this algorithm is efficient. But how do you prove a problem cannot be solved efficiently?

There are certain problems in computer science that are known to be hard: one of them is the Subset Sum problem. Given a set of positive integers and a positive integer N, is it possible to find a subset that sums to exactly N? Return YES if this is possible, or NO otherwise.

For example, say our set is {3,5,7,8,11}. Can we make 16? The answer is YES, because 5+11=16. Can we make 17? The answer is NO — if you check all the possibilities, you discover that no subset sums to exactly 17.

We can leverage the fact that the Subset Sum problem is hard using a proof by contradiction. Assume that there exists some efficient algorithm to solve the Roommate problem. In the diagram, I symbolize it with a black box.

Assume there is also a converter routine: an easy way to convert an input for the Subset Sum problem into an input for the Roommate problem. I’ll get to the details of this converter shortly; right now, assume it exists.

Then combining the Roommate solver with the converter, we have created a Subset Sum solver! If the Roommate solver is efficient, then this Subset Sum solver is also efficient. But Subset Sum is NP-complete: no efficient solver for it is known, and none exists unless P = NP. Ergo, we shouldn’t expect an efficient Roommate solver to exist either.

The only missing piece is to reduce an instance of the Subset Sum problem to an input to the Roommate problem.

Here’s how. For each number in your set, create a roommate with that number as a positive balance. Then create a roommate with a balance of -N (the number you’re trying to sum up to). Then create one final roommate with the exact balance so that all the numbers sum to 0.

Here’s the input for {3,5,7,8,11} and N=16:

There are 5 numbers in the set, and the Roommate solver finds a solution requiring 5 transfers.

By contrast, here’s the input for {3,5,7,8,11} and N=17:

The Roommate solver can’t do better than 6 transfers.

So to solve the Subset Sum problem, plug it into the Roommate solver and see how many transfers it outputs. If it outputs exactly 1 transfer for every element in your set, then output YES. Otherwise, if there are more transfers than elements in your set, output NO.
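To make the reduction concrete, here is a sketch of the converter wired to a stand-in for the black box. The stand-in `min_transfers` is an exact but exponential-time solver, so it does not contradict the hardness argument. It relies on the observation that the minimum number of transfers equals the number of roommates minus the largest number of zero-sum groups the balances can be split into, since a zero-sum group of k people can be settled in k-1 transfers. All function names here are made up for the example.

```python
def to_roommate_input(numbers, target):
    """Converter: one positive balance per number, then -target, then a
    final balance chosen so that everything sums to 0."""
    numbers = list(numbers)
    return numbers + [-target, target - sum(numbers)]


def min_transfers(balances):
    """Exact Roommate solver (exponential time, small inputs only).

    groups[mask] is the largest number of disjoint zero-sum groups that
    can be formed from the roommates selected by the bitmask.
    """
    n = len(balances)
    full = (1 << n) - 1
    total = [0] * (full + 1)
    groups = [0] * (full + 1)
    for mask in range(1, full + 1):
        low = (mask & -mask).bit_length() - 1  # index of lowest set bit
        total[mask] = total[mask & (mask - 1)] + balances[low]
        best = max(groups[mask ^ (1 << i)] for i in range(n) if mask >> i & 1)
        groups[mask] = best + (1 if total[mask] == 0 else 0)
    return n - groups[full]


def subset_sum_via_roommates(numbers, target):
    """YES exactly when the solver needs one transfer per set element."""
    numbers = list(numbers)
    return min_transfers(to_roommate_input(numbers, target)) == len(numbers)


if __name__ == "__main__":
    numbers = [3, 5, 7, 8, 11]
    for target in (16, 17):
        balances = to_roommate_input(numbers, target)
        print(target, balances, min_transfers(balances),
              subset_sum_via_roommates(numbers, target))
```

For N=16 the solver reports 5 transfers and for N=17 it reports 6, in line with the two examples above.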

This proves that the Roommate problem is at least as hard as Subset Sum, so it’s NP-Complete.

### Research in Existing Literature and Application to Biology

While researching for this blog post, I came upon a research paper titled “On the Minimum Common Integer Partition Problem” published in 2006 by Xin Chen, Lan Liu, Zheng Liu, and Tao Jiang.

They investigate a problem they call Minimum Common Integer Partition (MCIP). Given two lists of integers, say [4,8] and [5,7], find the smallest common partition — in this case, [3,4,5].

Compare this to the Roommate problem with input [-4,-8,5,7], and it’s clear that the Roommate problem is identical to 2-MCIP. (The 2 just means we’re finding the smallest partition between 2 lists; the paper also investigates finding the smallest partition between more than 2 lists.)
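To see why [3,4,5] counts as a common partition, it helps to check it against both lists. The sketch below is a small backtracking checker, not the paper’s algorithm; the names `refines` and `is_common_partition` are invented for this example, and all inputs are assumed to be positive integers.

```python
def refines(parts, targets):
    """True if parts can be split into groups whose sums are exactly targets."""
    if sum(parts) != sum(targets):
        return False
    remaining = list(targets)

    def place(i):
        # Every part placed: every target must be used up exactly.
        if i == len(parts):
            return all(r == 0 for r in remaining)
        for j in range(len(remaining)):
            if remaining[j] >= parts[i]:
                remaining[j] -= parts[i]
                if place(i + 1):
                    return True
                remaining[j] += parts[i]  # undo and try the next target
        return False

    return place(0)


def is_common_partition(parts, first, second):
    """True if parts is a partition of both lists at once."""
    return refines(parts, first) and refines(parts, second)


if __name__ == "__main__":
    print(is_common_partition([3, 4, 5], [4, 8], [5, 7]))
    print(is_common_partition([6, 6], [4, 8], [5, 7]))
```

Each part corresponds to one transfer in the Roommate problem: 4 = 4 and 8 = 3+5 on one side, 5 = 5 and 7 = 3+4 on the other.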

Skimming through the paper, I found that it derives an algorithm similar to my greedy algorithm, which approximates the problem within a factor of 2. Using more complicated techniques, it manages to produce an algorithm with a 5/4 approximation.

Doing a bit more searching, it turns out that a more recent paper by David Woodruff reduces the approximation ratio for 2-MCIP down to 1.228; an even better paper reduces it down to 1.125 using network flow techniques. At this point I was getting far too sidetracked from the original problem, so I didn’t investigate the details.

What surprised me more was that this research was motivated not by roommates sharing utilities, but by biologists studying genome sequences! Biology is not my area of expertise, so I won’t comment further on that. But I’ll leave you with these slides (taken from a presentation by the above-mentioned David Woodruff):

So in short, we can’t solve the Roommate problem perfectly, but with cutting-edge algorithms, we can guarantee ourselves to be off by no more than 12.5%!

## A Simple Shorthand Musical Notation

March 23, 2014

Anyone who’s played piano, or any other musical instrument, would be familiar with the “standard” musical notation. It’s clear, unambiguous, accepted worldwide, and has been basically unchanged since Bach. It looks like this:

Now there’s a reason this notation has survived this long — it’s good. It’s easy to read, and allows a musician to read and play a piece he’s never heard before.

But when you try to write music, you find that the notation is actually quite cumbersome to write. The notes are positioned on groups of 5 lines, so you’d better either have sheets of these lines printed, or be prepared to tediously draw these lines with a ruler. The timing of notes is very precise, so if you slightly exceed the allowed time for a bar, sorry, your notation is not valid anymore.

### Principles of Shorthand Notation

To solve these frustrations, I created an alternate system of recording music, with the primary goal of being easy to write. It’s possible to jot down a melody in 30 seconds, with just a pencil and normal (not printed sheet) paper.

I do not claim my notation to be better than the standard notation. Rather, I achieve a different goal, sacrificing information for the ease of writing.

Standard notation is good for recording a song so that a musician can play it without having heard it before.

My notation is good for reminding a musician how to play a song he has heard before.

A common use case would be reminding yourself of the notes of a song you’re playing, or accompanying a recording of the song. In a way, its purpose is similar to that of guitar tablature.

Here’s my justification for doing this. Most people can reproduce rhythm intuitively: after hearing a passage a few times, they can clap back the rhythm. It’s much more difficult to find the correct notes after hearing the passage. I usually stumble upon them by trial and error.

So if you write down the notes but leave out the rhythm, it would often be enough information to play the song.

The tradeoff should become clear if you compare the same passage written side by side (from Bach’s Minuet in G Major):

### Rules of Writing Shorthand Notation

Start by writing the notes in a line, and separate bars with a vertical | line. Indicate the key signature at the beginning of the page, if needed. Feel free to liberally clump notes together or space them apart based on rhythm.

Next is the rule for jumps. When the melody goes upwards by a perfect fourth or more (like from C->F), write the jumped note on an elevated line.

Remain on the elevated line as long as the melody is still increasing or stays the same. But as soon as the melody descends, immediately drop back down to the neutral line.

Here’s an example:

As long as the melody consists of small intervals (like C->E->C), we stay on the neutral line. Only when the jump is large (C->F) do we go to the elevated line.

Typically in music, a large jump in one direction is followed by a small step backwards. This means that we spend most of our time on the neutral line. It’s very rare for a melody to have multiple jumps in the same direction.

Here’s another example (Twinkle twinkle little star):

The melody does a large jump on the third note (C->G), so the third note (G) is on the elevated line. On the seventh note, the melody descends one note from A->G, so we immediately drop back to the neutral line. It does not matter that the same G was on the elevated line before.
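The jump rule is mechanical enough to write as a short function. The sketch below covers only the rule as stated so far (the neutral line and one elevated line); depressed lines and other starting lines are left out. Representing notes as MIDI pitch numbers, so that a perfect fourth is 5 semitones, is an assumption made for this example, as is the name `line_levels`.

```python
PERFECT_FOURTH = 5  # semitones, e.g. C -> F


def line_levels(pitches):
    """Return 0 (neutral line) or 1 (elevated line) for each note.

    pitches is a list of MIDI note numbers, e.g. 60 for middle C.
    """
    if not pitches:
        return []
    levels = [0]  # this sketch always starts on the neutral line
    for prev, cur in zip(pitches, pitches[1:]):
        step = cur - prev
        if step >= PERFECT_FOURTH:
            level = 1           # large upward jump: go to the elevated line
        elif step < 0:
            level = 0           # melody descends: drop back to neutral
        else:
            level = levels[-1]  # rising or repeated note: stay where we are
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Twinkle twinkle little star: C C G G A A G
    twinkle = [60, 60, 67, 67, 69, 69, 67]
    print(line_levels(twinkle))
```

For the Twinkle example this gives [0, 0, 1, 1, 1, 1, 0]: the third note moves to the elevated line and the seventh drops back to neutral, as described above.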

You do not always have to start on the neutral line. It might be useful to start on an elevated or depressed line. Here’s an example (Harry Potter):

### Reasoning behind the Jump Rule

You might be wondering, why make this jump rule so complicated? Why have a jump rule at all?

Well, we need some way of indicating octaves. Otherwise, an interval like C->F would be ambiguous: are we going up a perfect fourth, or going down a perfect fifth?

On the other hand, if we decreased the jump threshold, say so that a major third (C->E) counts as a jump, then the melody would be littered with jumps up and down, which would be a nightmare to handle. Setting the threshold to the perfect fourth is a good balance.

The complexity of the jump rule ensures that when you’re shifting upwards, the melody is actually going upwards. It would be confusing to the reader if there were a situation where we return from the elevated line down to the neutral line while the melody is going upwards!

Another alternative to the jump rule is to divide all the notes into distinct octaves: for instance, put any notes between C4 (middle C) and C5 on the neutral line, everything between C5 and C6 on the elevated line, and so on. I experimented with this, but found it very awkward when the melody straddles the boundary between two octaves.

And that’s how the jump rule was created. So please experiment with this system, see if you like it!

## Simple experimentation with jQuery

December 31, 2013

This term, I got hired for a co-op internship at a small software company in Kitchener.

The job posting required primarily Java programming, but the company uses a combination of Java (for the back end) and JavaScript (for the front end). I did not have much experience with JavaScript and web programming, so they asked me to learn jQuery and Ajax, and a bunch of other things.

After a few days of playing with jQuery, this is what I came up with:

It’s a “Trivial Collatz Simulator”. The user types in a number, and the program simulates the Collatz procedure (with animations!) until we reach 1.

The program is written using jQuery. On each iteration, it uses Ajax to query a local server (written in PHP), to do the arithmetic and return the next number in the sequence. That’s about it.
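For reference, the arithmetic that the server does on each request is tiny. The original is written in PHP and isn’t reproduced here; the following is just a Python sketch of the same Collatz step, with made-up function names.

```python
def collatz_next(n):
    """One Collatz step: halve even numbers, map odd n to 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_sequence(n):
    """Apply the Collatz step repeatedly until we reach 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n != 1:
        n = collatz_next(n)
        sequence.append(n)
    return sequence


if __name__ == "__main__":
    print(collatz_sequence(6))
```

Starting from 6, the sequence is 6, 3, 10, 5, 16, 8, 4, 2, 1.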