Here’s an easy problem.

You live in a house with 4 people. For simplicity, I will call them Andrei, Bai, Darin, and Young. One person pays for electricity, another person pays for gas, another person pays for water, and the last person pays for internet. However, the utilities cost different amounts, and it is agreed that the total cost should be split equally.

It has come to the time to wrap up the bills. After tallying up the receipts, you find that Andrei has paid $650, Bai has paid $240, Darin has paid $190, and Young has paid $120. What transfers do you make to distribute the costs fairly?

Well that’s easy. Add up all the numbers and you find that the group paid $1200 in total. A quarter of that is $300 — that’s the amount each person should pay in the end. If you’ve already paid $240, then the difference, $60, is the amount you have to pay to compensate.

To see this even more clearly, let us define **balance** as the difference between what you’re supposed to pay and what you actually paid. From now on, I will use a *negative* balance to mean you paid more than you supposed to and you are *owed* money; a positive balance means you owe money to others.

In this case, it’s obvious how to balance the bills. Since Andrei is the only person with a negative balance, everyone simply transfers the correct sum of money to Andrei, problem solved.

### But in general…

Being a computer science major, this left me wondering: what if I lived with 20 people? And what if, throughout the term, we lend each other money, so that multiple people have a negative balance, and multiple people have a positive balance? How do we solve this problem then?

For simplicity, from now on I will assume the preliminary calculations have been done, and we will work solely with the *balance* column. I will also assume that all values are integers.

One immediate observation is the balances always add up to 0. So given a list of integers than add up to 0, how do we find an *efficient* set of transfers to balance the bill?

What do we mean by *efficient*? Well, let’s explore several possibilities.

### Roommate Problem, version 1

Given a list of balances that add up to 0,

find the smallest number of transfersto balance the bill.

This seems at first glance to be the criterion we’re looking for. Writing cheques is a hassle, so we don’t want to write more than what is absolutely necessary.

But if you think about it, there’s a really cheap way to solve this problem:

Sort the list. Starting from the highest number, give all your money to the second highest number, repeat n-1 times.

Somehow this doesn’t feel very satisfying. If there are a lot of people, the people in the middle are going to be handling enormous amounts of money. Let’s try again.

### Roommate Problem, version 2

Given a list of balances that add up to 0,

minimize the total money transferredto balance the bill.

Perhaps what we really want is to minimize the money transferred? Maybe the bank charges $0.01 for each $1 you transfer?

Unfortunately, this problem can also be solved in a cheap way:

We don’t care how many transfers we make, so let’s just transfer $1 at a time! As long as we always transfer from positive to negative, it doesn’t matter how we do it, we’re always going to transfer a fixed amount of money. Let’s try again.

### Roommate Problem, version 3

Given a list of balances that add up to 0,

find the smallest set of transfersto balance the bill, with the limitation thattransfers are only allowed from a positive to a negative balance.

This captures our intuition that a person should either be transferring money or receiving money, not both.

Version 3 doesn’t fall immediately to a cheap trick like its two predecessors. Instances of this problem can get pretty tricky at times — here are some examples of some optimal solutions:

I couldn’t come up with an efficient algorithm to solve this problem. The best I could come up with was a greedy algorithm:

Assume the input is [-8,-4,5,7]. On each step, look for the number with the least absolute value (-4). Without loss of generality, assume this number is negative. Then ‘zero’ this number by cancelling it with the smallest number on the other side — so transfer $4 from 5 to 4, giving us [-8,1,7]. Repeat this until all numbers are zero.

How bad is this algorithm? Let’s say there are *M* negative numbers and *N* positive numbers. Then this algorithm requires at most* M+N-1* transfers, since each step zeroes at least one number, and the last step zeroes two numbers.

The optimal solution takes at least *max(M,N)* transfers. This proves that my greedy algorithm never takes more than 2 times the optimal number of transfers. Not too bad, but not great either.

Unable to progress any further, I asked around in the TopCoder forums. Surprisingly, I got an answer that hinted the problem was impossible to solve efficiently — it is NP-Complete!

### NP-Complete by Reduction from SUBSET-SUM

To prove a problem can be solved efficiently, you simply describe an algorithm that solves the problem, then prove this algorithm is efficient. But how do you prove a problem *cannot* be solved efficiently?

There are certain problems in computer science that are known to be hard: one of them is the Subset Sum problem. Given a set of positive integers and a positive integer *N*, is it possible to find a subset that sums to exactly *N*? Return YES if this is possible, or NO otherwise.

For example, say our set is *{3,5,7,8,11}*. Can we make 16? The answer is YES, because 5+11=16. Can we make 17? The answer is NO — if you check all the possibilities, you discover that no subset sums to exactly 17.

We can leverage the fact that the Subset Sum problem is hard using a proof by contradiction. Assume that there exists some efficient algorithm to solve the Roommate problem. In the diagram, I symbolize it with a black box.

Assume there is also a *converter* routine: an easy way to convert an input for the Subset Sum problem into an input for the Roommate problem. I’ll get to the details of this converter shortly; right now, assume it exists.

Then combining the Roommate solver with the converter, we have created a Subset Sum solver! If the Roommate solver is efficient, then this Subset Sum solver is also efficient. But we know that no efficient Subset Sum solver exists. Ergo, no efficient Roommate solver exists either.

The only missing piece is to reduce an instance of the Subset Sum problem to an input to the Roommate problem.

Here’s how. For each number in your set, create a roommate with that number as a positive balance. Then create a roommate with a balance of *-N* (the number you’re trying to sum up to). Then create one final roommate with the exact balance so that all the numbers sum to 0.

Here’s the input for *{3,5,7,8,11}* and *N=16*:

There are 5 numbers in the set, and the Roommate solver finds a solution requiring 5 transfers.

By contrast, here’s the input for *{3,5,7,8,11}* and *N=17*:

The Roommate solver can’t do better than 6 transfers.

So to solve the Subset Sum problem, plug it into the Roommate solver and see how many transfers it outputs. If it outputs exactly 1 transfer for every element in your set, then output YES. Otherwise, if there are more transfers than elements in your set, output NO.

This proves that the Roommate problem is as least as hard as Subset Sum, so it’s NP-Complete.

### Research in Existing Literature and Application to Biology

While researching for this blog post, I came upon this research paper titled* “On the Minimum Common Integer Partition Problem” *published in 2006 by Xin Cheng, Lan Liu, Zheng Liu, and Tao Jiang.

They investigate a problem they call **Minimum Common Integer Partition (MCIP)**. Given two lists of integers, say [4,8] and [5,7], find the smallest common partition — in this case, [3,4,5].

Compare this to the Roommate problem with input [-4,-8,5,7], and it’s clear that the Roommate problem is identical to 2-MCIP. (The 2 just means we’re finding the smallest partition between 2 lists, the paper also investigates finding the smallest partition between more than 2 lists).

Skimming through this paper, it derives an algorithm similar to my greedy algorithm which approximates the problem by a factor of 2. Using more complicated techniques, it manages to produce an algorithm with a 5/4 approximation.

Doing a bit more searching, it turns out that a more recent paper by David Woodruff reduces the approximation ratio for 2-MCIP down to 1.228; an even better paper reduces it down to 1.125 using network flow techniques. At this point, I think I’m way too sidetracked from the original problem, so I didn’t investigate the details.

What surprised me more was that this research was motivated not by roommates sharing utilities, but by biologists studying genome sequences! Biology is not my area of expertise, so I won’t comment further on that. But I’ll leave you these slides (taken from a presentation by the above-mentioned David Woodruff):

So in short, we can’t solve the Roommate problem perfectly, but with cutting-edge algorithms, we can guarantee ourselves to be off by no more than 12.5%!