Project Euler 280

March 6, 2010

Project Euler 280 is an interesting problem involving probability and combinatorics:

There is a 5×5 grid. An ant starts in the center square of the grid and walks randomly. Each step, the ant moves to an adjacent square (but not off the grid).

In each of the five bottom-most squares, there is a seed. When the ant reaches such a square, he picks up the seed (if he isn’t already carrying one).

When the ant reaches a square in the top-most row while carrying a seed, he drops off the seed (if there isn’t already a seed there).

This ‘game’ ends when all five seeds are in the five squares of the top-most row.

The problem asks for the expected (average) number of turns it would take to get from the initial to the ending position. It requires six digits of precision.

The Monto Carlo Approach

Perhaps the easiest way to tackle this problem is with a Monto Carlo simulation. This uses a computer to actually run the game many times, using random number generators.

This is my straight-forward Monto Carlo implementation in Java:

import java.util.*;

public class Main{
	public static void main(String[] args){
		Random r = new Random();
		int stepsum = 0;
		int tries = 0;
		while(true){
			stepsum += new Simulation(r).simulate();
			tries ++;
			System.out.println((double) stepsum / (double) tries);
		}
	}
}

class Simulation{

	static final int[] init = {
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		1,1,1,1,1
	};
	State state;
	Random rand;
	int steps;

	Simulation(Random rand){
		state = new State(init, 12, false);
		this.rand = rand;
		steps = 0;
	}

	int simulate(){
		while(!done())
			step();
		return steps;
	}

	boolean done(){
		int[] b = state.board;
		return b[0]==1 && b[1]==1 && b[2]==1 && b[3]==1 && b[4]==1;
	}

	boolean step(){
		steps ++;

		int antX = state.ant % 5;
		int antY = state.ant / 5;
		int dir;

		while(true){
			// 0:N 1:S 2:E 3:W
			dir = rand.nextInt(4);
			if(antY==0 && dir==0) continue;
			if(antY==4 && dir==1) continue;
			if(antX==4 && dir==2) continue;
			if(antX==0 && dir==3) continue;
			break;
		}

		switch(dir){
			case 0: antY--; break;
			case 1: antY++; break;
			case 2: antX++; break;
			case 3: antX--; break;
		}

		int oldAnt = state.ant;
		state.ant = 5*antY + antX;

		if(state.carrying){
			state.board[oldAnt]--;
			state.board[state.ant]++;
		}

		if(antY == 0 && state.board[state.ant] == 1 && state.carrying){
			// drop off
			state.carrying = false;
		}

		if(antY == 4 && state.board[state.ant] == 1 && !state.carrying){
			// pick up
			state.carrying = true;
		}

		return true;
	}
}

class State{
	// 25 board.
	int[] board;

	int ant;
	boolean carrying;

	State(int[] board, int ant, boolean carrying){
		this.board = board.clone();
		this.ant = ant;
		this.carrying = carrying;
	}

	State(State s){
		this(s.board, s.ant, s.carrying);
	}

	public boolean equals(Object o){
		State s = (State) o;
		return Arrays.equals(s.board, board) && s.ant == ant;
	}

	public int hashCode(){
		return Arrays.hashCode(board) + ant;
	}
}

Running this program will produce something close to the final result. But this will not nearly be accurate enough to actually solve the problem: running this for a few hours would only produce two or three digits of precision, while six is required.

It’s not exactly impossible, as on the forums Rodinio claims to have run a similar Monto Carlo simulation on a computer cluster, giving enough precision to start making guesses for the last digits. To get so many digits on a Monto Carlo simulation requires an astronomical amount of tries.

Obviously this is not the most efficient approach.

Introducing the Markov Chain

A more efficient solution can be implemented using Markov Chains.

A Markov Chain solves the problem of discrete random processes. What is a Markov Chain, and how is it used?

Let’s say we have a finite group of states, S = \{ s_1, s_2, \cdots , s_n \}. Make a matrix M, of size n * n.

On each step (or move) of the process, the probability of going from state s_i to s_j is M_{ij}.

The process starts on an initial state. From each state there is a certain probability of going to every other state.

Notice that in a Markov Chain, what happens next does not depend on what has happened before. The only data is the current state you’re on.

I’ll explain all this with a very simple example:

On a 1×3 game board, Bob (the happy face) starts at square 1. Each step, he randomly moves to an adjacent square with equal probability. The game ends when he reaches square 3.

Of course the only time he has a ‘choice’ in which square to move to is when he’s on square 2 (half-half chance of moving to 1 and 3). If he’s already on square 1, he has a 100% chance of moving to square 2, and if he’s on square 3, well, the game is over.

We can represent this game like this:

Another way of showing the above would be something like this:

This is the matrix, of course:

\left[ \begin{array}{ccc} 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 0 & 1 \end{array} \right]

This type of matrix is called a Stochastic matrix. What’s special about this matrix is that each row of the matrix adds up to 1.

A state that can only return to itself is called an absorbing state. If in the matrix, M_{ii} = 1 for any value of i, then state i is an absorbing state.

Here, state 3 is an absorbing state.

The next step is very important: determining the time until absorption.

From our matrix, take out all of the absorbing rows, and all of the left columns to make it a square matrix:

It seems that by doing this we are losing information. But we’re actually not. We are treating each absorbing state as the same; and since each row originally added up to 1, the probability of going to the absorbing state is simply 1 minus the sum of the rest of the row.

This form is considered the canonical form of the stochastic matrix. Our matrix in canonical form looks like this:

\left[ \begin{array}{cc} 0 & 1 \\ \frac{1}{2} & 0 \end{array} \right]

Next we subtract it from the identity matrix.

\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] - \left[ \begin{array}{cc} 0 & 1 \\ \frac{1}{2} & 0 \end{array} \right] = \left[ \begin{array}{cc} 1 & -1 \\ -\frac{1}{2} & 1 \end{array} \right]

Next we invert this matrix and get:

\left[ \begin{array}{cc} 1 & -1 \\ -\frac{1}{2} & 1 \end{array} \right]^{-1} = \left[ \begin{array}{cc} 2 & 2 \\ 1 & 2 \end{array} \right]

The expected absorption time from state s_n is simply the sum of row n in the above matrix.

So because we started on state 1, the expected length of the game is 2+2 or 4.

All of this seems like magic, because I haven’t given any proof for this. If you really want to prove that this method works, you can get a textbook about Markov chains.

“Markov Chains” by J.R.Norris (Cambridge 1997) is a decent textbook if you want to know more about this. I would recommend this as a less dense introduction to Markov chains.

An attempt using Markov Chains

Now that we know some basics of Markov chains, we need to apply it to solve the problem.

If you attach a Set or something to the Monto Carlo simulation program, you would find that there are 10270 distinct states for this problem. If we count the five final states as the same state, then we have 10266 states.

This is exactly the same as what we’ve just done. Except instead of a 3*3 matrices, we’re now working with matrices with tens of thousands of rows and columns. Woo hoo.

You can probably see here that this is going to be a problem. A 10265*10265 (final state not counted) matrix contains over 100 million elements. Most of these elements will be zero. If we store all of this data in double types, that’s 800mb of memory used, just for the matrix.

It’s easy to underestimate the size of this problem. But there’s a few optimizations we can make.

We could save some time and space by subtracting from the identity matrix at the same time as filling up the matrix.

Inverting a 10265*10265 matrix is no small task either. Instead there’s a simple optimization to avoid inverting such a huge matrix:

Remember that we want to find the inverse of the matrix, and get the sum of all the elements in the first row. Let M be our matrix and M' be the inverse.

What’s important here is that finding the sum of all the rows of a matrix is the same as multiplying by a column of ones. This produces a one-columned matrix, each cell representing the sum of the corresponding row.

If C is the column of ones, and S is the sum matrix (what we want), then M' * C = S. We can also write that as M * S = C.

See, this way we don’t have to compute the inverse. We just have to solve the above equation for S.

To avoid implementing the standard matrix solve function, I’ll be using the JAMA library for matrix manipulation.

Finally, the source code for my implementation of what I just discussed:

import java.util.*;
import Jama.*;

public class Main{

	// For a state, maps to all possible states from it.
	// All next-states have the same probability.
	static Map<State,List<State>> states = new LinkedHashMap<State,List<State>>();

	public static void main(String[] args){

		// Keep the time.
		long startTime = System.currentTimeMillis();

		// Initialize the state map.
		addAllStates(State.INIT_STATE);

		// To construct the matrix, we need to map all the states to a position
		// in the matrix (an integer). Because our map is already ordered, we
		// use its position in the map as the position in the matrix. In order
		// to avoid doing a linear search through the keyset to find its position,
		// we cache its position in a map.

		Set<State> keySet = states.keySet();
		Map<State,Integer> positionMap = new HashMap<State,Integer>();

		// Set up position map.
		Iterator<State> iterator = keySet.iterator();
		int pos = 0;
		while(iterator.hasNext()){
			positionMap.put(iterator.next(), pos);
			pos++;
		}

		// Finally we can begin constructing our matrix.
		// This takes up about 900mb of memory.
		double[][] matrix = new double[states.size()][states.size()];

		// Fill up the matrix.
		Collection<List<State>> values = states.values();
		Iterator<List<State>> rowIterator = values.iterator();

		for(int row=0; row<matrix.length; row++){

			// Fill up one row.
			List<State> thisRow = rowIterator.next();

			// What to fill with?
			double fill = 1.0 / thisRow.size();

			// An optimization: why not do the subtraction step here
			// instead of subtracting from identity matrix?
			matrix[row][row] = 1;

			for(State st : thisRow){
				// The final state doesn't count.
				if(st.equals(State.FINAL_STATE)) continue;

				int col = positionMap.get(st);
				matrix[row][col] = -fill;
			}
		}

		// Finishing up.
		Matrix bigMatrix = new Matrix(matrix);
		Matrix onesColumn = new Matrix(states.size(), 1, 1);
		Matrix sums = bigMatrix.solve(onesColumn);

		System.out.println(sums.get(0,0));
		System.out.println(System.currentTimeMillis() - startTime + "ms");

	}

	// Returns a list of possible continuation states from the current one.
	static List<State> nextStates(State state){

		// Current and changing position of the ant
		int antX = state.ant % 5;
		int antY = state.ant / 5;

		// Whether it can go into each of the four directions (N,S,E,W respectively).
		boolean[] possibleDirs = new boolean[4];
		Arrays.fill(possibleDirs, true);

		// Take out some directions if it's in the corner.
		if(antY == 0) possibleDirs[0] = false; // Can't go north
		if(antY == 4) possibleDirs[1] = false; // Can't go south
		if(antX == 4) possibleDirs[2] = false; // Can't go east
		if(antX == 0) possibleDirs[3] = false; // Can't go west

		// Construct a list of continuations.
		List<State> nextStates = new ArrayList<State>();

		// Loop through the four directions.
		for(int i=0; i<4; i++){

			// Cannot go this direction.
			if( !(possibleDirs[i])) continue;

			int newAntX = antX;
			int newAntY = antY;

			// Modify direction.
			switch(i){
				case 0: newAntY--; break;
				case 1: newAntY++; break;
				case 2: newAntX++; break;
				case 3: newAntX--; break;
			}

			// Start constructing new state object.
			int oldAnt = state.ant; // old ant position
			int newAnt = 5*newAntY + newAntX;
			int[] board = state.board.clone();
			boolean carrying = state.carrying;

			// Carrying a seed. Notice that a square can contain
			// two seeds at once (but not more); seeds are indistinguishable
			// so we just need to keep track of the number of seeds
			// on each square.
			if(carrying){
				board[oldAnt] --;
				board[newAnt] ++;
			}

			// Drop off the seed.
			if(newAntY == 0 && board[newAnt] == 1 && carrying)
				carrying = false;

			// Pick up a new seed.
			if(newAntY == 4 && board[newAnt] == 1 && !carrying)
				carrying = true;

			// Treat the five done positions the same.
			if(donePosition(board))
				nextStates.add(State.FINAL_STATE);

			// Add it normally.
			else nextStates.add(new State(board, newAnt, carrying));

		}

		return nextStates;
	}

	// Recursively add all continuation states.
	static void addAllStates(State state){

		// Don't add if it's already added.
		if(states.containsKey(state)) return;

		List<State> nexts = nextStates(state);
		states.put(state, nexts);

		// Recurse (but not if we've reached the final state).
		for(State next : nexts)
			if( !(next.equals(State.FINAL_STATE)))
				addAllStates(next);

	}

	// Is the board in the done position?
	static boolean donePosition(int[] b){
		return b[0]==1 && b[1]==1 && b[2]==1 && b[3]==1 && b[4]==1;
	}
}

class State{

	static final State INIT_STATE = new State(
	new int[]{
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		1,1,1,1,1
	}, 12, false);

	// Consider all final states the same; there is
	// no ant position.
	static final State FINAL_STATE = new State(
	new int[]{
		1,1,1,1,1,
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0
	}, -1, true);

	// 25 board.
	int[] board;

	int ant;
	boolean carrying;

	State(int[] board, int ant, boolean carrying){
		this.board = board;
		this.ant = ant;
		this.carrying = carrying;
	}

	State(State s){
		this(s.board, s.ant, s.carrying);
	}

	public boolean equals(Object o){
		State s = (State) o;
		return Arrays.equals(s.board, board) && s.ant == ant && s.carrying == carrying;
	}

	public int hashCode(){
		return Arrays.hashCode(board) + ant;
	}

	// For debugging mostly.
	public String toString(){
		StringBuilder ret = new StringBuilder("\n");
		for(int i=0; i<5; i++){
			for(int j=0; j<5; j++){
				int pos = 5*i + j;
				if(ant == pos) ret.append("#");
				else ret.append(board[pos] >= 1? "+" : "-");
			}
			ret.append("\n");
		}
		return ret.toString();
	}
}

Running it:

So my program takes about 19 minutes to run, using up over 2GB of ram. The construction of the matrix only takes about 2 seconds, and the rest of the time is used on solving the huge matrix.

JAMA is probably not the fastest matrix library; the running time might be cut down a bit if we use libraries designed for sparse matrices.

But to get this to run in under a minute, a completely different approach would be needed. For now, though, I’m pretty happy with getting it in 19 minutes.


Solving systems of linear equations in Haskell

February 21, 2010

Haskell isn’t normally used for things like this, but it’s quite possible to solve systems of linear equations with Haskell.

There are already several libraries for doing this, and other more advanced matrix manipulating. But here, I’m going to start simple.

In mathematics, systems of linear equations are usually represented by an augmented matrix. A system of n linear equations would be represented by an augmented matrix with n rows and n+1 columns.

For example, we have this system of equations:

\begin{array}{rrrcl} x &+2y &-z &=& -4 \\ 2x &+3y &-z &=& -11 \\ -2x &&-3z &=& 22 \end{array}

This would be represented as an augmented matrix:

\left[ \begin{array}{ccc|c} 1 & 2 & -1 & -4 \\ 2 & 3 & -1 & -11 \\ -2 & 0 & -3 & 22 \end{array} \right]

In Haskell we represent this as a list of lists, like this:

[ [1,2,-1,-4], [2,3,-1,-11], [-2,0,-3,22] ]

Here I’ll store each entry not as an integer, but as a floating point. You could also use Rational in Data.Ratio but both are fine for now.

The advantage of using Rational over Float is that sometimes you will end up with fractions that don’t work very well with floating point numbers. However, I’ve found that printing a list of lists of Rational types makes it difficult to read, unless you implement a custom show function for it.

So this is how we define our matrix types in Haskell:

type Row = [Float]
type Matrix = [Row]

The approach to solving this problem is rather simple. First we reduce whatever matrix we have to REF, or Row Echelon Form and then get the actual roots with some back substitution.

The algorithm used to transform a matrix to its Row Echelon Form is known as the Gaussian Elimination. Here’s what a matrix should look like after Gaussian Elimination (a * represents any value):

\left[ \begin{array}{ccc|c} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \end{array} \right]

Our matrix should look like this after Gaussian Elimination:

\left[ \begin{array}{ccc|c} 1 & 2 & -1 & -4 \\ 0 & 1 & -1 & 3 \\ 0 & 0 & 1 & -2 \end{array} \right]

The REF form is not unique, so that is only one of the possible valid outputs for the Gaussian Elimination.

Why do we want to have the matrix in REF form? A matrix in this form can easily be solved using back substitution. Consider this matrix as a series of linear equations, as we did before:

\begin{array}{rrrcl} x &+2y &-z &=& -4 \\ &+y &-z &=& 3 \\ &&z &=& -2 \end{array}

Now it would be very clear how to solve for the three variables.

The Gaussian Elimination Algorithm

Here is a diagram of how Gaussian Elimination works. On each iteration, the element circled in green is considered the pivot element, while the elements enclosed in the red square are the ones we intend to remove (zero) in each iteration.

Removing the red elements in the matrix is actually quite simple. Consider how you would eliminate x in equation B here:

\begin{array}{lrrcl}(A) & x & +2y & = & 4 \\(B) & 2x & +y & = & 5 \end{array}

Probably you would multiply equation A by 2, giving 2x + 4y = 8, then subtract B from it, giving 3y=3, eliminating x.

We can also write that as B = 2A - B. Basically to eliminate a variable, just multiply a row so it matches up, and subtract. This is middle school algebra.

To make things easier for us, we divide the row we are on so that the pivot is always 1. We do this now because we need them to be 1 anyways, and this avoids an unnecessary division in the next step.

We could, of course, not have the pivot always be 1, but we would have to do the divisions later when substituting to get the solutions. More on this later.

So to eliminate the variable under the pivot, multiply the whole row by that number. I have a picture to clarify:

We simply repeat this for all elements under the pivot.

An edge case

This is where it gets a little bit tricky. What if the pivot is 0?

We have no way of making it 1 by any kind of multiplication. Further, we cannot eliminate any elements below the pivot. What do we do now?

Simple. We swap the current row with any other row so that the pivot is not zero. Any row will do, so we’ll just pick the first one that fits.

If there is not a single element below the pivot that is not zero, the matrix is either under-determined or singular; either case it is unsolvable.

Here is my Haskell code on what I just covered:

gaussianReduce :: Matrix -> Matrix
gaussianReduce matrix = fixlastrow $ foldl reduceRow matrix [0..length matrix-1] where

 --swaps element at position a with element at position b.
 swap xs a b
  | a > b = swap xs b a
  | a == b = xs
  | a < b = let
  (p1,p2) = splitAt a xs
  (p3,p4) = splitAt (b-a-1) (tail p2)
  in p1 ++ [xs!!b] ++ p3 ++ [xs!!a] ++ (tail p4)

 reduceRow matrix1 r = let
  --first non-zero element on or below (r,r).
  firstnonzero = head $ filter (\x -> matrix1 !! x !! r /= 0) [r..length matrix1-1]

  --matrix with row swapped (if needed)
  matrix2 = swap matrix1 r firstnonzero

  --row we're working with
  row = matrix2 !! r

  --make it have 1 as the leading coefficient
  row1 = map (\x -> x / (row !! r)) row

  --subtract nr from row1 while multiplying
  subrow nr = let k = nr!!r in zipWith (\a b -> k*a - b) row1 nr

  --apply subrow to all rows below
  nextrows = map subrow $ drop (r+1) matrix2

  --concat the lists and repeat
  in take r matrix2 ++ [row1] ++ nextrows

 fixlastrow matrix' = let
  a = init matrix'; row = last matrix'; z = last row; nz = last (init row)
  in a ++ [init (init row) ++ [1, z / nz]]

Edit: There was a bug in the above code, found by Alan Zimmerman. I think it’s been fixed.

This may be a bit difficult to read because there is no syntax highlighting and the code is cut off. I’ll provide a link to the full source code at the end.

Admittedly Haskell may not have been the best language to implement this algorithm this particular way because there is so much state changing. Any language that allows mutable state would probably perform better than this code.

Notice that at the end, the last row does not get divided. The fixlastrow function corrects this problem.

Let’s test this code:

*Main> gaussianReduce [ [1,2,-1,-4], [2,3,-1,-11], [-2,0,-3,22] ]
[[1.0,2.0,-1.0,-4.0],[0.0,1.0,-1.0,3.0],[-0.0,0.0,1.0,-2.0]]

Excellent.

Finishing up

The next step of the algorithm is to solve the variables by back substitution. This is pretty easy, I think. My code keeps a list of already-found solutions. Folding from the right, each step it substitutes in the corresponding solution and multiplies & subtracts to get the next solution, adding that to the solution list.

--Solve a matrix (must already be in REF form) by back substitution.
substitute :: Matrix -> Row
substitute matrix = foldr next [last (last matrix)] (init matrix) where

 next row found = let
  subpart = init $ drop (length matrix - length found) row
  solution = last row - sum (zipWith (*) found subpart)
  in solution : found

To get a list of solutions from a matrix, we chain the substitute and gaussianReduce functions:

solve :: Matrix -> Row
solve = substitute . gaussianReduce
*Main> solve [ [1,2,-1,-4], [2,3,-1,-11], [-2,0,-3,22] ]
[-8.0,1.0,-2.0]

This means the solutions are (x,y,z) = (-8,1,-2). That seems correct, so we’re done!

The code is far from practical, though. Although it works, I haven’t really tested its performance (probably not very good), and it doesn’t handle all edge cases.


Follow

Get every new post delivered to your Inbox.

Join 68 other followers