Beginner’s comparison of Computer Algebra Systems (Mathematica / Maxima / Maple)

August 11, 2014

I’ve never been very good at doing manual computations, and whenever I need to do a tedious computation for an assignment, I like to automate it by writing a computer program. Usually I implemented an ad-hoc solution using Haskell, either using a simple library or rolling my own implementation if the library didn’t have it. But I found this solution to be unsatisfactory: my Haskell programs worked with integers and floating numbers and I couldn’t easily generalize it to work with symbolic expressions. So I looked to learn a CAS (computer algebra system), so in the future I won’t have to hack together buggy code for common math operations.

I have no experience with symbolic computing, so it wasn’t clear to me where to begin. To start off, there are many different competing computer algebra systems, all incompatible with each other, and it’s far from clear which one is best for my needs. I began to experiment with several systems, but after a few days I still couldn’t decide which one was the winner.

I narrowed it down to 3 platforms. Here’s my setup (all running on Windows 7):

  • Mathematica 8.0
  • Maxima 5.32 with wxMaxima 13.04
  • Maple 18.00

So I came up with a trial — I had a short (but nontrivial) problem representative of the type of problem I’d be looking at, and I would try to solve it in all 3 languages, to determine which one was easiest to work with.

The Problem

This problem came up as a part of a recent linear algebra assignment.

Let the field be \mathbb{Z}_5 (so all operations are taken modulo 5). Find all 2×2 matrices P such that

P^T \left( \begin{array}{cc} 2 & 0 \\ 0 & 3 \end{array} \right) P = I

We can break this problem into several steps:

  • Enumerate all lists of length 4 of values between 0 to 4, that is, [[0,0,0,0],[0,0,0,1],…,[4,4,4,4]]. We will probably do this with a cartesian product or list comprehension.
  • Figure out how to convert a list into a 2×2 matrix form that the system can perform matrix operations on. For example, [1,2,3,4] might become matrix([1,2],[3,4])
  • Figure out how to do control flow, either by looping over a list (procedural) or with a map and filter (functional)
  • Finally, multiply the matrices modulo 5 and check if it equals the identity matrix, and output.

This problem encompasses a lot of the challenges I have with CAS software, that is, utilize mathematical functions (in this case, we only use matrix multiplication and transpose), yet at the same time express a nontrivial control flow. There are 5^4=625 matrices to check, so performance is not a concern; I am focusing on ease of use.

For reference, here is the answer to this problem:

These are the 8 matrices that satisfy the desired property.

I have no prior experience in programming in any of the 3 languages, and I will try to solve this problem with the most straightforward way possible with each of the languages. I realize that my solutions will probably be redundant and inefficient because of my inexperience, but it will balance out in the end because I’m equally inexperienced in all of the languages.

Mathematica

I started with Mathematica, a proprietary system by Wolfram Research and the engine behind Wolfram Alpha. Mathematica is probably the most powerful out of the three, with capabilities with working with data well beyond what I’d expect from a CAS.

What I found most jarring about Mathematica is its syntax. I’ve worked with multiple procedural and functional languages before, and there are certain things that Mathematica simply does differently from everybody else. Here are a few I ran across:

  • To use a pure function (equivalent of a lambda expression), you refer to the argument as #, and the function must end with the & character
  • The preferred shorthand for Map is /@ (although you can write the longhand Map)
  • To create a cartesian product of a list with itself n times, the function is called Tuples, which I found pretty counterintuitive

Initially I wanted to convert my flat list into a nested list by pattern matching Haskell style, ie f [a,b,c,d] = [[a,b],[c,d]], but I wasn’t sure how to do that, or if the language supports pattern matching on lists. However I ran across Partition[xs,2] which does the job, so I went with that.

Despite the language oddities, the functions are very well documented, so I was able to complete the task fairly quickly. The UI is fairly streamlined and intuitive, so I’m happy with that. I still can’t wrap my head around the syntax — I would like it more if it behaved more like traditional languages — but I suppose I’ll get the hang of it after a while.

Here’s the program I came up with:

SearchSpaceLists := Tuples[Range[0, 4], 4]
SearchSpaceMatrices := 
 Map[Function[xs, Partition[xs, 2]], SearchSpaceLists]
Middle := {{2, 0}, {0, 3}}
FilteredMatrices := 
 Select[SearchSpaceMatrices, 
  Mod[Transpose[#].Middle.#, 5] == IdentityMatrix[2] &]
MatrixForm[#] & /@ FilteredMatrices

Maxima

Maxima is a lightweight, open source alternative to Mathematica; I’ve had friends recommend it as being small and easy to use.

The syntax for Maxima is more natural, with things like lists and loops and lambda functions working more or less the way I expect. However, whenever I tried to do something with a function that isn’t the most common use case, I found the documentation lacking and often ended up combing through old forum posts.

Initially I tried to generate a list with a cartesian product like my Mathematica version, but I couldn’t figure out how to do that, eventually I gave up and used 4 nested for loops because that was better documented.

Another thing I had difficulty with was transforming a nested list into a matrix using the matrix command. Normally you would create a matrix with matrix([1,2],[3,4]), so by passing in two parameters. The function doesn’t handle passing in matrix([[1,2],[3,4]]), so to get around that you need to invoke a macro: funmake(‘matrix,[[1,2],[3,4]]).

Overall I found that the lack of documentation made the system frustrating to work with. I would however use it for simpler computations that fall under the common use cases — these are usually intuitive in Maxima.

Here’s the program I came up with:

Middle:matrix([2,0],[0,3]);
Ident:identfor(Middle);
for a:0 thru 4 do
  for b:0 thru 4 do
    for c:0 thru 4 do
      for d:0 thru 4 do
        (P:funmake('matrix,[[a,b],[c,d]]),
         P2:transpose(P).Middle.P,
         if matrixmap(lambda([x],mod(x,5)),P2) = Ident then
             print(P));

Shortly after writing this I realized I didn’t actually need the funmake macro, since there’s no need to generate a nested list in the first place, I could simply do matrix([a,b],[c,d]). Oh well, the point still stands.

Maple

Maple is a proprietary system developed by Maplesoft, a company based in Waterloo. Being a Waterloo student, I’ve had some contact with Maple: professors used it for demonstrations, some classes used it for grading. Hence I felt compelled to give Maple a shot.

At first I was pleasantly surprised that matrix multiplication in a finite field was easy — the code to calculate A*B in \mathbb{Z}_5 is simply A.B mod 5. But everything went downhill after that.

The UI for Maple feels very clunky. Some problems I encountered:

  • It’s not clear how to halt a computation that’s in a an infinite loop. It doesn’t seem to be possible within the UI, and the documentation suggests it’s not possible in all cases (it recommends manually terminating the process). Of course, this loses all unsaved work, so I quickly learned to save before every computation.
  • I can’t figure out how to delete a cell without googling it. It turns out you have to select your cell and a portion of the previous cell, then hit Del.
  • Copy and pasting doesn’t work as expected. When I tried to copy code written inside Maple to a text file, all the internal formatting and syntax highlighting information came with it.
  • Not an UI issue, but error reporting is poor. For example, the = operator works for integers, but when applied to matrices, it silently returns false. You have to use Equals(a,b) to compare matrices (this is kind of like java).

In the end, I managed to complete the task but the poor UI made the whole process fairly unpleasant. I don’t really see myself using Maple in the future; if I had to, I would try the command line.

Here’s the program I came up with:

with(LinearAlgebra):
with(combinat, cartprod):
L := [seq(0..4)]:
T := cartprod([L, L, L, L]):
Middle := <2,0;0,3>:
while not T[finished] do
  pre_matrix := T[nextvalue]();
  matr := Matrix(2,2,pre_matrix);
  if Equal(Transpose(matr).Middle.matr mod 5, IdentityMatrix(2)) then
    print(matr);
  end if
end do:

Conclusion

After the brief trial, there is still no clear winner, but I have enough data to form some personal opinions:

  • Mathematica is powerful and complete, but has a quirky syntax. It has the most potential — definitely the one I would go with if I were to invest more time into learning a CAS.
  • Maxima is lightweight and fairly straightfoward, but because of lack of documentation, it might not be the best tool to do complicated things with. I would keep it for simpler calculations though.
  • Maple may or may not be powerful compared to the other two, I don’t know enough to compare it. But its UI is clearly worse and it would take a lot to compensate for that.

Project Euler 280

March 6, 2010

Project Euler 280 is an interesting problem involving probability and combinatorics:

There is a 5×5 grid. An ant starts in the center square of the grid and walks randomly. Each step, the ant moves to an adjacent square (but not off the grid).

In each of the five bottom-most squares, there is a seed. When the ant reaches such a square, he picks up the seed (if he isn’t already carrying one).

When the ant reaches a square in the top-most row while carrying a seed, he drops off the seed (if there isn’t already a seed there).

This ‘game’ ends when all five seeds are in the five squares of the top-most row.

The problem asks for the expected (average) number of turns it would take to get from the initial to the ending position. It requires six digits of precision.

The Monto Carlo Approach

Perhaps the easiest way to tackle this problem is with a Monto Carlo simulation. This uses a computer to actually run the game many times, using random number generators.

This is my straight-forward Monto Carlo implementation in Java:

import java.util.*;

public class Main{
	public static void main(String[] args){
		Random r = new Random();
		int stepsum = 0;
		int tries = 0;
		while(true){
			stepsum += new Simulation(r).simulate();
			tries ++;
			System.out.println((double) stepsum / (double) tries);
		}
	}
}

class Simulation{

	static final int[] init = {
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		1,1,1,1,1
	};
	State state;
	Random rand;
	int steps;

	Simulation(Random rand){
		state = new State(init, 12, false);
		this.rand = rand;
		steps = 0;
	}

	int simulate(){
		while(!done())
			step();
		return steps;
	}

	boolean done(){
		int[] b = state.board;
		return b[0]==1 && b[1]==1 && b[2]==1 && b[3]==1 && b[4]==1;
	}

	boolean step(){
		steps ++;

		int antX = state.ant % 5;
		int antY = state.ant / 5;
		int dir;

		while(true){
			// 0:N 1:S 2:E 3:W
			dir = rand.nextInt(4);
			if(antY==0 && dir==0) continue;
			if(antY==4 && dir==1) continue;
			if(antX==4 && dir==2) continue;
			if(antX==0 && dir==3) continue;
			break;
		}

		switch(dir){
			case 0: antY--; break;
			case 1: antY++; break;
			case 2: antX++; break;
			case 3: antX--; break;
		}

		int oldAnt = state.ant;
		state.ant = 5*antY + antX;

		if(state.carrying){
			state.board[oldAnt]--;
			state.board[state.ant]++;
		}

		if(antY == 0 && state.board[state.ant] == 1 && state.carrying){
			// drop off
			state.carrying = false;
		}

		if(antY == 4 && state.board[state.ant] == 1 && !state.carrying){
			// pick up
			state.carrying = true;
		}

		return true;
	}
}

class State{
	// 25 board.
	int[] board;

	int ant;
	boolean carrying;

	State(int[] board, int ant, boolean carrying){
		this.board = board.clone();
		this.ant = ant;
		this.carrying = carrying;
	}

	State(State s){
		this(s.board, s.ant, s.carrying);
	}

	public boolean equals(Object o){
		State s = (State) o;
		return Arrays.equals(s.board, board) && s.ant == ant;
	}

	public int hashCode(){
		return Arrays.hashCode(board) + ant;
	}
}

Running this program will produce something close to the final result. But this will not nearly be accurate enough to actually solve the problem: running this for a few hours would only produce two or three digits of precision, while six is required.

It’s not exactly impossible, as on the forums Rodinio claims to have run a similar Monto Carlo simulation on a computer cluster, giving enough precision to start making guesses for the last digits. To get so many digits on a Monto Carlo simulation requires an astronomical amount of tries.

Obviously this is not the most efficient approach.

Introducing the Markov Chain

A more efficient solution can be implemented using Markov Chains.

A Markov Chain solves the problem of discrete random processes. What is a Markov Chain, and how is it used?

Let’s say we have a finite group of states, S = \{ s_1, s_2, \cdots , s_n \}. Make a matrix M, of size n * n.

On each step (or move) of the process, the probability of going from state s_i to s_j is M_{ij}.

The process starts on an initial state. From each state there is a certain probability of going to every other state.

Notice that in a Markov Chain, what happens next does not depend on what has happened before. The only data is the current state you’re on.

I’ll explain all this with a very simple example:

On a 1×3 game board, Bob (the happy face) starts at square 1. Each step, he randomly moves to an adjacent square with equal probability. The game ends when he reaches square 3.

Of course the only time he has a ‘choice’ in which square to move to is when he’s on square 2 (half-half chance of moving to 1 and 3). If he’s already on square 1, he has a 100% chance of moving to square 2, and if he’s on square 3, well, the game is over.

We can represent this game like this:

Another way of showing the above would be something like this:

This is the matrix, of course:

\left[ \begin{array}{ccc} 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 0 & 1 \end{array} \right]

This type of matrix is called a Stochastic matrix. What’s special about this matrix is that each row of the matrix adds up to 1.

A state that can only return to itself is called an absorbing state. If in the matrix, M_{ii} = 1 for any value of i, then state i is an absorbing state.

Here, state 3 is an absorbing state.

The next step is very important: determining the time until absorption.

From our matrix, take out all of the absorbing rows, and all of the left columns to make it a square matrix:

It seems that by doing this we are losing information. But we’re actually not. We are treating each absorbing state as the same; and since each row originally added up to 1, the probability of going to the absorbing state is simply 1 minus the sum of the rest of the row.

This form is considered the canonical form of the stochastic matrix. Our matrix in canonical form looks like this:

\left[ \begin{array}{cc} 0 & 1 \\ \frac{1}{2} & 0 \end{array} \right]

Next we subtract it from the identity matrix.

\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] - \left[ \begin{array}{cc} 0 & 1 \\ \frac{1}{2} & 0 \end{array} \right] = \left[ \begin{array}{cc} 1 & -1 \\ -\frac{1}{2} & 1 \end{array} \right]

Next we invert this matrix and get:

\left[ \begin{array}{cc} 1 & -1 \\ -\frac{1}{2} & 1 \end{array} \right]^{-1} = \left[ \begin{array}{cc} 2 & 2 \\ 1 & 2 \end{array} \right]

The expected absorption time from state s_n is simply the sum of row n in the above matrix.

So because we started on state 1, the expected length of the game is 2+2 or 4.

All of this seems like magic, because I haven’t given any proof for this. If you really want to prove that this method works, you can get a textbook about Markov chains.

“Markov Chains” by J.R.Norris (Cambridge 1997) is a decent textbook if you want to know more about this. I would recommend this as a less dense introduction to Markov chains.

An attempt using Markov Chains

Now that we know some basics of Markov chains, we need to apply it to solve the problem.

If you attach a Set or something to the Monto Carlo simulation program, you would find that there are 10270 distinct states for this problem. If we count the five final states as the same state, then we have 10266 states.

This is exactly the same as what we’ve just done. Except instead of a 3*3 matrices, we’re now working with matrices with tens of thousands of rows and columns. Woo hoo.

You can probably see here that this is going to be a problem. A 10265*10265 (final state not counted) matrix contains over 100 million elements. Most of these elements will be zero. If we store all of this data in double types, that’s 800mb of memory used, just for the matrix.

It’s easy to underestimate the size of this problem. But there’s a few optimizations we can make.

We could save some time and space by subtracting from the identity matrix at the same time as filling up the matrix.

Inverting a 10265*10265 matrix is no small task either. Instead there’s a simple optimization to avoid inverting such a huge matrix:

Remember that we want to find the inverse of the matrix, and get the sum of all the elements in the first row. Let M be our matrix and M' be the inverse.

What’s important here is that finding the sum of all the rows of a matrix is the same as multiplying by a column of ones. This produces a one-columned matrix, each cell representing the sum of the corresponding row.

If C is the column of ones, and S is the sum matrix (what we want), then M' * C = S. We can also write that as M * S = C.

See, this way we don’t have to compute the inverse. We just have to solve the above equation for S.

To avoid implementing the standard matrix solve function, I’ll be using the JAMA library for matrix manipulation.

Finally, the source code for my implementation of what I just discussed:

import java.util.*;
import Jama.*;

public class Main{

	// For a state, maps to all possible states from it.
	// All next-states have the same probability.
	static Map<State,List<State>> states = new LinkedHashMap<State,List<State>>();

	public static void main(String[] args){

		// Keep the time.
		long startTime = System.currentTimeMillis();

		// Initialize the state map.
		addAllStates(State.INIT_STATE);

		// To construct the matrix, we need to map all the states to a position
		// in the matrix (an integer). Because our map is already ordered, we
		// use its position in the map as the position in the matrix. In order
		// to avoid doing a linear search through the keyset to find its position,
		// we cache its position in a map.

		Set<State> keySet = states.keySet();
		Map<State,Integer> positionMap = new HashMap<State,Integer>();

		// Set up position map.
		Iterator<State> iterator = keySet.iterator();
		int pos = 0;
		while(iterator.hasNext()){
			positionMap.put(iterator.next(), pos);
			pos++;
		}

		// Finally we can begin constructing our matrix.
		// This takes up about 900mb of memory.
		double[][] matrix = new double[states.size()][states.size()];

		// Fill up the matrix.
		Collection<List<State>> values = states.values();
		Iterator<List<State>> rowIterator = values.iterator();

		for(int row=0; row<matrix.length; row++){

			// Fill up one row.
			List<State> thisRow = rowIterator.next();

			// What to fill with?
			double fill = 1.0 / thisRow.size();

			// An optimization: why not do the subtraction step here
			// instead of subtracting from identity matrix?
			matrix[row][row] = 1;

			for(State st : thisRow){
				// The final state doesn't count.
				if(st.equals(State.FINAL_STATE)) continue;

				int col = positionMap.get(st);
				matrix[row][col] = -fill;
			}
		}

		// Finishing up.
		Matrix bigMatrix = new Matrix(matrix);
		Matrix onesColumn = new Matrix(states.size(), 1, 1);
		Matrix sums = bigMatrix.solve(onesColumn);

		System.out.println(sums.get(0,0));
		System.out.println(System.currentTimeMillis() - startTime + "ms");

	}

	// Returns a list of possible continuation states from the current one.
	static List<State> nextStates(State state){

		// Current and changing position of the ant
		int antX = state.ant % 5;
		int antY = state.ant / 5;

		// Whether it can go into each of the four directions (N,S,E,W respectively).
		boolean[] possibleDirs = new boolean[4];
		Arrays.fill(possibleDirs, true);

		// Take out some directions if it's in the corner.
		if(antY == 0) possibleDirs[0] = false; // Can't go north
		if(antY == 4) possibleDirs[1] = false; // Can't go south
		if(antX == 4) possibleDirs[2] = false; // Can't go east
		if(antX == 0) possibleDirs[3] = false; // Can't go west

		// Construct a list of continuations.
		List<State> nextStates = new ArrayList<State>();

		// Loop through the four directions.
		for(int i=0; i<4; i++){

			// Cannot go this direction.
			if( !(possibleDirs[i])) continue;

			int newAntX = antX;
			int newAntY = antY;

			// Modify direction.
			switch(i){
				case 0: newAntY--; break;
				case 1: newAntY++; break;
				case 2: newAntX++; break;
				case 3: newAntX--; break;
			}

			// Start constructing new state object.
			int oldAnt = state.ant; // old ant position
			int newAnt = 5*newAntY + newAntX;
			int[] board = state.board.clone();
			boolean carrying = state.carrying;

			// Carrying a seed. Notice that a square can contain
			// two seeds at once (but not more); seeds are indistinguishable
			// so we just need to keep track of the number of seeds
			// on each square.
			if(carrying){
				board[oldAnt] --;
				board[newAnt] ++;
			}

			// Drop off the seed.
			if(newAntY == 0 && board[newAnt] == 1 && carrying)
				carrying = false;

			// Pick up a new seed.
			if(newAntY == 4 && board[newAnt] == 1 && !carrying)
				carrying = true;

			// Treat the five done positions the same.
			if(donePosition(board))
				nextStates.add(State.FINAL_STATE);

			// Add it normally.
			else nextStates.add(new State(board, newAnt, carrying));

		}

		return nextStates;
	}

	// Recursively add all continuation states.
	static void addAllStates(State state){

		// Don't add if it's already added.
		if(states.containsKey(state)) return;

		List<State> nexts = nextStates(state);
		states.put(state, nexts);

		// Recurse (but not if we've reached the final state).
		for(State next : nexts)
			if( !(next.equals(State.FINAL_STATE)))
				addAllStates(next);

	}

	// Is the board in the done position?
	static boolean donePosition(int[] b){
		return b[0]==1 && b[1]==1 && b[2]==1 && b[3]==1 && b[4]==1;
	}
}

class State{

	static final State INIT_STATE = new State(
	new int[]{
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		1,1,1,1,1
	}, 12, false);

	// Consider all final states the same; there is
	// no ant position.
	static final State FINAL_STATE = new State(
	new int[]{
		1,1,1,1,1,
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0,
		0,0,0,0,0
	}, -1, true);

	// 25 board.
	int[] board;

	int ant;
	boolean carrying;

	State(int[] board, int ant, boolean carrying){
		this.board = board;
		this.ant = ant;
		this.carrying = carrying;
	}

	State(State s){
		this(s.board, s.ant, s.carrying);
	}

	public boolean equals(Object o){
		State s = (State) o;
		return Arrays.equals(s.board, board) && s.ant == ant && s.carrying == carrying;
	}

	public int hashCode(){
		return Arrays.hashCode(board) + ant;
	}

	// For debugging mostly.
	public String toString(){
		StringBuilder ret = new StringBuilder("\n");
		for(int i=0; i<5; i++){
			for(int j=0; j<5; j++){
				int pos = 5*i + j;
				if(ant == pos) ret.append("#");
				else ret.append(board[pos] >= 1? "+" : "-");
			}
			ret.append("\n");
		}
		return ret.toString();
	}
}

Running it:

So my program takes about 19 minutes to run, using up over 2GB of ram. The construction of the matrix only takes about 2 seconds, and the rest of the time is used on solving the huge matrix.

JAMA is probably not the fastest matrix library; the running time might be cut down a bit if we use libraries designed for sparse matrices.

But to get this to run in under a minute, a completely different approach would be needed. For now, though, I’m pretty happy with getting it in 19 minutes.


Solving systems of linear equations in Haskell

February 21, 2010

Haskell isn’t normally used for things like this, but it’s quite possible to solve systems of linear equations with Haskell.

There are already several libraries for doing this, and other more advanced matrix manipulating. But here, I’m going to start simple.

In mathematics, systems of linear equations are usually represented by an augmented matrix. A system of n linear equations would be represented by an augmented matrix with n rows and n+1 columns.

For example, we have this system of equations:

\begin{array}{rrrcl} x &+2y &-z &=& -4 \\ 2x &+3y &-z &=& -11 \\ -2x &&-3z &=& 22 \end{array}

This would be represented as an augmented matrix:

\left[ \begin{array}{ccc|c} 1 & 2 & -1 & -4 \\ 2 & 3 & -1 & -11 \\ -2 & 0 & -3 & 22 \end{array} \right]

In Haskell we represent this as a list of lists, like this:

[ [1,2,-1,-4], [2,3,-1,-11], [-2,0,-3,22] ]

Here I’ll store each entry not as an integer, but as a floating point. You could also use Rational in Data.Ratio but both are fine for now.

The advantage of using Rational over Float is that sometimes you will end up with fractions that don’t work very well with floating point numbers. However, I’ve found that printing a list of lists of Rational types makes it difficult to read, unless you implement a custom show function for it.

So this is how we define our matrix types in Haskell:

type Row = [Float]
type Matrix = [Row]

The approach to solving this problem is rather simple. First we reduce whatever matrix we have to REF, or Row Echelon Form and then get the actual roots with some back substitution.

The algorithm used to transform a matrix to its Row Echelon Form is known as the Gaussian Elimination. Here’s what a matrix should look like after Gaussian Elimination (a * represents any value):

\left[ \begin{array}{ccc|c} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \end{array} \right]

Our matrix should look like this after Gaussian Elimination:

\left[ \begin{array}{ccc|c} 1 & 2 & -1 & -4 \\ 0 & 1 & -1 & 3 \\ 0 & 0 & 1 & -2 \end{array} \right]

The REF form is not unique, so that is only one of the possible valid outputs for the Gaussian Elimination.

Why do we want to have the matrix in REF form? A matrix in this form can easily be solved using back substitution. Consider this matrix as a series of linear equations, as we did before:

\begin{array}{rrrcl} x &+2y &-z &=& -4 \\ &+y &-z &=& 3 \\ &&z &=& -2 \end{array}

Now it would be very clear how to solve for the three variables.

The Gaussian Elimination Algorithm

Here is a diagram of how Gaussian Elimination works. On each iteration, the element circled in green is considered the pivot element, while the elements enclosed in the red square are the ones we intend to remove (zero) in each iteration.

Removing the red elements in the matrix is actually quite simple. Consider how you would eliminate x in equation B here:

\begin{array}{lrrcl}(A) & x & +2y & = & 4 \\(B) & 2x & +y & = & 5 \end{array}

Probably you would multiply equation A by 2, giving 2x + 4y = 8, then subtract B from it, giving 3y=3, eliminating x.

We can also write that as B = 2A - B. Basically to eliminate a variable, just multiply a row so it matches up, and subtract. This is middle school algebra.

To make things easier for us, we divide the row we are on so that the pivot is always 1. We do this now because we need them to be 1 anyways, and this avoids an unnecessary division in the next step.

We could, of course, not have the pivot always be 1, but we would have to do the divisions later when substituting to get the solutions. More on this later.

So to eliminate the variable under the pivot, multiply the whole row by that number. I have a picture to clarify:

We simply repeat this for all elements under the pivot.

An edge case

This is where it gets a little bit tricky. What if the pivot is 0?

We have no way of making it 1 by any kind of multiplication. Further, we cannot eliminate any elements below the pivot. What do we do now?

Simple. We swap the current row with any other row so that the pivot is not zero. Any row will do, so we’ll just pick the first one that fits.

If there is not a single element below the pivot that is not zero, the matrix is either under-determined or singular; either case it is unsolvable.

Here is my Haskell code on what I just covered:

gaussianReduce :: Matrix -> Matrix
gaussianReduce matrix = fixlastrow $ foldl reduceRow matrix [0..length matrix-1] where

 --swaps element at position a with element at position b.
 swap xs a b
  | a > b = swap xs b a
  | a == b = xs
  | a < b = let
  (p1,p2) = splitAt a xs
  (p3,p4) = splitAt (b-a-1) (tail p2)
  in p1 ++ [xs!!b] ++ p3 ++ [xs!!a] ++ (tail p4)

 reduceRow matrix1 r = let
  --first non-zero element on or below (r,r).
  firstnonzero = head $ filter (\x -> matrix1 !! x !! r /= 0) [r..length matrix1-1]

  --matrix with row swapped (if needed)
  matrix2 = swap matrix1 r firstnonzero

  --row we're working with
  row = matrix2 !! r

  --make it have 1 as the leading coefficient
  row1 = map (\x -> x / (row !! r)) row

  --subtract nr from row1 while multiplying
  subrow nr = let k = nr!!r in zipWith (\a b -> k*a - b) row1 nr

  --apply subrow to all rows below
  nextrows = map subrow $ drop (r+1) matrix2

  --concat the lists and repeat
  in take r matrix2 ++ [row1] ++ nextrows

 fixlastrow matrix' = let
  a = init matrix'; row = last matrix'; z = last row; nz = last (init row)
  in a ++ [init (init row) ++ [1, z / nz]]

Edit: There was a bug in the above code, found by Alan Zimmerman. I think it’s been fixed.

This may be a bit difficult to read because there is no syntax highlighting and the code is cut off. I’ll provide a link to the full source code at the end.

Admittedly Haskell may not have been the best language to implement this algorithm this particular way because there is so much state changing. Any language that allows mutable state would probably perform better than this code.

Notice that at the end, the last row does not get divided. The fixlastrow function corrects this problem.

Let’s test this code:

*Main> gaussianReduce [ [1,2,-1,-4], [2,3,-1,-11], [-2,0,-3,22] ]
[[1.0,2.0,-1.0,-4.0],[0.0,1.0,-1.0,3.0],[-0.0,0.0,1.0,-2.0]]

Excellent.

Finishing up

The next step of the algorithm is to solve the variables by back substitution. This is pretty easy, I think. My code keeps a list of already-found solutions. Folding from the right, each step it substitutes in the corresponding solution and multiplies & subtracts to get the next solution, adding that to the solution list.

--Solve a matrix (must already be in REF form) by back substitution.
substitute :: Matrix -> Row
substitute matrix = foldr next [last (last matrix)] (init matrix) where

 next row found = let
  subpart = init $ drop (length matrix - length found) row
  solution = last row - sum (zipWith (*) found subpart)
  in solution : found

To get a list of solutions from a matrix, we chain the substitute and gaussianReduce functions:

solve :: Matrix -> Row
solve = substitute . gaussianReduce
*Main> solve [ [1,2,-1,-4], [2,3,-1,-11], [-2,0,-3,22] ]
[-8.0,1.0,-2.0]

This means the solutions are (x,y,z) = (-8,1,-2). That seems correct, so we’re done!

The code is far from practical, though. Although it works, I haven’t really tested its performance (probably not very good), and it doesn’t handle all edge cases.


Follow

Get every new post delivered to your Inbox.

Join 68 other followers