Coding a Tetris AI using a Genetic Algorithm

About two years ago, when I was in grade 9, I decided to make a tetris clone in Java. One or two months later, I had a fully working and playable tetris, complete with background and sound effects and scoring and graphical effects when a line is cleared.

A year after I created it, I decided to go ahead and write an AI to play my game. Although it played much better than I could (I’m not a particularly good tetris player), it would still die after a few dozen lines when the number of columns is 10. This means it was pretty outclassed by existing AI’s that could go on for thousands of lines!

Now, another year later, I coupled my previous AI algorithm with a genetic algorithm, with some pretty neat results:

Rules of the Game

How does one make a computer program to play tetris? Or more generally, how does one play tetris in the first place? The rules of tetris seem to me to be better viewed than explained, but I’ll give a quick overview of tetris gameplay and tetris strategies.

As I’m sure most of you know, in tetris you have blocks of four (tetrominoes) falling from the top of the board. The player moves and rotates the blocks and stacks them up:

Here the black outline is one of the places you can put the funny shaped block. And when a row is filled entirely with blocks (the row with the red outline below), you get a clear; that entire row is removed and the rest of the board is shifted down (often with a series of beeping noises and a small increase to your score):

If the blocks don’t get cleared and they stack to the top of the board, you lose. So ideally you want to fill as many lines as possible and avoid stacking the blocks up. Very simple.

Simple Strategies

So what are some things that we can try to avoid losing the game? Some things immediately come to mind. We lose when the blocks get stacked up so high that we can’t put in new blocks, right? So it makes sense to avoid piling up large numbers of blocks in high positions in the first place, or to penalize height:

So for each position the computer can calculate a height penalty — for each block the computer adds a number depending on how high it is. Then when the AI tries to decide where to put the next block, it ‘knows’ that blocks piled up really high is a bad thing and tries to avoid it.

Another strategy that seems pretty obvious is to try to get clears! We assign a positive score for each line we clear, in other words we reward clears. Duh.

Anyone who has played a few games of tetris would probably subconsciously know a number of intuitive strategies — packing together blocks as tightly as possible for instance. How do we translate that into code? Well, to start, blocks that are packed tightly has little or no holes — we’ll define these as any empty spaces for which there is a block somewhere directly above it:

Why don’t we want holes? A row is only considered a clear if the entire row is filled — if there’s even a single hole in the row, it doesn’t get removed. Not good. So it makes sense to give a negative score to positions with holes in them — to penalize holes.

It’s best to try not to have any holes at all, but sometimes having a hole or two is inevitable. What can we do after we have holes in our formation? Good question, but we should not pile more blocks on top of our holes. If we define a blockade as any block that’s directly above a hole, we should penalize blockades:

Why are blockades bad again? Well, a hole only stops being a hole if there are no more blocks above it, so stacking more blocks above holes would only make it harder to remove the hole.

These are all the obvious strategies. I also put in less obvious scores rewarding or penalizing for hugging the wall (edge of the current block touching edge of the wall), hugging the floor (edge of current block touching the floor) and flattening (rewarding if the edge of the current block touches an existing  block). Again, these are harder to justify and mostly for fine-tuning — it’s not even clear whether they should be positive or negative.

A Hedonistic AI

These strategies are sufficient to make a passable tetris AI. This algorithm is very simple:

1. Look at the current block and the next block and simulate ALL possible combinations (positions and rotations) of the two blocks.
2. Calculate a score for each of the positions.
3. Move the block to the position with the highest score and repeat.

To give a score for a position, we would use an equation like this:

Score = A * Sum of Heights
+ B * Number of Clears
+ C * Number of Holes
+ D * Number of Blockades

Where A, B, C, and D are weights that we decide — how important is each of the factors. I initially came up with some pretty arbitrary values:

• -0.03 for the height multiplier
• -7.5 per hole
• +8.0 per clear
• +3.0 for each edge touching another block
• +2.5 for each edge touching the wall
• +5.0 for each edge touching the floor

The reason I gave such a low multiplier for the height is because the numbers stack up so quickly it racks up a huge penalty for each block on the field. The numbers I chose seem pretty reasonable — and puts blocks more or less where a human would put them.

Playing God: Bringing in the Genetic Algorithm

The biggest problem with this method is that we choosed the weights pretty much arbitrarily. They might work well or they might not, but we don’t really know whether there are better values for them.

What can we do about it? We could brute force it — but with solutions that range across a continuum, there is a better way — a genetic algorithm.

A genetic algorithm is just a searching heuristic; it derives its ideas from evolution, where nature creates complex and sophisticated organisms by making random changes to the DNA.

Charles Darwin specifies four criteria for the process of natural selection to occur:

1. Variation: Organisms in a population must be slightly different from one another.
2. Inheritance: Traits of parent organisms must be passed onto their offspring.
3. Limited space: Only some of the offspring in any generation is able to survive and pass on its genes.
4. Competition: Individuals that are more fit are more likely to pass on their genes to the next generation.

In order to turn this into an algorithm, we’ll need — let’s quote this article:

• chromosome which expresses a possible solution to the problem as a string
• fitness function which takes a chromosome as input and returns a higher value for better solutions
• population which is just a set of many chromosomes
• selection method which determines how parents are selected for breeding from the population
• crossover operation which determines how parents combine to produce offspring
• mutation operation which determines how random deviations manifest themselves
We begin by constructing a chromosome — a solution to the problem of making an AI to play tetris. This is pretty easy, since we can already run an AI with a set of seven weights. So the chromosome is simply an array of seven doubles.

Next, our fitness function is very easy too, since the AI already has a scoring system. Basically the program would run the tetris AI at max speed on a 8 column board until it died, after which it would use the score it earned. Why only 8 columns and not the normal 10? In later generations, AI’s are able to survive for hours in the 10 column version, but when we reduce the number of columns to 8, even the best AI’s can survive for only a few seconds to a minute (we’re still talking about hundreds or thousands of lines here).

I used Nintendo’s original scoring system for tetris — 40 points for one clear, 120 points for two simultaneous clears, 300 for three simultaneous clears, and 1200 for four simultaneous clears. I also added 1 point for each block placed, to differentiate between AI’s that couldn’t score any lines.

Three, I chose a population of sixteen chromosomes. Initially the chromosomes are filled with randomly generated numbers (floating points fitting a normal distribution). Each generation onwards, the population’s chromosomes are derived from the best candidates of the previous generation (more on this later) — but the population size stays the same.

Next, for the selection method I chose a simple tournament method. After we run each candidate from a generation and collect all of their scores, we randomly pair up the candidates. For each pair, we take the high scorer — the winner — and discard the low scorer. Then, we pair up the winners randomly again to generate new offspring for the next generation.

Lastly, I implemented the crossover as follows: for each of the seven attributes in the offspring’s chromosome, we randomly select the respective attribute from the two parents with equal probability.

Occasionally, we have a mutation — a trait in an offspring that does not come from either parent. Each time we give an offspring an attribute, we have a 10% chance of assigning a completely random value to that attribute instead.

Results of the Genetic Algorithm

In the first generation or two, most candidates performed horribly. Many candidates had completely wrong weights — rewarding height and rewarding holes! Needless to say, these programs did not survive very long. But the genetic algorithm quickly came up with some decent solutions, and pretty soon most algorithms were scoring a few hundred lines (my original values gave about 20 lines on the 8-column version by comparison)

After running the genetic algorithm for about ten generations, I picked a candidate that was scoring decently:

• -3.78 for the height multiplier
• -2.31 per hole
• +1.6 per clear
• +3.97 for each edge touching another block
• +6.52 for each edge touching the wall
• +0.65 for each edge touching the floor

Whoa — that’s a huge height multiplier. Perhaps the multiplier is so big that it just overwhelms everything else in the list — remember that the height multiplier applies to every block on the field. Also, holes and blockades might not have been as bad as I thought — and what’s with the huge bonus for touching the wall?

I ran the whole thing again from scratch for ten generations — using different randomly generated starting values. What it came up with made me think at first that my program had a bug somewhere:

• -3.71 for the height multiplier
• -4.79 per hole
• -1.87 per clear
• +4.8 for each edge touching another block
• +3.22 for each edge touching the wall
• +3.68 for each edge touching the floor

Yup, this one rewarded blockades and penalized clears. And it would outperform both my naive values and the first set of AI values — I used this set in the video. It seems to put blocks in really bad positions, creating holes and blockades when it is unnecessary to — but it still does better than anything else I have.

Conclusion

Was the exercise a success? I would say partially so. There is the good and the bad. The good is that it came up with a much better AI than the one I originally had. But there may have been some things that could’ve been done better:

• What I said about the height multiplier overwhelming everything else is a bit misleading. While it is true that the height multiplier itself applies to every block on the field, it doesn’t really work that way, and really only affects the current block. Reason being, the rest of the field — everything but the current block — stays constant no matter where the current block goes. It’s kind of like if you vote for everybody, you really vote for nobody as your votes have no effect on the outcome.
• The lines cleared factor also turned out to be a bit misleading. While the second AI had a negative weight for clearing a line, it still cleared lines whenever it could: again tying back to the height multiplier. Clearing a line does exactly what it says: removing an entire row of blocks — and removing that many blocks does a huge blow to the height multiplier.
• The fitness function was really kind of screwed up. By the time your AI’s can get a few thousand lines on a tiny 8-column board, the only thing that causes the AI to die is a bad sequence of hard-to-place S and Z blocks — and in any random number generator you’ll eventually get a series of unlucky blocks. So at later generations, simply simulating a 8 column tetris was fairly bad at separating the very good AI’s from the excellent AI’s.
• Selection was also a bit screwed up. After ten generations, the entire population had more or less the same values, with only some minor variations — a bit ironic since this situation was exactly the situation the tournament selection algorithm was supposed to prevent.
• Although a genetic algorithm can converge on a local optimum fairly quickly, producing a decent solution, it is very hard for it to achieve a global optimum — the best possible solution. You might have a situation where mutating any one value seriously harms the candidate, but mutating two or more values simultaneously in a certain way makes it better. This is a drawback for genetic algorithms in general.
So that’s all I have to say. I’m fairly new to genetic algorithms, so I may have botched one or more parts of the algorithm. I’d love to know what I did wrong and how I should’ve done better.
Further discussion on Reddit and in the comments below!