What if math contests were scored using Principal Component Analysis?

In many math competitions, all problems are weighted equally, even though the problems have very different difficulties. Sometimes, the harder problems are weighted more. But how should we assign weights to each problem?

Usually, the organizers make up weights based on how difficult they believe the problems are. However, they sometimes misjudge the difficulty of problems. Wouldn’t it be better if the weightings were determined from data?

pca.gif

Let’s try Principal Component Analysis!

Principal Component Analysis (PCA) is a statistical procedure that finds a transformation of the data that maximizes the variance. In our case, the first principal component gives a relative weighting of the problems that maximizes the variance of the total scores. This makes sense because we want to separate the good and bad students in a math contest.

IMO 2017 Data

The International Mathematics Olympiad (IMO) is an annual math competition for top high school students around the world. It consists of six problems, divided between two days: on each day, contestants are given 4.5 hours to solve three problems.

Here are the 2017 problems, if you want to try them.

3.pngAbove: Score distribution for IMO 2017

This year, 615 students wrote the IMO. Problems 1 and 4 were the easiest, with the majority of contestants receiving full scores. Problems 3 and 6 were the hardest: only 2 students solved the third problem. Problems 2 and 5 were somewhere in between.

This is a good dataset to play with, because the individual results show what each student scored for every problem.

Derivation of PCA for the 1-dimensional case

Let X be a matrix containing all the data, where each column represents one problem. There are 615 contestants and 6 problems so X has 615 rows and 6 columns.

We wish to find a weight vector \vec u \in \mathbb{R}^{6 \times 1} such that the variance of X \vec u is maximized. Of course, scaling up \vec u by a constant factor also increases the variance, so we need the constraint that | \vec u | = 1.

First, PCA requires that we center X so that the mean for each of the problems is 0, so we subtract each column by its mean. This transformation shifts the total score by a constant, and doesn’t affect the relative weights of the problems.

Now, X \vec u is a vector containing the total scores of all the contestants; its variance is the sum of squares of its elements, or | X \vec u |^2.

To maximize |X \vec u |^2 subject to |\vec u| = 1, we take the singular value decomposition of X = U \Sigma V^T. Then, the leftmost column of V (corresponding to the largest singular value) gives \vec u that maximizes | X \vec u|^2. This gives the first principal axis, and we are done.

Experiments

Running PCA on the IMO 2017 data produced interesting results. After re-scaling the weights so that the minimum possible score is 0 and the maximum possible score is 42 (to match IMO’s scoring), PCA recommends the following weights:

  • Problem 1: 9.15 points
  • Problem 2: 9.73 points
  • Problem 3: 0.15 points
  • Problem 4: 15.34 points
  • Problem 5: 5.59 points
  • Problem 6: 2.05 points

This is the weighting that produces the highest variance. That’s right, solving the hardest problem in the history of the IMO would get you a fraction of 1 point. P4 had the highest variance of the six problems, so PCA gave it the highest weight.

5.png

The scores and rankings produced by the PCA scheme are reasonably well-correlated with the original scores. Students that did well still did well, and students that did poorly still did poorly. The top students that solved the harder problems (2, 3, 5, 6) usually also solved the easier problems (1 and 2). The students that would be the unhappiest with this scheme are a small number of people who solved P3 or P6, but failed to solve P4.

Here’s a comparison of score distributions with the original and PCA scheme. There is a lot less separation between the best of the best students and the middle of the pack. It is easy to check that PCA does indeed produce higher variance than weighing all six problems equally.

4.png

Now, let me comment on the strange results.

It’s clearly absurd to give 0.15 points to the hardest problem on the IMO, and make P4, a much easier problem, be worth 100 times more. But it makes sense from PCA’s perspective. About 99% of the students scored zero on P3, so its variance is very low. Given that PCA has a limited amount of weight to “spend” to increase the total variance, it would be wasteful to use much of it on P3.

The PCA score distribution has less separation between the good students and the best students. However, by giving a lot of weight to P1 and P4, it clearly separates mediocre students that solve one problem from the ones who couldn’t solve anything at all.

In summary, scoring math contests using PCA doesn’t work very well. Although it maximizes overall variance, math contests are asymmetrical as we care about differentiating between the students on the top end of the spectrum.

Source Code

If you want to play with the data, I uploaded it as a Kaggle dataset.

The code for this analysis is available here.

Further discussion of this article on /r/math.