On Multiple Hypothesis Testing and the Bonferroni Correction

When you’re doing a statistical analysis, it’s easy to run into the multiple comparisons problem.

Imagine you’re analyzing a dataset. You perform a bunch of statistical tests, and one of them gives you a p-value of 0.02. This must be significant, right? Not so fast! If you ran a lot of tests, you may have fallen into the multiple comparisons fallacy: the more tests you run, the higher the chance that at least one of them gives a p-value < 0.05 by pure chance.

Here’s an xkcd comic that illustrates this:

In the comic, the researchers conducted 20 experiments, got a p-value < 0.05 on one of them, and concluded that green jelly beans cause acne. Later, other researchers will have trouble replicating their results. I wonder why?

What should they have done differently? Well, if they knew about the Bonferroni Correction, they would have divided the significance threshold 0.05 by the number of experiments, 20. Then only a p-value smaller than 0.0025 would count as a significant correlation between jelly beans and acne.

Let’s dive in to explain why this division makes sense.

Šidák Correction

Time for some basic probability. What’s the chance that the scenario in the xkcd comic would happen? In other words, if we conduct 20 experiments where there is no real effect, each with probability 0.05 of producing a significant p-value anyway, how likely is it that at least one of the experiments produces a significant p-value? Assume all the experiments are independent.

The probability of an experiment not being significant is $1 - 0.05$, so the probability of all 20 experiments not being significant is $(1-0.05)^{20}$. Therefore the probability of at least 1 of 20 experiments being significant is $1 - (1-0.05)^{20} \approx 0.64$. Not too surprising now, is it?
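The calculation above can be checked numerically. The following is a minimal sketch, assuming that a p-value under "no real effect" behaves like a uniform random number between 0 and 1; the function names are made up for this illustration.

```python
import random


def family_wise_error_rate(alpha: float, n: int) -> float:
    """Chance that at least one of n independent tests dips below alpha by luck."""
    return 1 - (1 - alpha) ** n


def simulate_family_wise_error_rate(alpha: float, n: int,
                                    trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the same quantity by simulation.

    Assumption: with no real effect, each p-value is uniform on [0, 1].
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "study" = n independent tests; count it if any test looks significant.
        if any(rng.random() < alpha for _ in range(n)):
            hits += 1
    return hits / trials


print(round(family_wise_error_rate(0.05, 20), 4))  # 0.6415
print(simulate_family_wise_error_rate(0.05, 20))   # should land close to 0.64
```

The simulated value should agree with the closed-form one up to sampling noise.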

We want the probability of getting at least one significant result purely by chance to be 0.05, not 0.64; that is what a significance level of 0.05 is supposed to guarantee. So flip this around: we need to find an adjusted per-test threshold $p_{adj}$ that gives an overall false-positive probability of 0.05:

$1 - (1 - p_{adj})^{20} = 0.05$

Solving for $p_{adj}$:

$p_{adj} = 1 - 0.95^{1/20} \approx 0.00256$

Okay, this seems reasonably close to 0.0025. In general, if the overall significance level is $p$ and we are correcting for $N$ comparisons, then the per-test threshold is

$p_{adj} = 1 - (1 - p)^{1/N}$

This is known as the Šidák Correction in the literature.
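As a quick sanity check on the formula above, here is a small sketch that computes the Šidák threshold and plugs it back into the original equation. The function name `sidak_threshold` is just an illustrative choice, and the independence assumption from earlier still applies.

```python
def sidak_threshold(alpha: float, n: int) -> float:
    """Per-test threshold so that n independent tests have overall level alpha."""
    return 1 - (1 - alpha) ** (1 / n)


adj = sidak_threshold(0.05, 20)
print(f"{adj:.5f}")  # 0.00256

# Plugging the adjusted threshold back in should recover the overall 0.05.
print(round(1 - (1 - adj) ** 20, 6))  # 0.05
```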

Bonferroni Correction

Šidák’s method works great, but eventually people started complaining that Šidák’s name had too many diacritics and looked for something simpler (also, it used to be difficult to compute nth roots back when they didn’t have computers). How can we approximate this formula?

Approximate? Use Taylor series, of course!

Assume $N$ is constant, and define:

$f(p) = 1 - (1-p)^{1/N}$

We take the first two terms of the Taylor series of $f(p)$ centered at 0:

$f(p) = f(0) + f'(0)p + O(p^2)$

Now $f(0) = 0$ and $f'(p) = \frac{1}{N} (1-p)^{-(N-1)/N}$ so $f'(0) = \frac1N$. Therefore,

$f(p) = p_{adj} \approx \frac{p}{N}.$

That’s the derivation for the Bonferroni Correction.
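In code, the Bonferroni rule is a one-liner. The sketch below applies it to a made-up list of 20 p-values (one 0.02 and nineteen unremarkable ones, invented purely for illustration); the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n: int) -> float:
    """Per-test threshold under the Bonferroni approximation."""
    return alpha / n


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that stays below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Made-up data: 20 tests, one of which produced p = 0.02.
p_values = [0.02] + [0.5] * 19

print(bonferroni_threshold(0.05, len(p_values)))    # 0.0025
print(any(significant_after_bonferroni(p_values)))  # False
```

The p-value of 0.02 looked significant on its own, but it does not clear the corrected threshold of 0.0025.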

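Since we kept only the first two terms of the Taylor series, and every term we dropped is positive for $N > 1$, the Bonferroni value $p/N$ is slightly lower than the exact Šidák value. With $p = 0.05$ and $N = 20$, that is 0.0025 versus roughly 0.00256, so Bonferroni is a little stricter than necessary.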

In the real world, $p$ is close to zero, so in practice it makes little difference whether we use the exact Šidák Correction or the Bonferroni approximation.
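To get a feel for how little difference it makes, the sketch below compares the two thresholds for a few values of $N$ at $p = 0.05$. The helper names and the particular values of $N$ are arbitrary choices for illustration.

```python
def sidak(alpha: float, n: int) -> float:
    """Exact per-test threshold for n independent tests."""
    return 1 - (1 - alpha) ** (1 / n)


def bonferroni(alpha: float, n: int) -> float:
    """First-order approximation of the Sidak threshold."""
    return alpha / n


alpha = 0.05
for n in (2, 5, 20, 100):
    s = sidak(alpha, n)
    b = bonferroni(alpha, n)
    # Relative gap: how much stricter Bonferroni is, as a fraction of Sidak.
    print(f"N={n:>3}  Sidak={s:.6f}  Bonferroni={b:.6f}  gap={(s - b) / s:.2%}")
```

In each row the Bonferroni threshold should come out slightly below the Šidák one, with the relative gap staying at a few percent.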

That’s it for now. Next time you do multiple comparisons, just remember to divide your significance threshold by $N$ before comparing it against each p-value. Now you know why.