On Multiple Hypothesis Testing and the Bonferroni Correction

When you’re doing a statistical analysis, it’s easy to run into the multiple comparisons problem.

Imagine you’re analyzing a dataset. You perform a bunch of statistical tests, and one day you get a p-value of 0.02. This must be significant, right? Not so fast! If you tried a lot of tests, then you’ve fallen into the multiple comparisons fallacy — the more tests you do, the higher chance you get a p-value < 0.05 by pure chance.

Here’s an xkcd comic that illustrates this:

significant.png

They conducted 20 experiments and got a p-value < 0.05 on one of the experiments, thus concluding that green jelly beans cause acne. Later, other researchers will have trouble replicating their results — I wonder why?

What should they have done differently? Well, if they knew about the Bonferroni Correction, they would have divided the p-value 0.05 by the number of experiments, 20. Then, only a p-value smaller than 0.0025 is a truly significant correlation between jelly beans and acne.

Let’s dive in to explain why this division makes sense.

Šidák Correction

Time for some basic probability. What’s the chance that the scenario in the xkcd comic would happen? In other words, if we conduct 20 experiments, each with probability 0.05 of producing a significant p-value, then how likely will at least one of the experiments produce a significant p-value? Assume all the experiments are independent.

The probability of an experiment not being significant is 1 - 0.05, so the probability of all 20 experiments not being significant is (1-0.05)^{20}. Therefore the probability of at least 1 of 20 experiments being significant is 1 - (1-0.05)^{20} = 0.64. Not too surprising now, isn’t it?

We want the probability of accidentally getting a significant p-value by chance to be 0.05, not 0.64 — the definition of p-value. So flip this around — we need to find an adjusted p-value p_{adj} to give an overall p-value 0.05:

1 - (1 - p_{adj})^{20} = 0.05

Solving for p_{adj}:

p_{adj} = 1 - 0.95^{1/20} \approx 0.00256

Okay, this seems reasonably close to 0.0025. In general, if the overall p-value is p and we are correcting for N comparisons, then

p_{adj} = 1 - (1 - p)^{1/N}

This is known as the Šidák Correction in literature.

Bonferroni Correction

Šidák’s method works great, but eventually people started complaining that Šidák’s name had too many diacritics and looked for something simpler (also, it used to be difficult to compute nth roots back when they didn’t have computers). How can we approximate this formula?

Approximate? Use Taylor series, of course!

Assume N is constant, and define:

f(p) = 1 - (1-p)^{1/N}

We take the first two terms of the Taylor series of f(p) centered at 0:

f(p) = f(0) + f'(0)p + O(p^2)

Now f(0) = 0 and f'(p) = \frac{1}{N} (1-p)^{-(N-1)/N} so f'(0) = \frac1N. Therefore,

f(p) = p_{adj} \approx \frac{p}{N}.

That’s the derivation for the Bonferroni Correction.

Since we only took the first two terms of the Taylor series, this produces a p_{adj} that’s slightly lower than necessary.

In the real world, p is close to zero, so in practice it makes little difference whether we use the exact Šidák Correction or the Bonferroni approximation.

That’s it for now. Next time you do multiple comparisons, just remember to divide your p-value by N. Now you know why.

One thought on “On Multiple Hypothesis Testing and the Bonferroni Correction

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s