When you’re doing a statistical analysis, it’s easy to run into the multiple comparisons problem.
Imagine you’re analyzing a dataset. You perform a bunch of statistical tests, and one day you get a p-value of 0.02. This must be significant, right? Not so fast! If you tried a lot of tests, then you’ve fallen into the multiple comparisons fallacy: the more tests you run, the higher the chance that at least one of them gives a p-value < 0.05 by pure chance.
Here’s an xkcd comic that illustrates this:
They conducted 20 experiments and got a p-value < 0.05 on one of the experiments, thus concluding that green jelly beans cause acne. Later, other researchers will have trouble replicating their results — I wonder why?
What should they have done differently? Well, if they knew about the Bonferroni Correction, they would have divided the significance threshold 0.05 by the number of experiments, 20. Then only a p-value smaller than 0.0025 would count as significant evidence of a link between jelly beans and acne.
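As a rough illustration of that rule, here is a minimal Python sketch. The function name `bonferroni_significant` and the twenty p-values are made up for this example (they are not data from the comic); the only thing taken from the text is the rule of dividing 0.05 by the number of tests.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the per-test threshold and a list of pass/fail flags."""
    threshold = alpha / len(p_values)  # Bonferroni: divide by the number of tests
    flags = [p < threshold for p in p_values]
    return threshold, flags


# Hypothetical p-values for 20 jelly bean colours (illustrative only).
p_values = [0.62, 0.31, 0.88, 0.45, 0.12, 0.73, 0.02, 0.56, 0.91, 0.27,
            0.39, 0.66, 0.81, 0.18, 0.49, 0.95, 0.34, 0.58, 0.77, 0.23]

threshold, flags = bonferroni_significant(p_values)
naive_hits = sum(p < 0.05 for p in p_values)

print(f"per-test threshold: {threshold:.4f}")
print(f"significant without correction: {naive_hits}")
print(f"significant with correction:    {sum(flags)}")
```

With these made-up numbers, the single result at 0.02 passes the uncorrected 0.05 cutoff but not the corrected one.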
Let’s dive in to explain why this division makes sense.
Time for some basic probability. What’s the chance that the scenario in the xkcd comic would happen? In other words, if we conduct 20 experiments, each with probability 0.05 of producing a significant p-value, then how likely is it that at least one of the experiments produces a significant p-value? Assume all the experiments are independent.
The probability of an experiment not being significant is $1 - 0.05 = 0.95$, so the probability of all 20 experiments not being significant is $0.95^{20} \approx 0.36$. Therefore the probability of at least 1 of 20 experiments being significant is $1 - 0.95^{20} \approx 0.64$. Not too surprising now, is it?
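This calculation is easy to check numerically. The sketch below computes the closed-form value and also estimates it by simulation, assuming (as the text does) that the experiments are independent and that there is no real effect, so each p-value is uniform on [0, 1]. The function names, trial count and seed are arbitrary choices for the example.

```python
import random


def familywise_error_rate(alpha, n_tests):
    """Chance that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate_fwer(alpha, n_tests, n_trials=20_000, seed=0):
    """Monte Carlo estimate: draw uniform p-values and count lucky studies."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # One "study" = n_tests independent p-values under the null.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_trials


analytic = familywise_error_rate(0.05, 20)
simulated = simulate_fwer(0.05, 20)
print(f"analytic:  {analytic:.4f}")
print(f"simulated: {simulated:.4f}")
```

The two numbers should agree to within simulation noise, both landing near 0.64.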
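We want the probability of getting at least one significant p-value purely by chance to be 0.05, not 0.64; that is what a 0.05 significance level is supposed to mean. So flip this around: we need to find an adjusted per-test threshold $p$ that gives an overall significance level of 0.05:
$$1 - (1 - p)^{20} = 0.05$$
Solving for $p$:
$$p = 1 - \sqrt[20]{0.95} \approx 0.00256$$
Okay, this seems reasonably close to 0.0025. In general, if the overall significance level is $\alpha$ and we are correcting for $N$ comparisons, then
$$p = 1 - \sqrt[N]{1 - \alpha}$$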
This is known as the Šidák Correction in literature.
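A small Python sketch of the Šidák threshold, assuming independent tests as above. The name `sidak_threshold` is just a label for this example. Plugging the result back into the "at least one significant" formula is a quick sanity check that the overall level comes out at 0.05.

```python
def sidak_threshold(alpha, n_tests):
    """Per-test threshold so that n independent tests have overall level alpha."""
    return 1 - (1 - alpha) ** (1 / n_tests)


per_test = sidak_threshold(0.05, 20)

# Sanity check: with this per-test threshold, the chance of at least one
# false positive across 20 independent tests should be 0.05 again.
overall = 1 - (1 - per_test) ** 20

print(f"per-test threshold: {per_test:.6f}")
print(f"overall level:      {overall:.6f}")
```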
Šidák’s method works great, but eventually people started complaining that Šidák’s name had too many diacritics and looked for something simpler (also, it used to be difficult to compute nth roots back when they didn’t have computers). How can we approximate this formula?
Approximate? Use Taylor series, of course!
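Assume $N$ is constant, and define:
$$f(\alpha) = 1 - (1 - \alpha)^{1/N}$$
We take the first two terms of the Taylor series of $f$ centered at 0:
$$f(\alpha) \approx f(0) + f'(0)\,\alpha$$
Now $f(0) = 0$ and $f'(\alpha) = \frac{1}{N}(1 - \alpha)^{\frac{1}{N} - 1}$, so $f'(0) = \frac{1}{N}$. Therefore,
$$f(\alpha) \approx \frac{\alpha}{N}$$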
That’s the derivation for the Bonferroni Correction.
Since we only took the first two terms of the Taylor series, this produces a threshold $p$ that’s slightly lower, and therefore slightly stricter, than necessary.
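To see roughly how big that gap is, one can keep one more term of the series. This is a sketch worked out for illustration, writing the Šidák formula as a function of the overall level, $f(\alpha) = 1 - (1 - \alpha)^{1/N}$:

$$f''(\alpha) = \frac{1}{N}\left(1 - \frac{1}{N}\right)(1 - \alpha)^{\frac{1}{N} - 2}, \qquad f''(0) = \frac{N - 1}{N^2}$$

$$f(\alpha) \approx \frac{\alpha}{N} + \frac{N - 1}{2N^2}\,\alpha^2$$

The extra term is positive for $N > 1$, which is why $\alpha / N$ undershoots. For $N = 20$ and $\alpha = 0.05$ it comes to about $0.00006$, which accounts for most of the difference between 0.0025 and the exact 0.00256.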
In the real world, $\alpha$ is close to zero, so in practice it makes little difference whether we use the exact Šidák Correction or the Bonferroni approximation.
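A quick way to get a feel for "little difference" is to tabulate both thresholds. The sketch below uses a few arbitrary values of the overall level and the number of comparisons, including a deliberately unrealistic level of 0.5 to show where the approximation starts to drift. The helper names are made up for this example.

```python
def sidak(alpha, n):
    """Exact per-test threshold for n independent tests."""
    return 1 - (1 - alpha) ** (1 / n)


def bonferroni(alpha, n):
    """First-order approximation: just divide by n."""
    return alpha / n


def relative_gap(alpha, n):
    """How far Bonferroni falls below Sidak, as a fraction of Sidak."""
    s = sidak(alpha, n)
    return (s - bonferroni(alpha, n)) / s


print(f"{'alpha':>6} {'N':>4} {'sidak':>10} {'bonferroni':>11} {'gap':>7}")
for alpha in (0.01, 0.05, 0.5):
    for n in (5, 20, 100):
        print(f"{alpha:>6} {n:>4} {sidak(alpha, n):>10.6f} "
              f"{bonferroni(alpha, n):>11.6f} {relative_gap(alpha, n):>7.1%}")
```

For the usual small levels the two thresholds differ by a few percent at most, and Bonferroni is always the stricter of the two.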
That’s it for now. Next time you do multiple comparisons, just remember to divide your significance threshold by $N$, the number of comparisons. Now you know why.