And now (thanks Richard Morey) I have a little bit more insight into why statisticians don’t like attaching a ‘probability about the true mean’ to confidence intervals.

If we were to do multiple studies (to ‘draw multiple cards from the deck’ as it were), then we will calculate multiple confidence intervals. Let’s imagine 2 studies that produce the following 90% confidence intervals:

1 to 4

2 to 3

If each of those intervals allows us to say “There is a 90% chance that the true mean is within the confidence interval”, then something paradoxical happens: while the first study gives us 90% confidence (indicates a 90% probability) that the true mean is between 1 and 4, the second study gives us more than 90% confidence that the true mean is between 1 and 4. (If it’s probably between 2 and 3, it’s even more probably between 1 and 4.)

The apparent paradox is that the statement ‘there is a 90% chance that the true mean is between 1 and 4’ needs to coexist with the statement ‘there is a higher than 90% chance that the true mean is between 1 and 4.’ Since 90% can’t be higher than 90%, there’s a problem here.

The solution to this apparent paradox is to note that the different numbers came from different studies, and to treat them accordingly. If we say ‘The first study indicates’ and then ‘The second study indicates’ then it’s clear that the two studies indicate different things. (If we combine the data, then we’ll end up saying that the odds that the true mean is between 2 and 3 are higher than 90% as well, so we can’t accurately state the combined result purely in terms of study number 2.)
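To make the ‘combine the data’ point concrete, here’s a minimal sketch with made-up numbers: two hypothetical studies each yield their own 90% interval, and pooling the raw data yields a third interval to report instead, typically narrower than the wider of the two. (This uses a crude normal approximation; a real analysis would use t-based intervals.)

```python
import math
import statistics

def ci90(sample):
    # Normal-approximation 90% interval for the mean (z = 1.645).
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (m - 1.645 * se, m + 1.645 * se)

# Two hypothetical studies measuring the same quantity (data made up).
study1 = [1.8, 2.9, 2.2, 3.4, 2.6, 1.9, 3.1, 2.4]
study2 = [2.3, 2.7, 2.5, 2.9, 2.6, 2.4, 2.8, 2.2]

# Each study gives its own interval; pooling the raw data gives a
# third interval, which is the one that reflects all the evidence.
print("study 1:", ci90(study1))
print("study 2:", ci90(study2))
print("pooled: ", ci90(study1 + study2))
```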

Similarly, there’s an apparent paradox at stake if we get a number of confidence intervals that don’t overlap. It doesn’t make sense to say that there’s a 90% chance that the true mean is less than 10, a 90% chance that it’s between 10 and 20, and a 90% chance that it’s greater than 20. However, it’s a very plausible scenario (i.e., if someone told me it happened, I’d be happy to believe them). These outcomes are mutually exclusive, yet their probabilities add up to 270%, so we’ve broken a very important rule of probability: the probabilities of all possible mutually-exclusive outcomes must add up to 100%.
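That scenario is easy to provoke in simulation. Here’s a sketch (made-up population, crude normal approximation throughout): draw many small studies from a population whose true mean is 15, and check both the coverage rate of the 90% intervals and whether some pair of them fails to overlap.

```python
import math
import random
import statistics

random.seed(0)

def ci90(sample):
    # Normal-approximation 90% interval for the mean (z = 1.645).
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (m - 1.645 * se, m + 1.645 * se)

# 200 small studies (n = 5 each) drawn from the same population:
# true mean 15, standard deviation 10.
intervals = [ci90([random.gauss(15, 10) for _ in range(5)])
             for _ in range(200)]

# Coverage: close to 90%, though a bit below it here, since the
# z approximation is crude at n = 5.
covered = sum(lo <= 15 <= hi for lo, hi in intervals) / len(intervals)
print("coverage:", covered)

# Disjoint intervals exist iff some interval ends below another begins.
highest_low = max(lo for lo, hi in intervals)
lowest_high = min(hi for lo, hi in intervals)
print("some pair of intervals is disjoint:", lowest_high < highest_low)
```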

So now we pay the price for cheating when we allowed for a superposition of states, and note that the ‘true probability’ regarding the mean is not the same as the ‘uncertain observer’s probability’, which we allowed ourselves to call ‘90%’ because we didn’t want to handle the truth.

What we can (and indeed should) do at this point is calculate a new ‘uncertain observer’s probability’ regarding the nature of the truth that is based on merging the studies together. This is what Bayesian methods are for — instead of treating each experiment as a unique, one-off event, we get tools for treating each experiment as a thing that happens in the context of other experiments.
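As an illustration of that merging, here’s a minimal conjugate-normal sketch (all numbers made up, and the observation noise is assumed known for simplicity): each study updates the posterior for the mean, and the posterior after both studies is tighter than after either alone.

```python
import math

SIGMA = 1.0  # assumed-known measurement standard deviation

def update(mean, var, data):
    """Normal-normal conjugate update of the posterior for the mean."""
    n = len(data)
    post_var = 1.0 / (1.0 / var + n / SIGMA**2)
    post_mean = post_var * (mean / var + sum(data) / SIGMA**2)
    return post_mean, post_var

# Start from a vague prior, then fold in two hypothetical studies.
prior_mean, prior_var = 0.0, 100.0
study1 = [2.1, 2.8, 1.7, 3.2, 2.4]
study2 = [2.5, 2.6, 2.3, 2.7]

m, v = update(prior_mean, prior_var, study1)
print("after study 1:", m, "+/-", 1.645 * math.sqrt(v))
m, v = update(m, v, study2)
print("after both:   ", m, "+/-", 1.645 * math.sqrt(v))
```

Because the second update starts from the first study’s posterior rather than from the vague prior, the final interval automatically accounts for both experiments at once.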