Many thanks to Javier Benitez for starting this conversation on Twitter, and for pointing me to even more reading on the subject. Once again, an excellent article that has problematic fine print. In particular, this statement (and a couple of variations on it):

If you accept the null hypothesis because the null P value exceeds 0.05 and the power of your test is 90%, the chance you are in error (the chance that your finding is a false negative) is 10%. No! If the null hypothesis is false and you accept it, the chance you are in error is 100%, not 10%. Conversely, if the null hypothesis is true and you accept it, the chance you are in error is 0%. The 10% refers only to how often you would be in error over very many uses of the test across different studies when the particular alternative used to compute power is correct and all other assumptions used for the test are correct in all the studies. It does not refer to your single use of the test or your error rate under any alternative effect size other than the one used to compute power.

This is a statement that I’d file under “true, but not useful.” It’s very much the same as saying “No, the sun doesn’t rise in the East and sink in the West, the motion of the Earth creates an illusion that the sun does those things.” For most purposes, from the point of view of an Earthbound observer, the idea that the sun goes up and then goes down is sufficiently close to the truth that there’s not much to be gained by arguing about it.

I therefore submit the following mathematical argument, which invokes a certain ‘quantum physics’ mindset in order to propose that, given a 90% confidence interval, the probability that your CI contains the true mean is (in fact) 90%:

Let there be a population with a certain characteristic that can be sampled, where any given member of the population can be sampled exactly once, where the sampled value can meaningfully be expressed as a real number, and where it is possible to select members of the population for sampling at random.

Let there be a set that contains all of the possible samples that could be taken from that population, for all sample sizes up to PopulationSize − 1 (were we to actually sample everyone in the population, we’d have ‘the truth’, and no need for statistics). This set will be very, very large in most normal cases — for a population size of 1000, the number of possible 5-sample studies is (1000 × 999 × 998 × 997 × 996) / (5 × 4 × 3 × 2 × 1), which is a Really Quite A Big Number.
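Just how Big a Number that is can be checked directly — the expression above is the binomial coefficient “1000 choose 5”, which Python’s standard library computes for us:

```python
# Count the distinct 5-member samples that can be drawn from a
# population of 1000 (order doesn't matter, no member repeats).
import math

n_studies = math.comb(1000, 5)  # = (1000*999*998*997*996) / 5!
print(f"{n_studies:,}")  # 8,250,291,250,200
```

Over eight trillion possible five-person studies, from a population of only a thousand — and that’s just one sample size.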

The 90% confidence interval means that 90% of the members of the set have a confidence interval that contains the population mean — this much, we all agree on. If I were to select a study at random from the set of studies, I would have a 90% chance of selecting a study where the confidence interval contains the population mean. (Still in agreement.)

It is true to say (as quoted above) that, given that the population has a mean value for this property, any individual statement about that mean value will be either true or false. Therefore, the absolute truth of any individual statement will be either 100% or 0%.

However, consider drawing a card at random from a deck of face-down cards. The chance that you’ll draw a diamond (in one traditional deck as prescribed by Hoyle, with all cards available to draw except the jokers) is 1 in 4. If you take a card from the deck and don’t turn it over or look at it, then it is true, in the strictest sense, that probability is no longer at work. It’s either a diamond or it’s not. But if you haven’t looked at it yet, then your subjective experience of that card is that it has 1 chance in 4 of being a diamond.
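The 1-in-4 figure is itself a long-run frequency, which a quick simulation confirms (a sketch, using an abstract 52-card deck with suits labelled S, H, D, C):

```python
# Sketch: long-run frequency of drawing a diamond from a 52-card deck.
# Any single draw, once made but unseen, is either a diamond or not --
# yet before we look, 1-in-4 is the right description of our expectation.
import random

random.seed(0)
deck = [(rank, suit) for suit in "SHDC" for rank in range(13)]

draws = 100_000
diamonds = sum(random.choice(deck)[1] == "D" for _ in range(draws))
print(diamonds / draws)  # close to 0.25
```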

We could say that the card you haven’t looked at yet is in a ‘superposition of states’, much like Schroedinger’s famous cat. Until we look at the card, our experience is that it’s a kind of blur of all the cards in the deck. When we look at it, it stops being in a superposition of states, and has exactly one state.

The same principle can be applied to making confidence statements about statistical studies. In effect, we have drawn a card from the deck, but we haven’t been allowed to turn it over (to know the truth about it). While the truth of the population remains unknown, the accuracy of the study is in a superposition of states (shared among all the other studies of the same size that could possibly have been conducted).

If our study has a 90% confidence interval, then it’s effectively a card we don’t get to see, drawn from a deck where 90% of the cards are diamonds, and 10% of the cards are clubs. The odds that we were going to get a diamond were 90%. If we accept that we have not observed the truth of the study until we know the truth about the population, then Schroedinger’s principles apply, and we have a 90% chance of living in a universe where the population mean is within the confidence interval, and a 10% chance of living in a universe where the population mean is outside the confidence interval.

(Given that the statement is either ‘definitely true’ or ‘definitely false’, we can hypothesize parallel universes, categorize them according to the truth or falsity of the study question, and then calculate our probability of selecting a given universe.)

Alternatively, we can be a little bit less precious about the whole thing, and say “Yeah, there’s a 90% chance that the population mean is within the 90% confidence interval.”