In other news: ESP doesn’t seem to exist after all
Last winter ESP made it into national news (and The Colbert Report) because a study by a sane psychologist from a reputable institution was published in a mainstream psychology journal. Using sound experimental procedure and standard statistical analysis, the study reported experimental data showing that a particular kind of ESP exists: precognition, or knowledge of something before it happens. Definitely a man-bites-dog kind of news.
But instead of a mass conversion of skeptics, this publication generated heated debates about the suitability of standard statistical analysis — the ubiquitous t-test — for hypothesis testing in psychology and social science. Why?
The facts first. Daryl Bem reports on a series of experiments that test for some really well-known and simple psychological effects, but with the usual order of events reversed in time. For example, people tend to avoid negative stimuli, which can be useful in learning (the stick part of the carrot and stick). Or there is a known priming effect where you’re more likely to recall random words if you’ve seen them or practiced with them recently. But what if you recall first, then practice: will you be more likely to recall the words you practice later? If so, that’s evidence for precognition. And of course it would go completely against our understanding of causes and effects.
The first experiment in Bem’s paper actually did a strange thing. The experimenter told participants that they were there for a test of ESP. Then they sat in front of a computer that repeatedly showed them two curtains side by side and asked them to guess, by clicking on it, which curtain had a picture behind it. However, the computer did not choose the picture’s position until after the click. And the choice was made randomly, using both standard software pseudo-random number generators and a real hardware random number generator. The interesting findings: when the picture was a neutral one, subjects predictably didn’t do any better than chance at guessing where it would be. But when the picture was erotic, they guessed correctly about 53% of the time, which is significantly better than chance according to the kind of statistical analysis that psychologists and social scientists routinely use to report findings (the t-test). In fact, in 8 out of 9 experiments, Bem reports statistically significant results that seem to demonstrate precognition: the participants’ ability to guess correctly what will happen before it does.
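To get a feel for how a hit rate of about 53% can count as “significantly better than chance,” here is a minimal sketch in Python. It is not Bem’s analysis: he ran t-tests on participants’ scores, while this swaps in a simpler exact binomial test on pooled guesses. The number of guesses (1500) and the number of hits (795) are made up for illustration, and the function name is hypothetical.

```python
from math import comb


def one_sided_binomial_p(hits: int, trials: int) -> float:
    """Exact P(X >= hits) when every guess is a fair coin flip (p = 0.5)."""
    # Count the outcomes at least as extreme as the observed number of hits,
    # then divide by the 2**trials equally likely sequences of guesses.
    tail = sum(comb(trials, k) for k in range(hits, trials + 1))
    return tail / 2**trials


# Hypothetical numbers, NOT Bem's data: 1500 pooled guesses, 53% correct.
trials = 1500
hits = 795
p_value = one_sided_binomial_p(hits, trials)
print(f"hit rate = {hits / trials:.3f}, one-sided p = {p_value:.4f}")
```

With this many guesses, even a three-point edge over 50% comes out below the conventional 0.05 threshold. A small effect plus a large sample is all it takes.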
How is that possible? There are at least the following ways:
1. The experiments were sloppy or badly designed, or fraudulent.
Actually, we have very little reason to believe this. Bem goes to great lengths to explain the exact setup, and everything seems completely legit: a standard research picture database, a standard computer-run experimental procedure, a hardware random number generator. The absence of any indication of poor or dishonest procedure and analysis is exactly why the journal published the paper in the first place. And it is actually a great victory for the peer-review process.
2. There is actually a precognition phenomenon, and the results are evidence for it.
It could be that we can actually “see” the future, at least better than chance, even when the outcome is governed by nothing but chance. After all, there are more things in heaven and earth than blah blah. But this explanation seems extremely unlikely, because there’s so much evidence that goes against it. The casino always wins in the long run: if we could guess random numbers 53% of the time, the house would lose and gambling would not be a lucrative business. Everything we know about physics says there isn’t an explanation for this time-traveling information: entropy is supposed to only increase. Everything we know about biology says there is no way: there isn’t a sensory organ for such time-traveling information. Plus, many people have tried many times to show that ESP exists, and failed. So it seems that we should be very skeptical about immediately embracing the parapsychological explanation. What else could be going on?
3. There is something wrong with the statistical analysis that tells us the results are significant.
If the data are real but there is no underlying phenomenon, the analysis is the only thing left to question. In fact, Eric-Jan Wagenmakers and colleagues did exactly that in a paper that followed Bem’s in the same issue of the journal. They pointed out rightly that extraordinary claims require extraordinary evidence. So how good is Bem’s evidence? You can put a number on its strength. Using Bayesian statistics, you work out how likely the observed data are under each hypothesis (the null hypothesis that there’s no effect, and the alternative hypothesis that there is one) and take the ratio of those two likelihoods. The number you get is a Bayes factor, and it tells you how strongly the data favor the alternative hypothesis over the null. At a factor of 2:1 the evidence is barely worth mentioning, or anecdotal. At a factor of 100:1, the evidence points decisively to the alternative hypothesis. Bem’s evidence? Barely worth mentioning. It could as well be evidence for the null as for the alternative hypothesis. This was also corroborated this month by a meta-analysis by Jeffrey Rouder and Richard Morey that fixes some problems with Wagenmakers et al.’s analysis.
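The ratio described above can be sketched in a few lines of Python. This is not the analysis Wagenmakers et al. ran (they used a Bayesian version of the t-test); it is a simpler binomial stand-in. It assumes that under the alternative hypothesis every hit rate between 0 and 1 is equally plausible beforehand (a uniform prior, which is my assumption, not theirs). The function name and the numbers are hypothetical.

```python
from math import comb


def bayes_factor_10(hits: int, trials: int) -> float:
    """Bayes factor for H1 (some unknown hit rate) over H0 (pure chance)."""
    # H0: the hit rate is exactly 0.5, so the data follow Binomial(trials, 0.5).
    likelihood_h0 = comb(trials, hits) / 2**trials
    # H1 (assumed here): every hit rate from 0 to 1 is equally plausible a priori.
    # Averaging the binomial likelihood over that uniform prior gives 1/(trials+1).
    likelihood_h1 = 1 / (trials + 1)
    return likelihood_h1 / likelihood_h0


# Hypothetical numbers, NOT Bem's data: 1500 pooled guesses, 53% correct.
bf10 = bayes_factor_10(hits=795, trials=1500)
print(f"Bayes factor (alternative : null) = {bf10:.2f}")
```

For these made-up numbers the factor lands below 1, meaning the data slightly favor the null, even though a 53% hit rate over 1500 guesses would pass a conventional one-sided significance test. The result depends on the prior chosen for the alternative hypothesis, which is part of why this kind of analysis gets argued over.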
And so a rational person, having read Bem’s article, will revise their belief in methodology instead of revising their belief in psi. And they will balk at how easy it apparently is to produce evidence for an effect where there is none, where the null hypothesis is true. If the evidence isn’t strong at all but you get significance on the t-test, just how sucky is the t-test? And how many studies that use the t-test are published monthly in psychology and social science? Perhaps it will be fewer now that Bem’s controversy has highlighted its problems, although debate is still raging in psychology circles (see Bem’s and Wagenmakers’s homepages).
But it’s interesting, too, how difficult the correct statistical interpretation of data apparently is. Even those diligently looking for Bayes factors don’t necessarily get it right.
Bem, Daryl J. (2011) Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect. Journal of Personality and Social Psychology 100:3, pp. 407–425.
Rouder, Jeffrey N. & Morey, Richard D. (2011) A Bayes-Factor Meta Analysis of Bem’s ESP Claim. Psychonomic Bulletin & Review, published Online First on May 14, 2011.
Wagenmakers, Eric-Jan, Wetzels, Ruud, Borsboom, Denny & van der Maas, Han L. J. (2011) Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi: Comment on Bem (2011). Journal of Personality and Social Psychology 100:3, pp. 426–432.