What have we wrought?


I’ve painted a grim picture. But anyone can pick out small details in published studies and produce a tremendous list of errors. Do these problems matter?


Well, yes. I wouldn’t have written this otherwise.


John Ioannidis’s famous article “Why Most Published Research Findings Are False”31 was grounded in mathematical concerns rather than an empirical test of research results. If most research articles have poor statistical power – and they do – and researchers are free to choose among multitudes of analysis methods to get favorable results – and they are – then when most tested hypotheses are false and most true hypotheses correspond to very small effects, we are mathematically guaranteed a multitude of false positives.

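Ioannidis’s argument can be made concrete with a little arithmetic: the share of “significant” findings that reflect real effects depends on power, the significance level, and how often tested hypotheses are true in the first place. The specific numbers below are illustrative assumptions, not figures from the text:

```python
# Positive predictive value of a significant result, in the spirit of
# Ioannidis's argument. All three inputs are assumed for illustration.
alpha = 0.05   # significance level (false positive rate on null hypotheses)
power = 0.35   # assumed typical power of published studies
prior = 0.10   # assumed fraction of tested hypotheses that are actually true

true_positives = power * prior           # real effects correctly detected
false_positives = alpha * (1 - prior)    # null effects wrongly flagged
ppv = true_positives / (true_positives + false_positives)
print(f"Share of significant findings that are real: {ppv:.0%}")
```

Under these assumptions, fewer than half of all statistically significant findings correspond to real effects; the rest are false positives.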

But if you want empiricism, you can have it, courtesy of John Ioannidis and Jonathan Schoenfeld. They studied the question “Is everything we eat associated with cancer?”51 After choosing fifty common ingredients out of a cookbook, they set out to find studies linking them to cancer rates – and found 216 studies on forty different ingredients. Of course, most of the studies disagreed with each other. Most ingredients had multiple studies claiming they both increased and decreased the risk of getting cancer. Most of the statistical evidence was weak, and meta-analyses usually showed much smaller effects on cancer rates than the original studies.

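The mechanics behind that pile of contradictory findings can be sketched with a toy simulation: give fifty ingredients no real effect on cancer at all, test each a handful of times at α = 0.05, and spurious “significant” links appear anyway, in both directions. The per-ingredient study count and the standard-normal test statistics are assumptions for illustration, not the design of the actual review:

```python
import random
from statistics import NormalDist

random.seed(42)
z_crit = NormalDist().inv_cdf(0.975)  # two-sided threshold at alpha = 0.05, ~1.96

flagged = 0    # ingredients with at least one "significant" study
findings = 0   # total spurious significant results across all studies
for _ in range(50):                              # 50 ingredients, zero real effect
    zs = [random.gauss(0, 1) for _ in range(5)]  # 5 null studies each (assumed)
    hits = [z for z in zs if abs(z) > z_crit]
    findings += len(hits)
    flagged += bool(hits)

print(f"{flagged} of 50 harmless ingredients got a 'significant' cancer link "
      f"({findings} spurious results in 250 studies)")
```

With 250 tests at α = 0.05, roughly a dozen false positives are expected even though nothing is going on, scattered between “harmful” and “protective” directions – exactly the kind of contradictory literature the cookbook study found.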

Of course, being contradicted by follow-up studies and meta-analyses doesn’t prevent a paper from being cited as though it were true. Even effects which have been contradicted by massive follow-up trials with unequivocal results are frequently cited five or ten years later, with scientists apparently not noticing that the results are false.55 Of course, new findings get widely publicized in the press, while contradictions and corrections are hardly ever mentioned.23 You can hardly blame the scientists for not keeping up.


Let’s not forget the merely biased results. Poor reporting standards in medical journals mean studies testing new treatments for schizophrenia can neglect to include the scale they used to evaluate symptoms – a handy source of bias, as trials using unpublished scales tend to produce better results than those using previously validated tests.40 Other medical studies simply omit particular results if they’re not favorable or interesting, biasing subsequent meta-analyses to only include positive results. A third of meta-analyses are estimated to suffer from this problem.34


Another review compared meta-analyses to subsequent large randomized controlled trials, considered the gold standard in medicine. In over a third of cases, the randomized trial’s outcome did not correspond well to the meta-analysis.39 Other comparisons of meta-analyses to subsequent research found that most results were inflated, with perhaps a fifth representing false positives.45


Let’s not forget the multitude of physical science papers which misuse confidence intervals.37 Or the peer-reviewed psychology paper allegedly providing evidence for psychic powers, on the basis of uncontrolled multiple comparisons in exploratory studies.58 Unsurprisingly, the results failed to replicate – in attempts run by scientists who appear not to have calculated the statistical power of their tests.20


We have a problem. Let’s work on fixing it.