Hiding the data

“Given enough eyeballs, all bugs are shallow.”

—Eric S. Raymond


We’ve talked about the common mistakes made by scientists, and how the best way to spot them is a bit of outside scrutiny. Peer review provides some of this scrutiny, but a peer reviewer doesn’t have the time to extensively re-analyze data and read code for typos – reviewers can only check that the methodology makes good sense. Sometimes they spot obvious errors, but subtle problems are usually missed.[52]

This is why many journals and professional societies require researchers to make their data available to other scientists on request. Full datasets are usually too large to print in the pages of a journal, so authors report their results and send the complete data to other scientists if they ask for a copy. Perhaps they will find an error or a pattern the original scientists missed.

Or so it goes in theory. In 2005, Jelte Wicherts and colleagues at the University of Amsterdam decided to analyze every recent article in several prominent journals of the American Psychological Association to learn about their statistical methods. They chose the APA partly because it requires authors to agree to share their data with other psychologists seeking to verify their claims.

Six months later, they had received data for only 64 of the 249 studies they had requested. Almost three quarters of study authors never sent their data.[61]

Of course, scientists are busy people, and perhaps they simply didn’t have the time to compile their datasets, produce documents describing what each variable means and how it was measured, and so on.

Wicherts and his colleagues decided they’d test this. They trawled through all the studies looking for common errors which could be spotted by reading the paper, such as inconsistent statistical results, misuse of various statistical tests, and ordinary typos. At least half of the papers had an error, usually minor, but 15% reported at least one statistically significant result which was only significant because of an error.

Next, they looked for a correlation between these errors and an unwillingness to share data. There was a clear relationship. Authors who refused to share their data were more likely to have committed an error in their paper, and their statistical evidence tended to be weaker.[60] Because most authors refused to share their data, Wicherts could not dig for deeper statistical errors, and many more may be lurking.

This is certainly not proof that authors hid their data out of fear their errors may be uncovered, or even that the authors knew about the errors at all. Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing “look over there.”[1]

 

Just leave out the details

Nitpicking statisticians getting you down by pointing out flaws in your paper? There’s one clear solution: don’t publish as much detail! They can’t find the errors if you don’t say how you evaluated your data.

I don’t mean to seriously suggest that evil scientists do this intentionally, although perhaps some do. More frequently, details are left out because authors simply forgot to include them, or because journal space limits force their omission.

It’s possible to evaluate studies to see what they left out. Scientists leading medical trials are required to provide detailed study plans to ethical review boards before starting a trial, so one group of researchers obtained a collection of these plans from a review board. The plans specify which outcomes the study will measure: for instance, a study might monitor various symptoms to see if any are influenced by the treatment. The researchers then found the published results of these studies and looked for how well these outcomes were reported.

Roughly half of the outcomes never appeared in the scientific journal papers at all. Many of these were statistically insignificant results which were swept under the rug.[2] Another large chunk of results were not reported in sufficient detail for scientists to use them in further meta-analysis.[14]

Other reviews have found similar problems. A review of medical trials found that most studies omit important methodological details, such as stopping rules and power calculations, with studies in small specialist journals faring worse than those in large general medicine journals.[29]

Medical journals have begun to combat this problem with standards for reporting of results, such as the CONSORT checklist. Authors are required to follow the checklist’s requirements before submitting their studies, and editors check to make sure all relevant details are included. The checklist seems to work; studies published in journals which follow the guidelines tend to report more essential detail, although not all of it.[46] Unfortunately, the standards are inconsistently applied, and studies often slip through with missing details nonetheless.[42] Journal editors will need to make a greater effort to enforce reporting standards.

We see that published papers aren’t faring very well. What about unpublished studies?

 

Science in a filing cabinet

Earlier we saw the impact of multiple comparisons and truth inflation on study results. These problems arise when studies make numerous comparisons with low statistical power, giving a high rate of false positives and inflated estimates of effect sizes, and they appear everywhere in published research.

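To get a feel for how quickly false positives pile up, here is a minimal Python sketch. It assumes the comparisons are independent, that no real effect exists in any of them, and that each is tested at a significance threshold of 0.05. Those assumptions and the function name are illustrative only and are not taken from any particular study.

```python
def familywise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent tests
    when every null hypothesis is actually true (an assumed scenario)."""
    return 1.0 - (1.0 - alpha) ** comparisons


# Illustrative numbers of comparisons; alpha = 0.05 is an assumed threshold.
for k in (1, 5, 20, 100):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:>3} comparisons -> chance of a false positive: {rate:.3f}")
```

Under these assumptions, twenty comparisons already give roughly a 64% chance of at least one spurious "significant" result.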

But not every study is published. We only ever see a fraction of medical research, for instance, because few scientists bother publishing “We tried this medicine and it didn’t seem to work.”

Consider an example: studies of the tumor suppressor protein TP53 and its effect on head and neck cancer. A number of studies suggested that measurements of TP53 could be used to predict cancer mortality rates, since it serves to regulate cell growth and development and hence must function correctly to prevent cancer. When all 18 published studies on TP53 and cancer were analyzed together, the result was a highly statistically significant correlation: TP53 could clearly be measured to tell how likely a tumor is to kill you.

But then suppose we dig up unpublished results on TP53: data that had been mentioned in other studies but not published or analyzed. Add this data to the mix and the statistically significant effect vanishes.[36] After all, few authors bothered to publish data showing no correlation, so the meta-analysis could only use a biased sample.
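A small simulation shows how a biased sample can manufacture an effect. The sketch below is not a reconstruction of the TP53 analysis: the number of studies, the sample sizes, and the rule that only positive, significant results get published are all invented for illustration. Every simulated study tests a treatment with no true effect at all.

```python
import random
import statistics
from math import sqrt


def simulate_study(rng, true_effect, n_per_group):
    """Return (estimated effect, z statistic) for one two-group study
    whose outcomes have a known standard deviation of 1 (an assumption)."""
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    standard_error = sqrt(2.0 / n_per_group)
    return estimate, estimate / standard_error


def pooled_estimates(seed=0, studies=500, true_effect=0.0, n_per_group=30):
    """Average the effect estimates over all studies and over the
    hypothetical 'published' subset (positive and significant only)."""
    rng = random.Random(seed)
    results = [simulate_study(rng, true_effect, n_per_group)
               for _ in range(studies)]
    everything = [est for est, _ in results]
    published = [est for est, z in results if z > 1.96]
    published_mean = statistics.fmean(published) if published else float("nan")
    return statistics.fmean(everything), published_mean, len(published)


all_mean, published_mean, n_published = pooled_estimates()
print(f"all studies:       mean effect {all_mean:+.3f}")
print(f"'published' only:  mean effect {published_mean:+.3f} "
      f"({n_published} studies)")
```

Because every simulated study has the same sample size, a plain average is the same as an inverse-variance pooled estimate. The average over all studies should land near zero, while the average over the "published" subset sits well above it, since only the studies that overshot by chance made the cut.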

A similar study looked at reboxetine, an antidepressant sold by Pfizer. Several published studies have suggested that it is effective compared to placebo, leading several European countries to approve it for prescription to depressed patients. The German Institute for Quality and Efficiency in Health Care, responsible for assessing medical treatments, managed to get unpublished trial data from Pfizer – three times more data than had ever been published – and carefully analyzed it. The result: reboxetine is not effective. Pfizer had only convinced the public that it’s effective by neglecting to mention the studies proving it isn’t.[18]

This problem is commonly known as publication bias or the file-drawer problem: many studies sit in a file drawer for years, never published, despite the valuable data they could contribute.

The problem isn’t simply the bias in published results. Unpublished studies also lead to a duplication of effort – if other scientists don’t know you’ve done a study, they may well do it again, wasting money and effort.

Regulators and scientific journals have attempted to halt this problem. The Food and Drug Administration requires certain kinds of clinical trials to be registered through the website ClinicalTrials.gov before the trials begin, and requires the publication of results within a year of the end of the trial. Similarly, the International Committee of Medical Journal Editors announced in 2005 that they would not publish studies which had not been pre-registered.

Unfortunately, a review of 738 registered clinical trials found that only 22% met the legal requirement to publish.[47] The FDA has not fined any drug companies for noncompliance, and journals have not consistently enforced the requirement to register trials. Most studies simply vanish.