【Translation】Researcher freedom: good vibrations?

There’s a common misconception that statistics is boring and monotonous. Collect lots of data, plug the numbers into Excel or SPSS or R, and beat the software with a stick until it produces some colorful charts and graphs. Done! All the statistician must do is read off the results.

But one must choose which commands to use. Two researchers attempting to answer the same question may perform different statistical analyses entirely. There are many decisions to make:

  1. Which variables do I adjust for? In a medical trial, for instance, you might control for patient age, gender, weight, BMI, previous medical history, smoking, drug use, or for the results of medical tests done before the start of the study. Which of these factors are important, and which can be ignored?

  2. Which cases do I exclude? If I’m testing diet plans, maybe I want to exclude test subjects who came down with uncontrollable diarrhea during the trial, since their results will be abnormal.

  3. What do I do with outliers? There will always be some results which are out of the ordinary, for reasons known or unknown, and I may want to exclude them or analyze them specially. Which cases count as outliers, and what do I do with them?

  4. How do I define groups? For example, I may want to split patients into “overweight”, “normal”, and “underweight” groups. Where do I draw the lines? What do I do with a muscular bodybuilder whose BMI is in the “overweight” range?

  5. What about missing data? Perhaps I’m testing cancer remission rates with a new drug. I run the trial for five years, but some patients will have tumors reappear after six years, or eight years. My data does not include their recurrence. How do I account for this when measuring the effectiveness of the drug?

  6. How much data should I collect? Should I stop when I have a definitive result, or continue as planned until I’ve collected all the data?

  7. How do I measure my outcomes? A medication could be evaluated with subjective patient surveys, medical test results, prevalence of a certain symptom, or measures such as duration of illness.
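
To get a feel for how quickly these decisions multiply, here is a minimal Python sketch that counts the distinct analyses available when each question above has only a few possible answers. The option lists in `DECISIONS` are hypothetical and chosen purely for illustration; a real study would have its own, often longer, menus.

```python
import itertools
import math

# Hypothetical options for each decision in the list above (assumptions,
# not taken from any particular study).
DECISIONS = {
    "adjust_for": ["nothing", "age", "age+sex", "age+sex+BMI"],
    "exclusions": ["none", "drop protocol violators"],
    "outliers": ["keep", "drop beyond 3 SD", "drop beyond 2 SD"],
    "bmi_groups": ["standard cutoffs", "tertiles"],
    "missing_data": ["complete cases", "impute"],
    "stopping": ["fixed sample size", "stop at first significant result"],
    "outcome": ["patient survey", "lab test", "illness duration"],
}


def count_analyses(decisions):
    """Number of distinct analyses: the product of the option counts."""
    return math.prod(len(options) for options in decisions.values())


def enumerate_analyses(decisions):
    """Yield every combination of choices as a dict of decision -> option."""
    names = list(decisions)
    for combo in itertools.product(*decisions.values()):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    print(count_analyses(DECISIONS), "possible analyses")
    print(next(enumerate_analyses(DECISIONS)))
```

With these made-up menus, seven modest decisions already allow 4 × 2 × 3 × 2 × 2 × 2 × 3 = 576 different analyses of the same data.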

Producing results can take hours of exploration and analysis to see which procedures are most appropriate. Papers usually explain the statistical analysis performed, but don’t always explain why the researchers chose one method over another, or explain what the results would be had the researchers chosen a different method. Researchers are free to choose whatever methods they feel appropriate – and while they may make the right choices, what would happen if they analyzed the data differently?

In simulations, it’s possible to get effect sizes that differ by a factor of two simply by adjusting for different variables, excluding different sets of cases, and handling outliers differently. [30] The effect size is that all-important number which tells you how much of a difference your medication makes. So apparently, being free to analyze how you want gives you enormous control over your results!
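
The following toy sketch illustrates the idea. It is not the simulation from the cited work. It generates one hypothetical two-arm trial and then estimates the treatment effect under every combination of three choices: an outlier cutoff, whether to exclude older patients, and whether to adjust for age by stratification. All names and parameters (`make_trial`, the true effect of 0.3, the 5% rate of wild measurements) are assumptions made up for the example.

```python
import itertools
import random
from statistics import mean, stdev


def make_trial(n=200, true_effect=0.3, seed=0):
    """Simulate a hypothetical trial with an age covariate and a few extreme values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treated = rng.random() < 0.5
        old = rng.random() < 0.3
        noise_sd = 4.0 if rng.random() < 0.05 else 1.0  # occasional wild measurement
        outcome = true_effect * treated + 0.5 * old + rng.gauss(0, noise_sd)
        rows.append({"treated": treated, "old": old, "outcome": outcome})
    return rows


def mean_difference(rows):
    """Treated mean minus control mean; None if either arm is empty."""
    t = [r["outcome"] for r in rows if r["treated"]]
    c = [r["outcome"] for r in rows if not r["treated"]]
    return mean(t) - mean(c) if t and c else None


def estimate(rows, outlier_cutoff, exclude_old, adjust_for_age):
    """Estimate the treatment effect under one combination of analysis choices."""
    if exclude_old:
        rows = [r for r in rows if not r["old"]]
    if outlier_cutoff is not None:
        ys = [r["outcome"] for r in rows]
        centre, spread = mean(ys), stdev(ys)
        rows = [r for r in rows if abs(r["outcome"] - centre) <= outlier_cutoff * spread]
    if not adjust_for_age:
        return mean_difference(rows)
    # Crude adjustment: average the within-age-group differences,
    # weighted by the number of patients in each group.
    total, weight = 0.0, 0
    for flag in (False, True):
        stratum = [r for r in rows if r["old"] == flag]
        d = mean_difference(stratum)
        if d is not None:
            total += d * len(stratum)
            weight += len(stratum)
    return total / weight if weight else None


def run_all(rows):
    """Map each (outlier_cutoff, exclude_old, adjust_for_age) choice to its estimate."""
    specs = itertools.product([None, 3.0, 2.0], [False, True], [False, True])
    return {spec: estimate(rows, *spec) for spec in specs}


if __name__ == "__main__":
    results = run_all(make_trial())
    for spec, est in results.items():
        print(spec, round(est, 3))
    print("smallest:", round(min(results.values()), 3),
          "largest:", round(max(results.values()), 3))
```

Every one of the twelve analyses is defensible on its face, yet they give different answers from the same data. How wide the spread is depends entirely on the invented parameters, so the printed numbers should not be read as evidence about any real trial.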

The most concerning consequence of this statistical freedom is that researchers may choose the statistical analysis most favorable to them, arbitrarily producing statistically significant results by playing with the data until something emerges. Simulation suggests that false positive rates can jump to over 50% for a given dataset just by letting researchers try different statistical analyses until one works. [53]
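
A rough way to see this inflation is to simulate trials in which the treatment does nothing at all and count how often a "significant" result turns up. The sketch below is a toy setup, not a reproduction of the cited simulation. It assumes two independent outcome measures with known unit variance (so a simple z-test is exact), a coin-flip subgroup label, and a researcher who tries each outcome and their average, in the full sample and in each subgroup, and then collects ten more subjects per arm if nothing has worked yet. All of these choices, and the names `simulate` and `all_p_values`, are assumptions for illustration.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05
STD_NORMAL = NormalDist()


def z_test_p(a, b, sd=1.0):
    """Two-sided p-value for a difference in means with known per-observation sd."""
    if not a or not b:  # an empty subgroup cannot be tested
        return 1.0
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def make_subject(rng):
    """One subject under the null: two independent N(0, 1) outcomes, random subgroup."""
    return {"y1": rng.gauss(0, 1), "y2": rng.gauss(0, 1), "female": rng.random() < 0.5}


def all_p_values(treated, control):
    """p-values for every analysis the flexible researcher is willing to try."""
    outcomes = [
        (lambda s: s["y1"], 1.0),
        (lambda s: s["y2"], 1.0),
        (lambda s: (s["y1"] + s["y2"]) / 2, 0.5 ** 0.5),  # the average has sd sqrt(0.5)
    ]
    subsets = [lambda s: True, lambda s: s["female"], lambda s: not s["female"]]
    ps = []
    for get, sd in outcomes:
        for keep in subsets:
            a = [get(s) for s in treated if keep(s)]
            b = [get(s) for s in control if keep(s)]
            ps.append(z_test_p(a, b, sd))
    return ps


def simulate(n_sims=3000, n_start=20, n_extra=10, seed=1):
    """Return (honest, flexible) false positive rates when there is no true effect."""
    rng = random.Random(seed)
    honest_hits = flexible_hits = 0
    for _ in range(n_sims):
        treated = [make_subject(rng) for _ in range(n_start)]
        control = [make_subject(rng) for _ in range(n_start)]
        # Pre-specified analysis: outcome y1, all subjects, a single look.
        honest = z_test_p([s["y1"] for s in treated],
                          [s["y1"] for s in control]) < ALPHA
        # Flexible analysis: try everything, then peek again after adding data.
        flexible = min(all_p_values(treated, control)) < ALPHA
        if not flexible:
            treated += [make_subject(rng) for _ in range(n_extra)]
            control += [make_subject(rng) for _ in range(n_extra)]
            flexible = min(all_p_values(treated, control)) < ALPHA
        honest_hits += honest
        flexible_hits += flexible
    return honest_hits / n_sims, flexible_hits / n_sims


if __name__ == "__main__":
    honest_rate, flexible_rate = simulate()
    print(f"pre-specified analysis: {honest_rate:.3f}")
    print(f"try-until-it-works:     {flexible_rate:.3f}")
```

The pre-specified analysis should land near the nominal 5%, while the flexible researcher finds a false positive far more often. The exact size of the inflation depends on how many analyses are tried and how correlated they are, which is why the figures here will not match those in the cited study.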

Medical researchers have devised ways of preventing this. Researchers are often required to draft a clinical trial protocol, explaining how the data will be collected and analyzed. Since the protocol is drafted before the researchers see any data, they can’t possibly craft their analysis to be most favorable to them. Unfortunately, many studies depart from their protocols and perform different analyses, allowing for researcher bias to creep in. [15, 14] Many other scientific fields have no protocol publication requirement at all.

The proliferation of statistical techniques has given us many useful tools, but it seems they have been put to use as blunt objects. One must simply beat the data until it confesses.

Source: https://www.statisticsdonewrong.com/freedom.html