Red herrings in brain imaging
Neuroscientists do massive numbers of comparisons regularly. They often perform fMRI studies, where a three-dimensional image of the brain is taken before and after the subject performs some task. The images show blood flow in the brain, revealing which parts of the brain are most active when a person performs different tasks.
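Why do massive numbers of comparisons matter? The sketch below assumes every voxel is tested independently at the same p<0.05 threshold and that no voxel has a real effect. It shows how many "active" voxels appear by chance, how likely at least one false positive is, and the per-test threshold a Bonferroni correction would demand. The voxel count is hypothetical and is not taken from any real scan.

```python
# Hypothetical setup: independent tests, all null hypotheses true.

def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of tests that come out 'significant' by chance alone."""
    return num_tests * alpha


def family_wise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent null tests is significant."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    voxels = 10_000  # hypothetical voxel count
    print(expected_false_positives(voxels))  # about 500 spurious "active" voxels
    print(family_wise_error_rate(voxels))    # effectively certain to see at least one
    print(bonferroni_threshold(voxels))      # a far stricter per-voxel threshold
```

Bonferroni is only one possible correction and is conservative when tests are correlated, as neighbouring voxels usually are.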
Continue reading 【Translation】Red herrings in brain imaging
Taking up arms against the base rate fallacy
You don’t have to be performing advanced cancer research or early cancer screenings to run into the base rate fallacy. What if you’re doing social research? You’d like to survey Americans to find out how often they use guns in self-defense. Gun control arguments, after all, center on the right to self-defense, so it’s important to determine whether guns are commonly used for defense and whether that use outweighs the downsides, such as homicides.
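A minimal sketch of how the base rate fallacy can distort such a survey, using made-up rates: suppose only 0.1% of respondents truly used a gun in self-defense, while 1% of everyone else answers "yes" anyway, for whatever reason. Both rates and the function names are hypothetical, chosen only to illustrate the arithmetic.

```python
# All rates here are hypothetical; none come from an actual survey.

def observed_yes_rate(true_rate: float, false_positive_rate: float,
                      false_negative_rate: float = 0.0) -> float:
    """Fraction of all respondents who answer 'yes'."""
    true_yes = true_rate * (1 - false_negative_rate)
    false_yes = (1 - true_rate) * false_positive_rate
    return true_yes + false_yes


def false_share_of_yes(true_rate: float, false_positive_rate: float,
                       false_negative_rate: float = 0.0) -> float:
    """Fraction of 'yes' answers given by people who did NOT actually do the thing."""
    false_yes = (1 - true_rate) * false_positive_rate
    return false_yes / observed_yes_rate(true_rate, false_positive_rate,
                                         false_negative_rate)


if __name__ == "__main__":
    # A tiny error rate applied to the large majority swamps the rare true cases.
    print(observed_yes_rate(0.001, 0.01))   # roughly 11 times the true rate
    print(false_share_of_yes(0.001, 0.01))  # over 90% of "yes" answers are false
```

Because the rare true cases are outnumbered by errors from the much larger group, the survey can overestimate the true rate many times over even when almost everyone answers accurately.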
Continue reading 【Translation】Taking up arms against the base rate fallacy
The p value and the base rate fallacy
You’ve already seen that p values are hard to interpret. Getting a statistically insignificant result doesn’t mean there’s no difference. What about getting a significant result?
Let’s try an example. Suppose I am testing a hundred potential cancer medications. Only ten of these drugs actually work, but I don’t know which; I must perform experiments to find them. In these experiments, I’ll look for p<0.05 gains over a placebo, demonstrating that the drug has a significant benefit.
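The arithmetic behind this example can be sketched as expected counts. The statistical power of 0.8 is an assumption here (the excerpt does not state it); the other numbers come from the example: 100 candidate drugs, 10 that work, and a p<0.05 threshold.

```python
def false_discovery_fraction(num_candidates: int, num_effective: int,
                             power: float, alpha: float = 0.05) -> float:
    """Expected fraction of 'significant' results that are false positives."""
    true_positives = num_effective * power                   # working drugs detected
    false_positives = (num_candidates - num_effective) * alpha  # duds passing p<0.05
    return false_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # 8 expected true positives versus 4.5 expected false positives
    print(false_discovery_fraction(100, 10, power=0.8))  # about 0.36
```

So even with every test run correctly, roughly a third of the "significant" drugs would be useless, because ineffective candidates vastly outnumber effective ones.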
Continue reading 【Translation】The p value and the base rate fallacy
Pseudoreplication: choose your data wisely
Many studies strive to collect more data through replication: by repeating their measurements with additional patients or samples, they can be more certain of their numbers and discover subtle relationships that aren’t obvious at first glance. We’ve seen the value of additional data for improving statistical power and detecting small differences. But what exactly counts as a replication?
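One way pseudoreplication misleads can be sketched with made-up data: five repeated measurements from each of three patients. Treating all 15 numbers as independent replicates yields a much smaller standard error than summarizing each patient first, because repeated measurements of the same patient are not independent. The data and function names are hypothetical, and per-patient averaging is just one simple remedy (mixed-effects models are another).

```python
import math
import statistics

# Hypothetical data: 5 repeated measurements on each of 3 patients.
measurements = {
    "patient_A": [10.0, 10.2, 9.8, 10.1, 9.9],
    "patient_B": [12.0, 12.1, 11.9, 12.2, 11.8],
    "patient_C": [14.0, 13.9, 14.1, 14.2, 13.8],
}


def naive_standard_error(groups: dict[str, list[float]]) -> float:
    """Pseudoreplicated: pools every measurement as if each were a new patient."""
    pooled = [x for values in groups.values() for x in values]
    return statistics.stdev(pooled) / math.sqrt(len(pooled))


def per_subject_standard_error(groups: dict[str, list[float]]) -> float:
    """Uses one summary value (the mean) per patient, so n is the number of patients."""
    means = [statistics.mean(values) for values in groups.values()]
    return statistics.stdev(means) / math.sqrt(len(means))


if __name__ == "__main__":
    print(naive_standard_error(measurements))        # misleadingly small
    print(per_subject_standard_error(measurements))  # reflects only 3 real replicates
```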
Continue reading 【Translation】Pseudoreplication: choose your data wisely
The power of being underpowered
After hearing all this, you might think calculations of statistical power are essential to medical trials. A scientist might want to know how many patients are needed to test whether a new medication improves survival by more than 10%, and a quick calculation of statistical power would provide the answer. Scientists are usually satisfied when the statistical power is 0.8 or higher, corresponding to an 80% chance of detecting the effect if it really exists.
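Such a "quick calculation" might look like the sketch below, which uses the common normal-approximation formula for comparing two proportions. The baseline survival of 50%, the reading of "10%" as an absolute rise to 60%, and the two-sided alpha of 0.05 are all assumptions for illustration; power 0.8 is the threshold mentioned above.

```python
import math
from statistics import NormalDist


def patients_per_group(p_control: float, p_treatment: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate patients needed per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2
    return math.ceil(n)  # round up: patients come in whole numbers


if __name__ == "__main__":
    # Hypothetical: survival rises from 50% to 60%
    print(patients_per_group(0.5, 0.6))
```

Asking for higher power or a smaller detectable difference drives the required sample size up quickly, which is one reason trials are often underpowered in practice.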
Continue reading 【Translation】The power of being underpowered
Statistical power and underpowered statistics
We’ve seen that it’s possible to miss a real effect simply by not taking enough data. In most cases, this is a problem: we might miss a viable medicine or fail to notice an important side-effect. How do we know how much data to collect?
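A common answer is to compute statistical power before collecting data. The sketch below approximates power for a two-sample z-test with a standardized effect size (difference in means divided by the standard deviation). The effect size of 0.5, the known-variance assumption, and the function name are illustrative choices, not a prescribed method.

```python
import math
from statistics import NormalDist


def approximate_power(effect_size: float, n_per_group: int,
                      alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardized difference is effect_size
    noncentrality = effect_size * math.sqrt(n_per_group / 2)
    # Ignores the tiny chance of rejecting in the wrong direction.
    return NormalDist().cdf(noncentrality - z_alpha)


if __name__ == "__main__":
    for n in (10, 20, 40, 64, 100):
        print(n, round(approximate_power(0.5, n), 2))
```

Under these assumptions, small groups miss a real medium-sized effect more often than they detect it, and power only reaches about 0.8 at roughly 64 subjects per group.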
Continue reading 【Translation】Statistical power and underpowered statistics
An introduction to data analysis
Much of experimental science comes down to measuring changes. Does one medicine work better than another? Do cells with one version of a gene synthesize more of an enzyme than cells with another version? Does one kind of signal processing algorithm detect pulsars better than another? Is one catalyst more effective at speeding a chemical reaction than another?
Continue reading 【Translation】Statistics Done Wrong_An introduction to data analysis
In the final chapter of his famous book How to Lie with Statistics, Darrell Huff tells us that “anything smacking of the medical profession” or published by scientific laboratories and universities is worthy of our trust – not unconditional trust, but certainly more trust than we’d afford the media or shifty politicians. After all, Huff filled an entire book with the misleading statistical trickery used in politics and the media, but few people complain about statistics done by trained professional scientists. Scientists seek understanding, not ammunition to use against political opponents.
Continue reading 【Translation】Statistics Done Wrong_Introduction
I’ve been practicing with conda, and this time I’m doing it on a Raspberry Pi. Following a simple guide, I installed Miniconda on the Raspbian system.
Once SSH is enabled on the Raspberry Pi, it is easy to access it remotely, so check its IP address first: on the Raspberry Pi desktop, open a terminal and run “sudo ifconfig”.
The most common environment for data analysis, if not the standard one, is Anaconda with Jupyter Notebook. In short, Anaconda is a package and environment manager, and Jupyter Notebook is a web-based tool that combines code, plots, and Markdown in one document. This post introduces the basic concepts and commands of Anaconda.
Continue reading Anaconda Quick Guide