We see a lot of studies in the media and in our organizations in normal circumstances. We are seeing even more now with the pandemic’s effect reverberating across the globe. Policy makers, executives, leaders—people—are relying on some of these studies to make decisions about their personal and professional activities as we navigate our post-shutdown world.
Unfortunately, many of these studies will prove to be wrong in some way or another. In most cases this is not because of nefarious researchers, but because good science is hard. Good statistics and statistical modeling are also hard.
While we often hear about “making data-driven decisions,” we need statistics and models to interpret the data. These models have assumptions and judgement calls that change how we interpret a model and the model’s usefulness. Even experts can make mistakes in their modeling, and hastily done research can worsen these mistakes.
Here are three things you can do when you come across a study to help you evaluate its usefulness and decide how much confidence to have in its results. Whether the field is medicine, public health, economics, business, or something else, these three things apply.
1) Focus on the effect size, not “statistical significance”
The term “statistical significance” and its associated p-value—p < .05—are too often misused and misunderstood. This happens so often that in 2016 the American Statistical Association issued the group’s first-ever statement on statistical practice cautioning about the use of “statistical significance” to evaluate a model’s results.
Mathematically, a p-value depends heavily on the amount of data used by the model, and so does “statistical significance.” Trivial effect sizes can be “statistically significant” with a large enough sample. What matters is not whether the model crosses some arbitrary threshold for “statistical significance” but what the effect is telling us: What is the change in the outcome as I change the variable predicting the outcome? This is the answer we need to make better informed decisions.
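To make the sample-size point concrete, here is a minimal sketch in Python. It assumes a simple two-sample z-test with equal group sizes and unit variance in both groups; the function name and the effect size of 0.02 standard deviations are made up for illustration and do not come from any particular study.

```python
import math


def two_sided_p_value(effect_size, n_per_group):
    """Two-sided p-value for a standardized mean difference.

    Assumes a two-sample z-test, equal group sizes, and unit variance
    in both groups (illustrative simplifications).
    """
    # The test statistic grows with the square root of the sample size.
    z = effect_size * math.sqrt(n_per_group / 2)
    # For a standard normal, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


TRIVIAL_EFFECT = 0.02  # hypothetical: 2% of a standard deviation

for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(TRIVIAL_EFFECT, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>9,}  p = {p:.4f}  ({verdict} at .05)")
```

The effect size is identical in every row; only the sample size changes. Under these assumptions the same trivial effect goes from nowhere near the .05 threshold to far below it.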
2) Be wary of large effects and big claims
As a rule of thumb, big claims require big evidence. In most cases, the effects that we find in a model are small and noisy. They are noisy in the sense that measurement error and modeling choices will change results. If a study is making a big claim, then there should be a lot of data behind it, other researchers should have replicated the study with a different sample, and all of the researchers should have shared their data and code.
A similar rule applies to studies claiming “no effect.” In modeling, rarely is an effect size actually zero. Most often, when you see a study claiming “no effect” the effect size is very small, and the data is consistent with a small positive effect, zero effect, or a small negative effect. The problem with “no effect” is that we can misinterpret the result as a definitive “X does not change Y,” when this may not be the case in a different sample, with different measures, and different modeling approaches.
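One way to see why “no effect” can mislead is to look at the range of values the data is consistent with. The sketch below uses a normal-approximation 95% interval; the estimate and standard error are hypothetical numbers chosen only for illustration, and the function name is made up.

```python
def confidence_interval_95(estimate, standard_error):
    """Approximate 95% interval using the normal critical value 1.96."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin


# Hypothetical study result, not taken from any real paper.
estimate = 0.03
standard_error = 0.05

low, high = confidence_interval_95(estimate, standard_error)
includes_zero = low <= 0 <= high

print(f"estimate = {estimate:+.2f}")
print(f"95% interval = [{low:+.3f}, {high:+.3f}]")
print(f"interval includes zero: {includes_zero}")
```

Here the interval runs from a small negative value to a small positive one. A headline of “no effect” would hide the fact that the data cannot rule out either direction.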
3) Context matters
Have you ever read a study and thought, “When I did the same thing, I didn’t get that result”? Or seen a study suggesting some strategy or business process and thought, “That won’t work in my industry”? You might be right!
Most models produce average estimates: over the data used in the sample, the model estimated an average expected effect of X on Y. That can be useful information, but it is important to remember that people, businesses, industries, markets, cities, and countries are different. An effect that a model finds in one sample may be completely different in another sample from another time and place. We have more confidence in averages when we see a study replicated in a wide variety of settings and samples.
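A small sketch of how an average can hide differences between settings. The industries and effect values below are invented purely for illustration; they are not estimates from any study.

```python
from statistics import mean

# Hypothetical effect of X on Y within each setting (made-up numbers).
effect_by_industry = {
    "retail": 0.30,
    "manufacturing": 0.10,
    "software": -0.25,
    "healthcare": 0.05,
}

average_effect = mean(effect_by_industry.values())
print(f"average effect across industries: {average_effect:+.2f}")

for industry, effect in effect_by_industry.items():
    direction = "same sign as" if effect * average_effect > 0 else "opposite sign to"
    print(f"{industry:<14} {effect:+.2f}  ({direction} the average)")
```

In this made-up case the average is slightly positive, yet one setting shows a sizable negative effect. Someone in that setting who thought “that won’t work in my industry” would be correct.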
Let’s do an example.
A few years back a group of scientists published a paper purporting to show the health benefits of cinnamon. Specifically, the authors concluded that “…the inclusion of cinnamon in the diet of people with type 2 diabetes will reduce risk factors associated with diabetes and cardiovascular diseases.”
We can use our three things to quickly assess the claims made in the study. For example, the study reports a statistically significant change in the average of one group of study participants’ cholesterol level (mmol/l) from 4.91 to 4.09, a 16.7% decrease. While “statistically significant,” a 0.82 mmol/l decrease has limited clinical significance; that is, it may not make any practical difference in the patient’s cholesterol level.
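The arithmetic behind those figures can be checked in a few lines. This sketch uses only the two cholesterol averages reported above; the variable names are ours.

```python
# Average cholesterol (mmol/l) reported for one group of participants.
before = 4.91
after = 4.09

absolute_decrease = before - after
relative_decrease = 100 * absolute_decrease / before

print(f"absolute decrease: {absolute_decrease:.2f} mmol/l")
print(f"relative decrease: {relative_decrease:.1f}%")
```

Reporting both numbers matters: the percentage sounds large, while the absolute change in mmol/l is the quantity to weigh when judging practical importance.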
Recall that the study concludes that cinnamon “will [emphasis added] reduce risk factors associated with diabetes and cardiovascular diseases.” In the example above, the study based its analysis on 10 patients who took 1 gram (a little over one-third of a teaspoon) of cinnamon per day for 40 days. While it is possible for a trivial amount of a common spice in use around the world to result in substantive health benefits, it is not very probable.
Lastly, the researchers recruited a total of 60 Pakistani type 2 diabetes patients for the study, 30 of whom took various cinnamon doses and 30 of whom took a placebo. To place a lot of confidence in the study’s conclusion, we would need to assume that we can generalize from these 30 individuals to hundreds of millions of people worldwide suffering from type 2 diabetes. Is it possible? Of course. Is it probable? That is far less likely.
The value of the three factors is not to “prove” that a study is wrong. It may be that cinnamon does yield substantial health benefits, although we would need many more studies with many more people from many different countries to have faith in that result.
The value of the three factors is to help you to quickly evaluate how much confidence you should place in what the study is saying. A healthy skepticism is a useful lens!