These days, anyone can prop up an argument by pulling up a statistic. Much has been written about how statistics can be faulty — perhaps due to publication bias, or lack of generalizability, or researcher data-dredging.
But even genuine, trustworthy statistics, when plucked out of context, can mislead. Here are three common ways.
1. Using measures of the wrong thing
Say we want to know how rates of marijuana usage have changed over time in the U.S. The evidence: a graph of the yearly marijuana possession arrest rates.
The problem with this is quite clear: most marijuana users don’t get arrested! Maybe the change in marijuana arrest rates echoes the change in marijuana usage rates, but maybe (say, due to changes in policing), it doesn’t.
The better approach is to use a more direct measure. In this case, we could consult surveys of nationally representative samples of the population and see how rates of self-reported marijuana usage have changed over the years. A common layperson objection is that the arrest-rates graph is a more "reliable source." But a source that is less relevant to the question gives a less reliable answer. As long as their methods make them trustworthy and generalizable enough, the surveys are the appropriate evidence here.
It’s not always so easy to check that a study measures the thing we want to see measured. Sometimes our definition of the outcome is not the source’s definition, especially if it’s a fuzzy term.
This is often seen in discussion of the “success” of an approach — as George Carlin once said, “if at first you don’t succeed, redefine success.” If proponents of a teaching approach can’t find measurable evidence that it improves students’ learning, they may use surveys of subjective student engagement and motivation. “Those results are what really matter,” they’ll basically say, rather than admit they failed to find what they expected.
Doubly fuzzy is the question of how “the U.S. economy is affected by whether Republicans or Democrats are in office.” Depending on how you measure which party is “in office” and how you measure the well-being of the “economy,” as you can test on this FiveThirtyEight widget, you could slice the data dozens of ways to support whatever conclusion you want!
2. Highlighting a difference that doesn’t make a difference
It’s true that the Trump administration saw record low unemployment rates. It’s not necessarily true that this is evidence of the Trump administration lowering unemployment rates.
As the graph above shows, the record lows weren't the result of any new downward trend that began once President Trump took office. The ongoing downward trend didn't even appear to change. In other words, the record lows are exactly what one would expect to see if the Trump administration had no effect on unemployment rates. A substantial effect would produce a decrease at a steeper rate than before, more like this:
Similarly, be wary of pricey SAT prep programs that showcase an increase from, say, 1460 to 1500 as one of their successes. The standard error for the SAT is 30 points, meaning that for a given test session, a test-taker can be 95% sure that the score for that session is within 60 points of their "true score." So an increase from 1460 to 1500 is well within the range of random variation even if the test-taker didn't improve at all. This student could've easily scored 1500 one time and 1460 the next; the order of these randomly varying scores just happened to fall in the prep program's favor.
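The arithmetic behind that claim is simple enough to sketch. Assuming the 30-point standard error above and using roughly two standard errors for a 95% interval:

```python
SEM = 30   # SAT standard error of measurement (~30 points, per the text)
Z95 = 2    # roughly two standard errors covers ~95%

def ci(score):
    """95% confidence interval for the true score behind one observed score."""
    return (score - Z95 * SEM, score + Z95 * SEM)

before, after = ci(1460), ci(1500)
print(before, after)  # (1400, 1520) (1440, 1560)
# The two intervals overlap heavily, so the 40-point "gain" is
# indistinguishable from ordinary session-to-session variation.
```

Since the intervals share a 80-point stretch (1440 to 1520), both observed scores are consistent with the same unchanged true score.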
In short, it’s not enough to simply point out a difference. Things don’t usually stay perfectly the same. You need to show how this difference is beyond what would normally be expected.
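For the unemployment example, one concrete check is to fit separate trend lines before and after the transition and compare their slopes. A minimal sketch with synthetic data (the numbers below are invented for illustration, not real unemployment figures):

```python
import numpy as np

# Synthetic monthly unemployment rates: a steady downward trend with noise
# and NO change at the "transition" month (illustrative data only).
rng = np.random.default_rng(0)
months = np.arange(96)                                  # 8 years, monthly
rates = 8.0 - 0.04 * months + rng.normal(0, 0.1, 96)    # pts per month
transition = 48                                         # new administration

# Fit a linear trend to each segment; np.polyfit returns [slope, intercept].
slope_before = np.polyfit(months[:transition], rates[:transition], 1)[0]
slope_after = np.polyfit(months[transition:], rates[transition:], 1)[0]

print(f"slope before: {slope_before:.3f} pts/month")
print(f"slope after:  {slope_after:.3f} pts/month")
# Nearly identical slopes: the post-transition record lows are just the
# pre-existing trend continuing, not evidence of a new effect.
```

If the second slope were meaningfully steeper than the first (beyond what noise alone produces), that would be the kind of difference worth highlighting.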
3. Making the wrong comparisons
Is there a particular association between being a Black American and being Jehovah’s Witness? Of Jehovah’s Witnesses in the U.S., 36% are non-Hispanic White, and 27% are Black.
If you say “No, because there are more Whites,” you’re making the wrong comparisons. There are more, true, but that is true of the U.S. as a whole, where 60.1% of the population is non-Hispanic White, and 13.4%, Black.
It’s not very informative to say that an American who’s Jehovah’s Witness is more likely to be White than Black — what we want to know is whether a Black American is more likely to be Jehovah’s Witness than a White American, which turns out to be true.
In the relationship between two variables, usually one variable is the “explanatory variable,” and one is the “response variable.” In this case, race is explanatory, as something one is born into, and religion is the response, as something one chooses to identify with. (It’s not as if being born Jehovah’s Witness makes one more likely to become Black someday.) To check for a relationship, don’t compare the rates of different explanatory groups within the same response group, because these groups aren’t necessarily the same size in the population. Instead, compare the rates of the same response variable in the different explanatory groups.
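Using the figures above, Bayes' rule makes the right comparison explicit. A minimal sketch (the overall share of Jehovah's Witnesses, P(JW), cancels when comparing the two groups, so we never need it):

```python
# Figures from the text: racial composition of Jehovah's Witnesses in the
# U.S., versus the U.S. population overall.
share_of_jw = {"White": 0.36, "Black": 0.27}    # P(race | JW)
share_of_us = {"White": 0.601, "Black": 0.134}  # P(race)

# Bayes' rule: P(JW | race) = P(race | JW) * P(JW) / P(race).
# P(JW) is the same constant for both groups, so it cancels in the ratio.
relative_rate = {r: share_of_jw[r] / share_of_us[r] for r in share_of_jw}
ratio = relative_rate["Black"] / relative_rate["White"]
print(f"A Black American is ~{ratio:.1f}x as likely as a White American "
      "to be Jehovah's Witness.")
```

Comparing within the response group (36% vs. 27%) hides this; comparing the response rate across the explanatory groups reveals it.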
To illustrate: Is shortness of breath associated with COVID-19? Only 18.6% of confirmed positive cases report it, meaning 81.4%, a large majority, don’t.
But 18.6% versus 81.4% isn't the right comparison. COVID-19 status is the explanatory variable and development of this symptom is the response, so 18.6% should be compared to whatever percentage of COVID-19-negative people experience shortness of breath, a figure that is surely much lower. So yes, shortness of breath is a good indicator that one could be COVID-19-positive.
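To make the structure of the comparison concrete: the 18.6% comes from the text, but the symptom rate among COVID-19-negative people below is an invented placeholder, just to show which two numbers belong side by side:

```python
# From the text: symptom rate among confirmed-positive cases.
p_symptom_given_positive = 0.186
# ASSUMED placeholder for illustration; the real baseline would come from
# surveying COVID-19-negative people.
p_symptom_given_negative = 0.03

# The informative comparison runs across the explanatory groups
# (positive vs. negative), not within the positive group (18.6% vs. 81.4%).
likelihood_ratio = p_symptom_given_positive / p_symptom_given_negative
print(f"Shortness of breath is {likelihood_ratio:.1f}x as common "
      "among positives as among negatives.")
```

Under this assumed baseline, the symptom is several times more common among positives, which is what makes it informative despite afflicting only a minority of them.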
It disappoints me to hear peers say, "All that statistics stuff is not my strong suit," especially when they're studying fields that rely on it so heavily. It's not all about generic math formulas! "Statistical" thinking helps you use the right data from the real world to answer the right questions. Who can't benefit from that?