Bias vs. Lurking Variables — What’s the Difference?

Lucia Bevilacqua
Mar 16, 2021

“Bias” and “lurking variables” are two of the most important factors in judging how well a study is designed. And from my experience as an introductory statistics TA, they’re the two concepts that most often get mixed up!

With Bias, You Can’t Claim a Result Applies to the Broader Population

Sometimes statistics “miss the mark” — they’re off from the true figure. (Image by Lucian Alexandru Motoc, via Dreamstime)

When you want a figure for a population (say, an average, a percentage, or the strength of a correlation) but can’t gather information from every member, you can select a representative sample and compute the statistic from that. There’s bound to be some variation, of course: the sample statistic won’t match the population figure exactly, and it will differ from sample to sample. But as long as the sample is truly randomly selected and the sample size is large enough, it shouldn’t be too far off.

When a sample is not representative of the population you’re trying to study, it can be biased toward certain results. If so, the statistic will be “off” in a particular direction, higher or lower than the true population figure by more than random chance alone could explain. It likely doesn’t reflect what you’d find by measuring the whole population, so it shouldn’t be generalized to that population.
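To see the difference between sampling variation and bias, here’s a minimal simulation sketch. All the numbers are hypothetical: a made-up population of incomes where one subgroup earns systematically more, a truly random sample, and a biased sample drawn only from that subgroup (like polling one wealthy neighborhood instead of the whole city).

```python
import random

random.seed(0)

# Hypothetical population of 100,000 incomes; 20% belong to a
# higher-earning subgroup. All figures are invented for illustration.
population = (
    [random.gauss(50_000, 10_000) for _ in range(80_000)]
    + [random.gauss(90_000, 10_000) for _ in range(20_000)]
)

def mean(xs):
    return sum(xs) / len(xs)

true_mean = mean(population)

# Truly random sample: every member has an equal chance of selection.
random_sample = random.sample(population, 1_000)

# Biased sample: drawn only from the high-earning subgroup.
biased_sample = random.sample(population[80_000:], 1_000)

print(f"population mean:    {true_mean:,.0f}")
print(f"random-sample mean: {mean(random_sample):,.0f}")  # close to the truth
print(f"biased-sample mean: {mean(biased_sample):,.0f}")  # off in one direction
```

The random sample lands near the population mean (off only by chance variation), while the biased sample overshoots it by tens of thousands, and no amount of extra sampling from the same subgroup would fix that.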

With Lurking Variables, You Can’t Claim Cause-and-Effect from a Correlation

Maybe X doesn’t cause Y; it only seems associated with Y because Z causes both X and Y. (Image source: طاها and JC713 via Wikipedia)

When X and Y are correlated, knowing X can help you predict Y. For example, knowing someone’s income level and religiosity can help you predict their likely level of happiness. Much of the variation in the population’s happiness levels can be predicted by differences in income and religion.

You probably know many more correlations. Knowing someone’s SAT score can help you predict what kind of college is willing to accept them. Knowing people’s drinking, smoking, diet, or exercise habits can help you predict their relative risks of cancer. Knowing whether someone is male or female can help you predict a range of typical hormone levels, healthy iron levels, skeletal proportions, even heart attack symptoms.

In these cases, it’s clear which causes which. But correlation doesn’t always imply cause-and-effect. There might be lurking variables that turn out to be the real reason for the relationship.

Say we find a strong correlation between more hours spent practicing a musical instrument and higher standardized test scores. The correlation is real; a high-practice student musician can be expected, on average, to score higher than a non-musician. Does that mean schools should require music courses so they can raise all students’ test scores? No!

This effect might arise because the kind of student who practices an instrument more is also the kind of student who scores higher on tests, not necessarily because the practice itself helps. These students may have greater motivation to succeed, higher IQs, higher socioeconomic statuses, or families that are stricter about working hard. If those are the real reasons for high test scores, and they just happen to be associated with students who practice instruments a lot, then forcing instrument practice on a different group of students probably won’t raise their scores.
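We can sketch this scenario with a small simulation. Here a single hypothetical lurking variable, “motivation,” drives both practice hours and test scores; practice never enters the score formula at all, yet a strong correlation between the two still appears. The numbers are all made up for illustration.

```python
import random

random.seed(1)

# Lurking variable "motivation" drives BOTH practice hours and scores.
# Note that practice_hours has no effect on test_score here.
students = []
for _ in range(5_000):
    motivation = random.gauss(0, 1)
    practice_hours = max(0.0, 5 + 3 * motivation + random.gauss(0, 2))
    test_score = 1000 + 150 * motivation + random.gauss(0, 100)
    students.append((practice_hours, test_score))

def corr(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

print(f"correlation(practice, score) = {corr(students):.2f}")
# A strong positive correlation shows up even though practice
# has zero causal effect on scores: motivation causes both.
```

This is exactly why the correlation alone can’t justify mandating instrument practice: the data would look the same whether practice helps or not.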

However, say we got a large sample of students and randomly assigned some to practice instruments for 10 hours a week, some to practice instruments for 5 hours a week, and some not to practice instruments at all. We gave them similar standardized tests before and after this three-month study, and we found that the students in the 10-hours-a-week group showed a moderate score increase on average, the 5-hours-a-week group showed a small but significant average increase, and the control group showed no significant increase. Now can we claim that more instrument practice leads to higher test scores? Yes!

When subjects are randomly assigned, it’s very unlikely that one group will greatly differ from another in the factors that matter for test score improvement. The groups should look fairly similar, with individual differences evenly distributed. So why would one group improve its test scores much more than the other groups? The key difference between them is the level of instrument practice — so if they had significantly different levels of improvement, beyond what you could expect due to random chance, it’s safe to say that was indeed the cause.
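The balancing effect of random assignment can also be sketched in a few lines. Suppose (hypothetically) each student carries a hidden “motivation” level that affects test improvement; shuffling students into groups at random spreads that lurking variable evenly across them.

```python
import random

random.seed(2)

# Hidden "motivation" levels for 3,000 hypothetical students.
motivations = [random.gauss(0, 1) for _ in range(3_000)]

# Random assignment: shuffle, then split into three equal groups.
random.shuffle(motivations)
groups = {
    "10 hrs/week": motivations[:1_000],
    "5 hrs/week": motivations[1_000:2_000],
    "control": motivations[2_000:],
}

for name, g in groups.items():
    print(f"{name:>12}: mean motivation = {sum(g) / len(g):+.3f}")
# All three group means hover near zero: the lurking variable is
# balanced, so any difference in score gains must come from the
# assigned level of practice.
```

Without randomization (say, letting students choose their own group), the most motivated students could cluster in one group and the lurking variable would creep right back in.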

How This Looks in Practice

Quick quiz! Each study is an example of bias, lurking variables, both, or neither. Which is which?

  • You poll members of your Southern Baptist congregation about a proposed state law expanding abortion restrictions. Over 95% support it — what an overwhelmingly popular idea! (A larger poll of the general state population gets a statistic of 53%.)
  • You survey a large sample of elderly men and find that owning a dog is associated with significantly higher happiness levels. A journalist for Cosmopolitan writes an article based on this finding, called “Want to Be Happier? Get a Dog.”
  • You find that consistently, throughout the whole U.S., high schoolers who don’t work paid jobs spend more money each month than high schoolers who do have jobs. It seems counterintuitive — how could you have less to spend if you’re getting paychecks?
  • You recruit a computer-selected sample of kindergarten-aged children from many different school rosters, randomly split them into groups, and give each group a different educational toy to play with. After thirty minutes, they are asked to rate this play session on a scale of options from “Super Boring!” to “Super Fun!”. Two toys tend to be judged as more fun, but one toy is significantly more likely to have “Super Boring!” ratings. You conclude that kindergarten-aged children are less likely to enjoy this toy than the other toys.

The answers:

  • Bias. The statistic you got is way off from the likely percent of the state that approves because the sampling isn’t random. Southern Baptist churchgoers are very likely to support abortion restrictions; the broad variety of sociopolitical beliefs you’d see throughout the state population isn’t fully represented here.
  • Both. It’s possible that dog ownership increases happiness, but it also could be because elderly men who own dogs tend to have a more comfortable living situation and amount of money to live on, or aren’t too ill or stressed to take care of a dog. Even if getting a dog does increase happiness in this specific demographic, there’s no guarantee it’ll do as much for the young women reading Cosmopolitan.
  • Lurking variables. Many high schoolers who work need to do so to support their households. The high schoolers who have plenty of money to spend on new shoes and new electronics and outings with friends, on the other hand, get to spend their teenage years not working because they come from well-off families. Here, difference in family income largely explains the difference in spending.
  • Neither. The toys were randomly assigned, so it’s justified to say the difference in toys caused the difference in ratings, and the sample was large and randomly selected, so it’s justified to generalize the result to kindergarten-aged children in general.

Bias can be eliminated with random sampling, and lurking variables can be eliminated with random assignment. These might not always be fully feasible, but with careful study design, researchers can reduce their chance of serious errors. The key is to know what to watch for!
