The chapter opens with findings from the longitudinal Whitehall studies of British civil servants in the 1960s and ’70s, which revealed that workers with minimal decision-making authority face significantly elevated mortality risk from coronary heart disease compared to their superiors. Wheelan asks how researchers arrived at their conclusion, given that simple comparisons between job ranks would be confounded by factors like education, smoking habits, and childhood health.
The answer is regression analysis, a statistical tool that quantifies the relationship between variables while controlling for other factors. The methodology resembles polling: It estimates relationships from sample data and tests hypotheses, but it cannot prove causation. Poor analysis can lead to false conclusions or mistaking reverse causality.
The core principle involves finding the best-fit line for a linear relationship between variables using ordinary least squares, which minimizes the sum of squared residuals. In other words, once all the observations have been plotted as points on a graph, regression analysis draws the line through them that minimizes the total squared vertical distance between the line and the points as a whole. The resulting regression equation takes the form y = a + bx, which Wheelan hopes readers remember from algebra as the equation for a line.
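Expressed as a worked formula (the notation here is mine, not Wheelan's), ordinary least squares chooses the intercept a and slope b that minimize the sum of squared residuals; with a single explanatory variable the solution has a simple closed form:

$$\min_{a,b}\ \sum_{i=1}^{n}\bigl(y_i-(a+bx_i)\bigr)^2,\qquad b=\frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2},\qquad a=\bar{y}-b\,\bar{x}.$$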
Using data from the Changing Lives study of 3,537 American adults, Wheelan plots height and weight on a graph to see how the two correlate. A simple regression produces the equation WEIGHT = −135 + 4.5(HEIGHT). The coefficient 4.5 indicates that each additional inch of height is associated with 4.5 additional pounds of weight.
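A minimal sketch of how such a fit can be computed, using invented height and weight numbers rather than the actual Changing Lives data:

```python
import numpy as np

# Invented stand-in data: heights in inches, weights in pounds.
heights = np.array([62, 64, 66, 68, 70, 72, 74], dtype=float)
weights = np.array([146, 151, 164, 169, 182, 187, 200], dtype=float)

# Ordinary least squares fit of WEIGHT = a + b * HEIGHT.
b, a = np.polyfit(heights, weights, deg=1)  # polyfit returns the slope first, then the intercept
print(f"WEIGHT ~ {a:.1f} + {b:.1f} * HEIGHT")

# The slope b is the extra weight, in pounds, associated with one extra inch of height.
```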
Multiple regression expands the model by adding other variables. In the case of the Changing Lives study, these are age, sex (as a dummy variable of value 1 for female and 0 for male), education, exercise levels, poverty status, and race. Each coefficient isolates one variable’s effect while holding the others constant. This way, researchers can see how each factor relates to participants’ weight. Education and exercise correlate negatively with weight; poverty and being non-Hispanic Black correlate positively. In another example, a study of MBA graduates illustrates regression’s power by attributing much of the observed gender wage gap to factors such as differences in pre-MBA training, career interruptions, and hours worked, rather than discrimination.
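The way multiple regression holds other variables constant can be sketched with simulated data; the variable list below is deliberately shortened, and the coefficients are invented for illustration rather than taken from the study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated predictors: height in inches, age in years, and a sex dummy (1 = female, 0 = male).
height = rng.normal(67, 3, n)
age = rng.uniform(25, 75, n)
female = rng.integers(0, 2, n).astype(float)

# Simulated outcome built from made-up "true" coefficients plus random noise.
weight = -135 + 4.5 * height + 0.2 * age - 4.0 * female + rng.normal(0, 10, n)

# Design matrix with an intercept column; ordinary least squares via lstsq.
X = np.column_stack([np.ones(n), height, age, female])
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)

# Each estimated coefficient is the association with weight holding the other columns constant.
for name, c in zip(["intercept", "height", "age", "female"], coef):
    print(f"{name:>9}: {c:+.2f}")
```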
The chapter concludes by explaining the statistical intuition: Regression effectively sorts observations into groups identical in all respects except for one variable, then calculates the coefficient that best explains remaining variation across all groups. The Whitehall studies exemplify this approach by confirming that low job control increases heart disease risk even after accounting for traditional health factors.
The chapter begins with a warning illustrated by hormone replacement therapy (HRT). Based on observational studies using regression analysis, including the Nurses’ Health Study at Harvard, millions of postmenopausal women were prescribed estrogen supplements in the 1960s to prevent heart disease. These studies found that women taking estrogen had one-third as many heart attacks as those not taking it. However, subsequent randomized clinical trials revealed that estrogen actually increased risks of heart disease, stroke, blood clots, and breast cancer. The New York Times Magazine estimated tens of thousands may have died prematurely due to this misinterpretation of observational data. Yet in an illustration of Wheelan’s point that statistics demands astute analysis rather than reliance on precise math alone, research conducted after Naked Statistics was published, including a reexamination of the clinical trial results, has concluded that HRT is highly beneficial for women, provided they begin treatment during perimenopause or immediately after; the original alarming results were skewed by the older women in the study pool.
The author presents seven common regression pitfalls. First, applying regression to nonlinear relationships produces meaningless results, like trying to use a single line to describe a curved pattern. Second, correlation does not establish causation; associations can be spurious. Third, reverse causality occurs when the supposed effect actually drives the supposed cause. Fourth, omitting important variables creates bias when the included variables incorrectly absorb the omitted variable’s effect. For example, analyzing golf’s health effects without controlling for age falsely suggests golf causes disease, when golfers simply tend to be older (a short simulation after this list of pitfalls makes the bias concrete).
Fifth, including highly correlated explanatory variables (multicollinearity) prevents the analysis from distinguishing their separate effects. Sixth, extrapolating beyond the sample population produces invalid predictions, like using adult data to predict a newborn’s weight. Seventh, including too many variables risks finding statistically significant results by chance alone, a form of data mining that yields false discoveries.
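To make the omitted variable problem concrete, here is a minimal simulation sketch (the numbers, variable names, and effect sizes are invented for illustration, not taken from the book): golf is given no true effect on health risk, yet it looks harmful whenever age is left out of the model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Invented setup: older people are both more likely to golf and at higher health risk.
age = rng.uniform(20, 80, n)
golfer = (rng.random(n) < (age - 20) / 80).astype(float)  # probability of golfing rises with age
risk = 0.5 * age + 0.0 * golfer + rng.normal(0, 5, n)     # golf itself has zero true effect

def ols(X, y):
    """Ordinary least squares coefficients via lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
naive = ols(np.column_stack([ones, golfer]), risk)           # age omitted
adjusted = ols(np.column_stack([ones, golfer, age]), risk)   # age controlled for

print(f"golf coefficient with age omitted:  {naive[1]:+.2f}")     # spuriously positive
print(f"golf coefficient with age included: {adjusted[1]:+.2f}")  # close to zero
```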
Even peer-reviewed medical research frequently fails to replicate. Researcher John Ioannidis estimates that roughly half of published scientific papers are eventually proven wrong. This emphasizes two lessons: Properly specifying which variables to include matters more than calculations, and regression builds only a circumstantial case requiring theoretical justification and replication.
Program evaluation seeks to measure an intervention’s causal effect by comparing its outcomes with the counterfactual, or what would have happened without the intervention. The challenge of properly evaluating programs is illustrated by studying whether more police reduce crime. Simply comparing jurisdictions with different police levels fails because cities differ fundamentally. Even regression analysis encounters reverse causality problems, as high-crime areas hire more officers.
Jonathan Klick and Alexander Tabarrok addressed this by exploiting the Washington, DC terrorism alert system as a natural experiment. On high-alert days, DC deploys additional police for terrorism concerns unrelated to street crime. The researchers found that ordinary crime dropped roughly 7% on high-alert days, demonstrating that police presence deters crime.
The chapter outlines five program evaluation approaches. Randomized, controlled experiments represent the gold standard; these randomly assign subjects to treatment and control groups. Examples include a study finding that prayers by strangers do not improve heart surgery outcomes, and Tennessee’s Project STAR, which demonstrated that smaller class sizes enhance student learning.
However, it is often logistically or ethically impossible to create randomized controlled experiments. For this reason, the second approach is natural experiments, which allow researchers to exploit (here a technical term that simply means to draw data from) circumstances that happen to create treatment and control groups by accident. For example, Adriana Lleras-Muney noticed that states passed laws requiring minimum years of schooling at different times. She examined this historical variation to show that one additional year of education extended life expectancy by 18 months.
Third are nonequivalent controls, which use nonrandomized but similar comparison groups. Stacy Dale and Alan Krueger compared students who attended elite colleges with equally talented students who had been accepted to such schools but ended up attending less selective institutions instead. They found no significant post-college earnings differences, except among low-income students, who did benefit from attending the more selective schools.
Fourth, difference-in-differences compares how outcomes change over time between treatment and control groups, isolating the intervention’s effect from broader trends. Finally, discontinuity analysis compares groups just above and below arbitrary eligibility cutoffs. Randi Hjalmarsson used this approach with Washington State’s juvenile sentencing guidelines, finding that incarceration deterred future crime.
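To make the difference-in-differences arithmetic concrete (the numbers are invented, not from the book): suppose crime in a city that adds police falls from 100 to 80 incidents per month, while crime in a comparable city that does not falls from 90 to 85 over the same period. The estimated effect is the treated group’s change net of the background trend:

$$\underbrace{(80-100)}_{\text{treated change}}-\underbrace{(85-90)}_{\text{comparison change}}=-20-(-5)=-15\ \text{incidents attributable to the intervention.}$$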
The chapter concludes that careful methodology enables researchers to approximate counterfactuals and identify causal relationships even when controlled experiments are impossible.
The conclusion notes that society is now overwhelmed with data, contrasting sharply with earlier eras when policymakers worked blind. During the Great Depression, Herbert Hoover declared recovery in 1930 based on faulty data showing 2.5 million unemployed, when actually 5 million were jobless and unemployment was climbing by 100,000 weekly.
Five socially significant questions demonstrate statistics’ power. The first concerns the future of football. Mounting evidence links the sport to chronic traumatic encephalopathy and other long-term brain damage. Researchers are using statistical tools to determine whether football can be played safely or whether, as author Malcolm Gladwell suggested, it resembles dogfighting: An activity once seen as acceptable but now viewed as knowingly subjecting participants to suffering.
The second question addresses autism spectrum disorder diagnoses, which have nearly doubled in a decade, with the condition affecting 1 in 88 children in 2012, when Wheelan was writing. (In 2023, 1 in 36 children received this diagnosis.) Statistical analysis seeks to determine whether this represents a true “epidemic” or merely increased awareness, while attempting to identify causal factors. Statistics has already debunked false associations with vaccines, which definitively do not cause autism.
Third is assessing our educational system. Value-added models use student test score gains to evaluate teachers, but these assessments contain substantial statistical noise. Economist Doug Staiger warns that the correlation in year-to-year teacher performance is only 0.35, similar to baseball players’ performance correlation. Air Force Academy data reveal that professors producing impressive immediate test scores may actually hinder long-term learning compared to experienced instructors who emphasize fundamental concepts.
Fourth has to do with global poverty. Esther Duflo has helped reshape development economics by conducting randomized experiments in low-income countries. Her work has demonstrated that interventions like subsidized fertilizer delivery and photographic attendance monitoring can significantly improve outcomes. A natural experiment in Côte d’Ivoire also revealed that when women earned extra income, they spent it on family food, whereas men were less likely to do so.
Finally, there is the question of data mining and privacy. The marriage of data and technology creates unprecedented privacy challenges. Target’s predictive analytics identified pregnant shoppers from their purchasing patterns, sometimes before family members knew. The US Supreme Court ruled that law enforcement cannot attach a GPS tracker to a vehicle without a warrant.
The book concludes that statistics is like fire or knives—an immensely useful tool that is dangerous when misused. Data must be employed wisely, with human judgment always accompanying mathematical analysis.
The final section of the book presents regression analysis as a sophisticated statistical tool while also illustrating its potential for misuse. Chapter 11 introduces regression as a “miracle elixir” (185), a method capable of isolating a single relationship within a complex web of confounding variables. The Whitehall studies, which linked low job control to heart disease even after accounting for factors like smoking and education, exemplify this function. This analysis moves beyond simple correlation to a more complex understanding of how variables interact. However, Chapter 12 counters this perspective with a “mandatory warning label” (212). The consequential misinterpretation of observational data regarding hormone replacement therapy serves as a structural and thematic counterweight, demonstrating that regression’s ability to find precise patterns does not guarantee their validity. This juxtaposition reveals a core argument: The tool’s sophistication magnifies the consequences of its misuse. The capacity to uncover hidden truths is linked to the potential to create dangerous falsehoods, shifting the burden of validity from the statistical test itself to the intellectual rigor of the researcher. Underscoring the potential pitfalls is the fact that in the late 2010s, after Naked Statistics was published, the dire warnings about HRT were debunked as the product of incorrect analysis that grouped women decades into menopause with much younger subjects; HRT is highly beneficial when started during or just after perimenopause.
Wheelan prioritizes the logical framework of research design over the mechanics of computation, since Statistics Can Mislead or Be Manipulated. While software has made running regressions effortless, he argues that “the hard part is determining which variables ought to be considered in the analysis and how that can best be done” (187). The seven common regression mistakes detailed in Chapter 12 are not mathematical errors but failures of logic and theory. Omitted variable bias, for example, is a conceptual flaw, as seen in the erroneous conclusion that golf causes heart disease when the true causal agent, age, is ignored. This emphasis on proper specification frames statistics as an art of critical thinking. The author’s warnings against spurious causation, reverse causality, and data mining reinforce the idea that a regression equation is only as sound as the hypothesis it is built to test. By focusing on the intellectual process of building a model, the text suggests that statistical literacy requires the ability to deconstruct the assumptions that underlie any quantitative claim.
Building upon the problem of distinguishing correlation from causation, Chapter 13 presents program evaluation as a toolkit for establishing causality. Showcasing Probability as a Tool for Better Decisions, the text pivots to methods designed to approximate the counterfactual, or what would have happened in the absence of an intervention. The various techniques presented—randomized controlled experiments, natural experiments, nonequivalent controls, difference-in-differences, and discontinuity analysis—are framed as approaches to overcome ethical or practical barriers to direct experimentation. The Dale and Krueger study on elite colleges serves as a key example. By using a nonequivalent control group, the researchers distinguish the “treatment effect” of an elite education from the “selection effect” of admitting talented students, leading to the conclusion that individual drive often matters more than institutional prestige after graduation. This structural progression from description to correlation to causation mirrors the development of scientific inquiry.
The conclusion situates statistical reasoning within a broader social and ethical context, allowing readers to end the book with the sense that Statistical Literacy is Empowering in a world rife with complex questions. The five issues Wheelan raises—football’s safety, the causes of autism, teacher evaluation, global poverty, and data privacy—apply the book’s concepts to matters of public concern. These examples demonstrate that statistical analysis is a primary tool for debating contemporary challenges. The discussion of Target’s predictive analytics, which can identify a pregnancy from purchasing patterns, brings the ethical dimension into focus, illustrating the tension between utility and intrusion in a data-saturated world. By ending with the analogy of statistics as a tool like fire or a knife, the text delivers a final, overarching message: The value and danger of data lie not in the numbers themselves, but in the human judgment that wields them. The ultimate purpose of statistical literacy, in this view, is not merely to analyze the world, but to navigate it wisely.