Charles Wheelan’s New York Times bestseller, Naked Statistics: Stripping the Dread from the Data (2013), is a work of popular science that aims to make the core concepts of statistics intuitive and accessible to a general audience. A senior lecturer in public policy at Dartmouth College and former correspondent for The Economist, Wheelan is known for demystifying complex subjects, as seen in his bestselling Naked Economics (2002).
In Naked Statistics, Wheelan argues that statistics is a powerful and essential tool for understanding the modern world. Using examples from sports, finance, medicine, and everyday life, the book explains fundamental ideas from descriptive statistics and probability to more advanced methods like regression analysis, all while avoiding dense mathematics and jargon.
Conceived in homage to Darrell Huff’s 1954 classic, How to Lie with Statistics, Wheelan’s book serves as both an introduction to the discipline and a guide to critical thinking in an age of abundant data. He provides readers with a toolkit for evaluating claims and making informed judgments. The book’s themes include the idea that Statistical Literacy Is Empowering, the practical application of Probability as a Tool for Better Decisions, and how Statistics Can Mislead or Be Manipulated, which leads to catastrophic errors in fields from finance to medicine. Naked Statistics argues that statistical literacy is about developing a sound, logical framework for interpreting information and separating credible evidence from misleading claims.
This guide is based on the 2014 paperback edition published by W. W. Norton & Company.
Author Charles Wheelan argues that statistics is an essential tool for understanding the modern world. He introduces the book’s purpose: to explain the core concepts of statistics through accessible examples, stripping away intimidating jargon and mathematics.
The first chapter demonstrates that people use statistics in everyday life without realizing it. Statistics typically condense complex information into a single, useful number. The chapter outlines the primary functions of statistics: description and comparison, inference from a sample to a population, assessing risk through probability, and identifying important relationships in data—a form of “statistical detective work” (9).
Chapter 2 focuses on descriptive statistics, using the US middle class’s economic health and comparisons of baseball players as framing devices. It explains the difference between the mean (average) and the median (midpoint). Wheelan shows how the mean is easily distorted by outliers, making the median a better measure for skewed data like income. He concludes that the median wage is the sounder gauge of the middle class, revealing decades of stagnation, and that advanced metrics give a fuller picture of baseball players than traditional statistics do.
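The mean-versus-median point is easy to see in a few lines of Python; the incomes below are invented for illustration, not taken from the book:

```python
# Minimal sketch: one extreme outlier drags the mean far from a "typical"
# value, while the median barely moves. Incomes are hypothetical.
from statistics import mean, median

incomes = [40_000, 45_000, 50_000, 55_000, 60_000]
print(mean(incomes), median(incomes))  # symmetric data: both are 50000

incomes.append(10_000_000)  # one very high earner joins the sample
print(mean(incomes))        # mean leaps past 1.7 million
print(median(incomes))      # median shifts only to 52500
```

This is the sense in which the median better describes skewed distributions such as income: it answers "what does the person in the middle earn?" regardless of what happens at the extremes.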
Chapter 3 details how descriptive statistics can mislead. Wheelan distinguishes between precision and accuracy: Precise numbers can mask fundamental data inaccuracies. Conclusions can also be manipulated by changing the unit of analysis: Whether US manufacturing looks healthy, for example, depends on whether one measures output (which is rising) or employment (which is falling). Other deceptive techniques include cherry-picking between the mean and the median, making comparisons without adjusting for inflation, and using performance metrics that create perverse incentives.
Chapter 4 examines correlation, which measures the relationship between two variables. The primary example is Netflix, which recommends movies by identifying other users whose viewing history is highly correlated with a customer’s own. The chapter stresses a crucial warning: Correlation does not imply causation. A likely positive correlation between a family’s television ownership and a child’s SAT scores, for instance, is not causal; a third variable, such as parental income, is likely responsible for both.
Chapter 5 explains the principles of basic probability, covering fundamental rules such as calculating the probability that two independent events both occur. It introduces the concept of expected value, which weighs each possible outcome by its probability, to demonstrate why, for instance, playing the lottery is a poor financial decision. A short follow-up chapter is dedicated to the Monty Hall problem, explaining the counterintuitive correct strategy: A contestant should always switch doors, which raises the probability of winning from 1/3 to 2/3.
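The Monty Hall result is famously hard to accept, but a short simulation makes it concrete. This is a sketch of the standard game (car behind one of three doors, host always opens a non-winning, non-chosen door), not code from the book:

```python
# Simulate the Monty Hall game: switching wins roughly 2/3 of the time,
# staying roughly 1/3.
import random

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

random.seed(1)
trials = 100_000
print(sum(play(True) for _ in range(trials)) / trials)   # close to 2/3
print(sum(play(False) for _ in range(trials)) / trials)  # close to 1/3
```

The intuition the simulation confirms: the initial pick wins 1/3 of the time, so switching wins in exactly the remaining 2/3 of games, because the host’s reveal concentrates that probability on the one unopened door.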
Chapter 6 addresses problems with probability. The central example is Wall Street’s false sense of security before the 2008 financial crisis, caused by underestimating “tail risk,” or the small probability of a catastrophic event. Other common mistakes include assuming events are independent when they are not, the “gambler’s fallacy” (believing an outcome is “due”), misunderstanding that statistical clusters can occur by chance, the “prosecutor’s fallacy” (confusing the probability of a chance match with the probability of innocence), and ignoring reversion to the mean (the tendency of extreme outcomes to be followed by more average ones).
The importance of good data is emphasized in Chapter 7 with the maxim “garbage in, garbage out.” Biases that undermine statistical analysis include selection bias, where the sample is not representative of the population; publication bias, where positive results are more likely to be published than negative ones; recall bias, where memory is skewed by current circumstances; survivorship bias, where failed subjects are excluded from results, making performance look better than it is; and healthy user bias, which makes it difficult to isolate the effect of a single healthy behavior.
The central limit theorem is introduced in Chapter 8 as the foundation of statistical inference. The theorem states that the means of large, random samples drawn from a population will themselves form a normal distribution around the true population mean. This allows researchers to use the properties of the normal distribution to make inferences. The standard error, which measures the dispersion of these sample means, provides the mathematical basis for drawing conclusions from samples.
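A simulation shows the theorem in action. The setup below is my own illustration, not the book’s: even when the underlying population is highly skewed, the means of repeated samples cluster in a bell shape around the true mean, with spread close to the standard error σ/√n:

```python
# Central limit theorem sketch: draw repeated samples of size n from a
# skewed (exponential) population and look at the distribution of their
# means. Population and parameters are invented for illustration.
import random
from statistics import mean, stdev

random.seed(42)
population = [random.expovariate(1.0) for _ in range(100_000)]  # skewed, mean ~1

n = 100
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

print(mean(sample_means))   # near 1.0, the population mean
print(stdev(sample_means))  # near 0.1, i.e., sigma / sqrt(n) = 1 / 10
```

The dispersion of the sample means (the standard error) shrinks as the sample size grows, which is why larger samples support tighter inferences.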
Building on this, Chapter 9 explains the process of statistical inference. A researcher starts with a null hypothesis (or a default assumption) and an alternative hypothesis. If, assuming the null hypothesis is true, the probability of observing a result at least as extreme as the one obtained falls below a predetermined threshold (often 5%), the null hypothesis is rejected. The chapter also discusses the unavoidable trade-off between Type I errors (false positives) and Type II errors (false negatives).
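The logic can be illustrated with a toy example of my own (not from the book): testing the null hypothesis that a coin is fair after observing 61 heads in 100 flips:

```python
# Hypothesis-testing sketch: under the null "the coin is fair," how
# surprising is a result at least as lopsided as 61 heads in 100 flips?
from math import comb

n, heads = 100, 61

def binom_pmf(k: int) -> float:
    # Probability of exactly k heads in n fair flips
    return comb(n, k) * 0.5 ** n

# Two-sided p-value: total probability, under the null, of outcomes at
# least as far from 50 heads as the observed result.
p_value = sum(binom_pmf(k) for k in range(n + 1) if abs(k - 50) >= abs(heads - 50))
print(p_value)         # roughly 0.035
print(p_value < 0.05)  # below the 5% threshold, so reject the null
```

Note the direction of the reasoning: the p-value measures how unlikely the data are if the null is true, not how likely the null itself is; rejecting at 5% still means a fair coin would produce a result this extreme about 1 time in 28, which is the Type I risk the chapter describes.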
In Chapter 10, principles of inference are applied to public opinion polling. The greatest challenges in polling are methodological, not mathematical: obtaining a truly representative sample, wording questions in a neutral way, and ensuring respondents give truthful answers to sensitive questions.
Chapter 11 introduces regression analysis, a powerful tool for quantifying the relationship between a dependent variable (the outcome being explained) and one or more explanatory variables (the factors used to explain it). Wheelan shows how regression can isolate a single relationship within a complex web of data. An example builds a multiple regression model of body weight, using variables like height, age, sex, and education.
Chapter 12 serves as a “mandatory warning label” for regression analysis, detailing common mistakes, including confusing correlation with causation, reverse causality, failing to include a crucial variable, extrapolating beyond the data, and data mining.
Chapter 13 focuses on program evaluation, or the methods used to determine causality. To measure the effect of an intervention, one must compare it to what would have happened without it, using randomized experiments, natural experiments that arise in the real world, or other quasi-experimental approaches.
The Conclusion poses five major questions that statistics can help answer: the future of football and brain injuries, the causes of autism, the evaluation of teachers, the fight against global poverty, and the ethical challenges of data privacy in the digital age.


