Naked Statistics: Stripping the Dread from the Data

Charles Wheelan


Nonfiction | Reference/Textbook | Adult | Published in 2012


Index of Terms

Central Limit Theorem

Wheelan describes the Central Limit Theorem as “the LeBron James of statistics” (127), a foundational concept that powers most forms of statistical inference. The theorem states that the means of large, properly drawn random samples from any population will be approximately normally distributed around the true population mean. In other words, regardless of the underlying distribution of the population itself, sample means form a predictable, bell-shaped distribution: most sample averages cluster closely around the population average, with progressively fewer falling far from it. This allows researchers to quantify the likelihood of observing any particular sample result.
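
To see the theorem in action, here is a minimal Python sketch (my illustration, not the book’s): it draws thousands of samples from a deliberately skewed exponential population and shows that the sample means nonetheless form a tight, bell-shaped cluster around the true mean.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A deliberately non-normal population: exponential, strongly right-skewed.
scale = 10.0          # the exponential's true mean (and standard deviation)
sample_size = 100
num_samples = 10_000

# Draw many random samples and record each sample's mean.
samples = rng.exponential(scale, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)

# Despite the skewed population, the sample means are roughly normal,
# centered on the true mean, with spread close to the standard error.
print(f"true mean:             {scale:.2f}")
print(f"mean of sample means:  {sample_means.mean():.2f}")
print(f"std of sample means:   {sample_means.std():.2f}")
print(f"theoretical SE = s/√n: {scale / np.sqrt(sample_size):.2f}")
```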


Wheelan illustrates this with the humorously far-fetched hypothetical of searching for a lost bus of marathon runners by asking whether a bus that has been found could be the missing one. If the average marathon runner is thin, it is highly improbable that a random sample of runners (an entire busload) would have an extremely high average weight. The Central Limit Theorem moves this judgment beyond intuition: it lets us calculate the exact probability of such an outcome.


This principle is also the engine behind polling; it explains why a survey of 1,000 people can have a margin of error of ±3%. The dispersion of these sample means is measured by the standard error (SE = s/√n), which shows that as sample size (n) increases, the sample means cluster more tightly around the true population mean, making inference more precise.
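
The ±3% figure can be reproduced directly from the standard-error formula. A short sketch, using the proportion version of the formula, SE = √(p(1−p)/n), with the conservative choice p = 0.5 (the sample size is the book’s polling example; the arithmetic is mine):

```python
import math

n = 1000   # poll sample size
p = 0.5    # assumed proportion; p = 0.5 gives the widest possible interval

# Standard error of a sample proportion.
se = math.sqrt(p * (1 - p) / n)   # ≈ 0.0158

# A 95% confidence interval spans about 1.96 standard errors on each side.
margin_of_error = 1.96 * se       # ≈ 0.031, i.e., roughly ±3 percentage points
print(f"SE = {se:.4f}, margin of error = ±{margin_of_error:.1%}")
```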

Confidence Interval

A confidence interval is a range of values, constructed from sample data, that is likely to contain the true value of an unknown population parameter, such as a mean or a proportion. The confidence level, typically 95%, refers to the long-term success rate of the method, not the probability that a specific interval is correct. As the book clarifies, “We can say with 95 percent confidence that the range […] contains the average” (158), meaning that if we were to repeat the sampling process 100 times, we would expect 95 of the resulting intervals to capture the true population value. This tool provides a crucial way to express the uncertainty inherent in using a sample to estimate a characteristic of a larger population.
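
The repeated-sampling interpretation is easy to verify by simulation. A minimal sketch under assumed population values (mean 50, standard deviation 10, neither taken from the book): build one hundred 95% intervals and count how many capture the truth.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
true_mean, true_sd = 50.0, 10.0   # hypothetical population parameters
n = 100                           # size of each sample

hits = 0
for _ in range(100):
    sample = rng.normal(true_mean, true_sd, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)   # standard error of the mean
    lo = sample.mean() - 1.96 * se         # 95% interval: mean ± 1.96 SE
    hi = sample.mean() + 1.96 * se
    if lo <= true_mean <= hi:
        hits += 1

# The method succeeds in the long run: roughly 95 of 100 intervals
# should contain the true mean.
print(f"{hits} of 100 intervals captured the true mean")
```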


Wheelan demonstrates the concept’s application in multiple contexts. In polling, a finding that a candidate has 53% support, with a ±3% margin of error, is a 95% confidence interval, indicating the true level of support is likely between 50 and 56 percent. Similarly, in scientific research, such as the study on autism where children with the disorder had a higher average brain volume, the fact that the 95% confidence intervals for the brain volumes of the autistic and non-autistic groups did not overlap provided strong statistical evidence that the observed difference was not just a result of random chance.

Correlation Coefficient (r)

The correlation coefficient, or r, is a concise, unitless metric that summarizes the strength and direction of a linear relationship between two variables. It is expressed as “a single number ranging from -1 to 1” (60). A coefficient of 1 indicates a perfect positive correlation, where the two variables move in lockstep in the same direction. A coefficient of -1 indicates a perfect negative correlation, where they move in lockstep in opposite directions. A coefficient of 0 signifies no linear relationship at all. This tool is powerful because it collapses a complex set of data points into an elegant and easily understood descriptive statistic. 
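
For readers who want to see the number computed, a brief sketch of Pearson’s r on invented height-and-weight data (the values are hypothetical placeholders):

```python
import numpy as np

# Hypothetical height (inches) and weight (pounds) observations.
height = np.array([62, 64, 66, 68, 70, 72, 74])
weight = np.array([120, 135, 140, 155, 165, 180, 190])

# Pearson's r: covariance scaled by both standard deviations, which
# strips the units and confines the result to the range [-1, 1].
r = np.corrcoef(height, weight)[0, 1]
print(f"r = {r:.3f}")  # close to 1: a strong positive linear relationship
```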


Wheelan provides the height and weight relationship as a classic example of a strong positive correlation. A more modern application is seen in Netflix’s recommendation algorithm, which suggests films by identifying other users whose viewing history is highly correlated with your own. However, Wheelan also uses the concept to deliver one of the book’s most critical warnings: Correlation does not imply causation. He notes that the number of televisions a family owns is likely correlated with their children’s SAT scores, not because watching TV improves test performance, but because a third factor, such as family income, influences both.
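
That warning can be demonstrated in a few lines of simulation. In this hedged sketch (all numbers invented), family income drives both TV ownership and SAT scores, and the two become strongly correlated despite having no causal connection:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 5_000

# The lurking third variable: family income (arbitrary units).
income = rng.normal(50, 15, size=n)

# TVs and SAT scores each depend on income plus independent noise;
# neither has any direct effect on the other.
tvs = 0.05 * income + rng.normal(0, 0.5, size=n)
sat = 8 * income + rng.normal(0, 100, size=n)

r = np.corrcoef(tvs, sat)[0, 1]
print(f"corr(TVs, SAT) = {r:.2f}")  # clearly positive, yet purely a confound
```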

P-Value

The p-value, a cornerstone of hypothesis testing, is used to determine “the specific probability of getting a result at least as extreme as the one you’ve observed if the null hypothesis is true” (152). In practice, the p-value measures how surprising a finding is. If researchers are testing a new drug, their null hypothesis is that the drug has no effect. A very small p-value indicates that the observed results (e.g., a large improvement in patient health) would be highly unlikely to occur by chance if the drug were truly ineffective, which leads researchers to reject the null hypothesis and conclude that the drug likely has a real effect.


The book points to the autism study, which found a difference in brain volume between two groups of children with a p-value of .002, meaning there was only a 2-in-1,000 chance of seeing such a large difference if no true difference existed. This low probability provides strong evidence against the null hypothesis. The p-value is compared against a predetermined significance level (often .05), and this threshold governs the trade-off between making a Type I error (falsely concluding an effect exists) and a Type II error (failing to detect an effect that does exist).
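
As a concrete illustration of the arithmetic (the z-value below is chosen for illustration, not taken from the study): once a result is expressed in standard errors, the p-value is simply the tail area under the normal curve.

```python
import math

z = 3.1  # hypothetical observed difference, measured in standard errors

# Two-sided p-value under a normal null: the area in both tails beyond
# ±z. math.erfc gives this directly, since erfc(z/√2) = 2·(1 − Φ(z)).
p_value = math.erfc(z / math.sqrt(2))
print(f"p = {p_value:.4f}")  # ≈ 0.002, i.e., about 2 in 1,000
```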

Regression Analysis

Regression analysis is a technique used to estimate the relationship between variables. As Wheelan explains, regression analysis “is the tool that enables researchers to isolate a relationship between two variables, such as smoking and cancer, while holding constant (or ‘controlling for’) the effects of other important variables” (11). Its basic form, ordinary least squares (OLS), works by fitting the best possible straight line through a scatterplot of data points, quantifying the association with a regression coefficient. This allows researchers to move beyond simple correlation to model complex phenomena. For example, the Whitehall studies used regression to determine that low job control was linked to heart disease even after accounting for confounding factors like smoking and diet.
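
A minimal sketch of OLS in Python (the data are synthetic, generated from a known line plus noise) shows the mechanics: the fit recovers the slope and intercept that minimize the squared vertical distances.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Invented data with a known linear relationship: y = 2x + 5 + noise.
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 5.0 + rng.normal(0, 2, size=200)

# Ordinary least squares: np.polyfit minimizes the sum of squared
# vertical distances between the fitted line and the data points.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")  # ≈ y = 2x + 5
```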


The book’s extended example, using the Changing Lives dataset, demonstrates how multivariate regression can explain an outcome like weight by simultaneously considering variables such as height, age, sex, and education. Each coefficient represents the independent contribution of that variable while holding the others constant. 
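
The sketch below mimics that setup on synthetic data (the variable names follow the book’s example, but the coefficients and values are invented, not the Changing Lives estimates): fitting all the variables at once recovers each one’s independent contribution.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 1_000

# Synthetic stand-ins for the explanatory variables.
height = rng.normal(67, 4, size=n)   # inches
age = rng.uniform(25, 65, size=n)    # years
sex = rng.integers(0, 2, size=n)     # 0 = female, 1 = male

# Weight built from known (invented) coefficients plus noise.
weight = 4.5 * height + 0.3 * age + 10 * sex - 160 + rng.normal(0, 10, size=n)

# Multivariate OLS via least squares; the column of ones is the intercept.
X = np.column_stack([np.ones(n), height, age, sex])
coefs, *_ = np.linalg.lstsq(X, weight, rcond=None)
print("intercept, height, age, sex:", np.round(coefs, 2))  # ≈ [-160, 4.5, 0.3, 10]
```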


Wheelan also dedicates significant attention to regression’s potential for misuse. He warns that without careful thought, regression analysis can produce misleading results due to common pitfalls like omitted variable bias (leaving out a key factor), reverse causality (confusing cause and effect), and multicollinearity (including highly correlated explanatory variables).
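
The first of these pitfalls is easy to reproduce. In this hedged simulation (my construction, not the book’s), the true effect of x on y is 1.0, but omitting the confounder from the regression roughly doubles the estimated coefficient:

```python
import numpy as np

rng = np.random.default_rng(seed=5)
n = 5_000

# A confounder that drives both the explanatory variable and the outcome.
confounder = rng.normal(0, 1, size=n)
x = confounder + rng.normal(0, 1, size=n)
y = 1.0 * x + 2.0 * confounder + rng.normal(0, 1, size=n)

def ols(columns, outcome):
    """Least-squares coefficients for a design matrix with an intercept."""
    X = np.column_stack([np.ones(len(outcome))] + list(columns))
    return np.linalg.lstsq(X, outcome, rcond=None)[0]

full = ols([x, confounder], y)   # x coefficient ≈ 1.0 (the truth)
biased = ols([x], y)             # x coefficient ≈ 2.0: it absorbs the
                                 # omitted confounder's effect
print("with confounder:   ", np.round(full, 2))
print("confounder omitted:", np.round(biased, 2))
```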

Standard Error

The standard error measures the expected dispersion of a sample statistic, most commonly the sample mean. While the standard deviation describes the variability within a single sample or population, “The standard error measures the dispersion of the sample means” (136). It answers the question: If we were to draw many random samples from the same population, how much would we expect their means to vary from one another? The standard error is directly linked to the Central Limit Theorem, as it quantifies the spread of the normal distribution formed by these sample means. Its formula for a sample mean, SE = s/√n, shows that the error shrinks as the sample size n grows, making inference more precise.


This concept is fundamental to building confidence intervals and conducting hypothesis tests. It is used to determine the margin of error in polls and to calculate how many standard errors a particular result lies from a hypothesized value, as in the hijacked bus example, where a large difference in weight was deemed statistically significant because it was many standard errors away from the population mean.
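
A short worked example in the spirit of the bus scenario (all weights invented) shows the calculation: compute the standard error, then count how many standard errors separate the observed sample mean from the hypothesized population mean.

```python
import math

# Hypothetical numbers in the spirit of the book's bus example.
population_mean = 155   # assumed average marathon-runner weight (pounds)
population_sd = 20
n = 60                  # passengers on the found bus
bus_mean = 220          # the passengers' observed average weight

# Standard error of the mean for samples of size n.
se = population_sd / math.sqrt(n)

# How many standard errors the bus average sits from the runner mean.
z = (bus_mean - population_mean) / se
print(f"SE = {se:.2f}, z = {z:.1f} standard errors")
# A z this large is astronomically unlikely if the bus held marathon runners.
```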
