Charles Wheelan
Netflix accurately recommends films by exploiting correlation in taste between users. The system compares a user’s movie ratings with those of other customers, identifies other users with highly similar preferences, and suggests films those other viewers enjoyed. The author’s recommendation of the documentary Bhutto stemmed from his five-star ratings for two other documentaries, Enron: The Smartest Guys in the Room and Fog of War.
Correlation measures the relationship between two phenomena. When both increase together (like summer temperatures and ice cream sales, or height and weight), they are positively correlated. When one increases as the other decreases (like exercise and weight), they are negatively correlated. Individual observations may not fit the pattern, but meaningful relationships can exist between sets of observations on average.
The correlation coefficient encapsulates these associations in a single number ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no meaningful relationship. The coefficient is unitless: calculating it converts each observation into standard deviations from the mean, which eliminates the original units and allows the formula to compare variables measured on entirely different scales.
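The standard-deviation formulation can be sketched in a few lines of Python. The height and weight figures below are invented for illustration, not taken from the book:

```python
# A minimal sketch of the correlation coefficient as the average product
# of z-scores (each value expressed in standard deviations from the mean).
def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def correlation(xs, ys):
    mx, my, sx, sy = mean(xs), mean(ys), std(xs), std(ys)
    # Convert each pair to z-scores, multiply, and average the products.
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / len(xs)

heights = [160, 165, 170, 175, 180]  # hypothetical heights (cm)
weights = [55, 60, 63, 70, 74]       # hypothetical weights (kg)
print(round(correlation(heights, weights), 3))  # close to 1: strong positive correlation
```

Because the z-scores strip out units, the same function works whether the inputs are centimeters and kilograms or GPAs and test scores.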
The SAT test illustrates how correlation is applied in education. The test aims to provide a standardized measure of academic ability, since high school grades vary across schools and curricula. Data from the College Board, the company that owns and administers the test, shows that a student’s high school GPA correlates with their first-year college GPA at 0.56, and that an SAT composite score correlates with first-year achievement at the same level. However, combining both metrics yields the strongest predictor of college performance, with a correlation of 0.64.
Critically, correlation does not imply causation. Wheelan imagines that students’ SAT scores and household television ownership likely correlate positively. However, he points out that buying more TVs will obviously not boost scores. Instead, a third variable—parental education levels and income—probably drives both. College Board data confirms that students from families earning over $200,000 have mean SAT math scores of 586, compared to 460 for students from families earning $20,000 or less.
In 2006, Netflix launched a contest offering $1 million to anyone who could improve its recommendation system by at least 10%. Competitors received training data of over 100 million ratings and were judged on their ability to predict 2.8 million withheld ratings. In 2009, a seven-person international team of statisticians and computer scientists won. Their explanatory paper runs 92 pages, but the core principle remains simple: Find someone with similar tastes and ask what they liked.
In 1981, Schlitz Brewing spent $1.7 million on a seemingly risky marketing campaign, conducting blind taste tests between Schlitz and competing beers on live television during NFL playoff games, culminating at Super Bowl halftime. The company’s statisticians understood that most beers in this category taste similar, making blind tests essentially coin flips. By testing only consumers who preferred competing brands, Schlitz ensured that roughly half would choose Schlitz—making their beer look impressive. At the Super Bowl, Schlitz tested 100 Michelob drinkers, and exactly 50 picked Schlitz.
This strategy exemplifies a binomial experiment: a fixed number of trials, each with two possible outcomes and a constant probability of success. Statistical calculations based on the normal distribution showed a 98% chance that at least 40 of the 100 testers would choose Schlitz, making the gambit far less risky than it appeared.
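Wheelan reports the 98% figure via a normal approximation; an exact binomial calculation (a sketch, not from the book) confirms it:

```python
# Exact binomial check of the Schlitz gamble: with 100 testers who each
# pick Schlitz with probability 1/2, how likely is it that at least 40 do?
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for a binomial(n, p) random variable."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = prob_at_least(40, 100, 0.5)
print(f"P(at least 40 of 100 pick Schlitz) = {p:.3f}")
```

The exact answer lands just above 0.98, matching the normal-approximation figure in the text.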
Probability quantifies uncertainty. Some events have known probabilities: flipping heads with a fair coin is 1/2; rolling a one with a die is 1/6. Others have probabilities inferred from data, such as the 0.94 success rate for NFL extra points. Understanding these figures clarifies decision-making and makes risks explicit. For instance, Australian Transport Safety Bureau data shows motorcycle fatalities are 35 times higher than car fatalities per distance traveled. In another example, when a NASA satellite fell in 2011, the probability of any specific individual being hit was 1 in 21 trillion, though the probability that someone somewhere would be hit was 1 in 3,200.
Human fears often misalign with statistical risks. Research cited in the book Freakonomics (2005) by Steven Levitt and Stephen Dubner shows children are a hundred times more likely to die in a backyard pool than from a gun accident. Cornell researchers Garrick Blalock, Vrinda Kadiyali, and Daniel Simon found that after the terrorist attacks of 9/11 increased fear of flying, resulting increased driving caused an estimated 344 additional traffic deaths per month in the final quarter of 2001.
DNA analysis demonstrates probability’s power in forensics. A DNA match must be accompanied by proof that the match is not coincidental. Testing 13 loci in the samples being compared typically produces match probabilities of one in a billion or better. This standard was required when the remains of World Trade Center victims of the 9/11 terrorist attack were identified. However, when resources are limited and fewer loci are tested, the increased odds of coincidental matching collide with the public’s expectation that DNA testing is foolproof, with potentially dramatic results. A 2008 Los Angeles Times investigation questioned the probability estimates law enforcement used in criminal trials after an Arizona lab found two unrelated felons matching at nine loci—an event the FBI had calculated as a 1-in-113-billion possibility.
The probability of multiple independent events both occurring equals the product of their individual probabilities. So, flipping heads twice in a row is ½ × ½ = ¼. This principle explains password security: A six-digit numerical password offers only 1 million combinations, but adding letters increases possibilities to over 2 billion, and eight characters with symbols yields over 20 trillion combinations. Crucially, this formula applies only to independent events. Crashing your car this year and next year are not independent; whatever caused one crash may cause another, which is why insurance premiums rise after accidents.
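The multiplication rule and the password arithmetic can be checked directly. The book does not specify the exact character sets behind its password counts, so the sets below (digits only, then digits plus lowercase letters) are assumptions that reproduce the quoted magnitudes:

```python
# Multiplication rule for independent events: P(A and B) = P(A) * P(B).
p_two_heads = 0.5 * 0.5  # two heads in a row
print(p_two_heads)       # 0.25

# Password combinatorics: each extra character multiplies the possibilities.
digits_only = 10 ** 6          # six numeric digits: 1,000,000 combinations
letters_and_digits = 36 ** 6   # six characters drawn from 0-9 and a-z
print(digits_only, letters_and_digits)  # the second is over 2 billion

# The "over 20 trillion" figure for eight characters with symbols implies a
# character set of roughly fifty symbols per position (the set is assumed here).
assert 50 ** 8 > 20 * 10 ** 12
```

The rule only applies when trials are independent, which is exactly why the car-crash example in the text breaks it: last year's crash and next year's crash share causes.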
For calculating the probability that either Event A or Event B occurs, mutually exclusive events require summing their probabilities. Rolling a 1, 2, or 3 with a die is 1/6 + 1/6 + 1/6 = ½ because only one of those numbers can be rolled in any one roll. For non-mutually exclusive events, like drawing a five or a heart from a deck, sum the individual probabilities and subtract the probability of both (in this case, the odds that you’ll draw the five of hearts): 4/52 + 13/52 - 1/52 = 16/52.
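Both addition-rule cases can be verified with exact fractions; this sketch simply re-derives the numbers in the text:

```python
# Addition rule for "A or B", using exact arithmetic.
from fractions import Fraction

# Mutually exclusive: rolling a 1, 2, or 3 with one die.
p_low_roll = Fraction(1, 6) + Fraction(1, 6) + Fraction(1, 6)
print(p_low_roll)  # 1/2

# Not mutually exclusive: drawing a five or a heart; subtract the overlap
# (the five of hearts) so it is not counted twice.
p_five_or_heart = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)
print(p_five_or_heart)  # 4/13, i.e., 16/52
```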
Expected value extends probability by weighting each outcome by its probability and summing across all possibilities. If rolling a die pays $1 for a one, $2 for a two, and so on, the expected value of a roll is $3.50 (the sum of the possible payouts divided by the number of outcomes). This tool guides decision-making. In the NFL, kicking the extra point has an expected value of 0.94 points (1 point × 0.94 success rate), while a two-point conversion yields 0.74 points (2 points × 0.37 success rate), making the kick the better long-term strategy.
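A minimal sketch of the calculation, using the die and NFL figures from the text (the helper function and its name are my own, not the book's):

```python
# Expected value as probability-weighted payoffs.
def expected_value(outcomes):
    """outcomes: a list of (payoff, probability) pairs."""
    return sum(payoff * p for payoff, p in outcomes)

# A die that pays its face value: $1 for a one, $2 for a two, and so on.
die = [(face, 1 / 6) for face in range(1, 7)]
print(round(expected_value(die), 2))  # 3.5

# NFL comparison: the kick beats the two-point try in the long run.
extra_point = expected_value([(1, 0.94), (0, 0.06)])
two_point = expected_value([(2, 0.37), (0, 0.63)])
print(round(extra_point, 2), round(two_point, 2))  # 0.94 0.74
```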
Expected value also reveals why lottery tickets are poor investments. The author calculates that a $1 Illinois instant ticket has an expected payout of roughly 56 cents. Despite winning $2 on a lottery ticket he buys on a lark, Wheelan knows that the underlying probability made this purchase a bad decision. The law of large numbers states that as independent trials increase, the average outcome converges on the expected value. One might win today, but buying thousands of tickets will almost certainly result in losing money. This principle ensures casino profits over the long term and explains why Schlitz used 100 taste testers rather than 10—more trials reduce the chances of random, unlucky outcomes.
Insurance companies exploit expected value by ensuring premiums exceed expected losses. If a $40,000 car has a 1-in-1,000 chance of being stolen annually, the expected loss is $40. Premiums must exceed this amount for the company to profit. For individuals, buying insurance makes sense for potentially catastrophic losses (a stolen car, a burned house) but not for small, affordable risks. This means that someone as wealthy as Warren Buffett could rationally self-insure because he could absorb most losses. Likewise, for regular people, buying an extended warranty on a $99 printer is statistically unwise: The warranty costs more than the expected repair expense, and the potential loss will not meaningfully affect one’s life.
Decision trees organize complex scenarios with multiple contingencies. For example, imagine an investment opportunity in a baldness cure. The company has a 30% chance of discovering an effective treatment, followed by a 60% chance of winning FDA approval, and finally a 90% chance of avoiding competition. If every stage succeeds, a $1 million investment yields a $25 million return. Mapping all outcomes and their probabilities gives the investment an expected value of $4.225 million. However, the most likely outcome is failure, with only $250,000 of the million recouped, so the decision depends on one’s appetite for risk.
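The tree can be flattened into probability-weighted branches. The branch probabilities, the $25 million success payoff, and the $250,000 recovered on early failure come from the text; treating the later failure branches as returning $0 is an assumption, chosen because it reproduces the $4.225 million expected value quoted there:

```python
# Decision-tree expected value for the baldness-cure investment.
branches = [
    (0.30 * 0.60 * 0.90, 25_000_000),  # treatment works, FDA approves, no competitor
    (0.30 * 0.60 * 0.10, 0),           # approved, but a competitor appears (assumed $0)
    (0.30 * 0.40, 0),                  # treatment works, FDA denies approval (assumed $0)
    (0.70, 250_000),                   # no effective treatment; part of the $1M recouped
]
ev = sum(p * payoff for p, payoff in branches)
print(f"${ev:,.0f}")  # $4,225,000
```

Note that the highest-probability single branch (70%) is the losing one, which is why a positive expected value still leaves the decision to one's risk appetite.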
Widespread screening for rare diseases can produce counterintuitive results. Wheelan presents a scenario of a disease affecting 1 in 100,000 people. Screening 175 million Americans with a test that has only a 0.01% false positive rate would identify 1,750 actual cases but also generate roughly 17,500 false positives. In other words, only about 9% of those told they have the disease would actually be sick, wasting resources and creating enormous anxiety.
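The screening arithmetic can be reproduced directly. Treating the false-positive count as the full population times the error rate is a simplification (it ignores the 1,750 who really are sick), but it matches the round numbers in the text:

```python
# Rare-disease screening: why most positive results are false.
population = 175_000_000
prevalence = 1 / 100_000
false_positive_rate = 0.0001  # 0.01%

true_cases = population * prevalence                 # 1,750 actual cases
false_positives = population * false_positive_rate   # 17,500 healthy people flagged
ppv = true_cases / (true_cases + false_positives)    # share of positives who are sick
print(f"{ppv:.1%} of positive results are real")     # about 9%
```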
Probability helps identify suspicious patterns. The SEC uses computers to scrutinize hundreds of millions of stock trades, looking for activity suggesting insider trading via improbable investment success. Similarly, predictive policing, exemplified by programs in Santa Cruz, California, and Chicago, Illinois, uses probability to forecast crime. In Santa Cruz, a computer predicted a parking garage as a hotspot; police dispatched there arrested two women peering into cars, one with outstanding warrants and another carrying illegal drugs.
Predictive analytics applies broadly. From collecting and analyzing data, Allstate knows that 20- to 24-year-olds are most likely to have fatal crashes, that the most commonly stolen car in Illinois is the Honda Civic (though this does not mean individual Civics have the highest theft probability), and that laws banning texting while driving may worsen safety by causing drivers to hide their phones. J. P. Martin, a Canadian Tire executive, discovered that customers’ purchase behavior predicts payment likelihood. Buyers of cheap motor oil or chrome-skull car accessories tended to miss loan payments, while buyers of carbon-monoxide monitors almost never did.
The game show Let’s Make a Deal, which premiered in 1963, created a famous probability puzzle. At each show’s end, a contestant faced three doors: one concealing a car, two concealing goats. After the contestant chose a door, host Monty Hall opened one of the two unchosen doors, always revealing a goat. He then asked whether the contestant wanted to switch to the other remaining closed door. What should the contestant do?
The counterintuitive but statistically correct answer is to switch. Switching doors doubles the winning probability from 1/3 to 2/3. The key insight is that Monty knows where the car is. His action of revealing a goat provides valuable information, narrowing the odds of finding the car behind the remaining doors.
Three explanations support this conclusion. First, empirical evidence: New York Times columnist John Tierney wrote about the problem in 2008, and the Times created an interactive version. Wheelan paid his children to play 100 games each, one always switching doors and one always sticking with the original door; the switcher won 72 times, the non-switcher only 33. Leonard Mlodinow’s The Drunkard’s Walk notes that real contestants on the show who switched won about twice as often.
Second, an intuitive explanation: switching is equivalent to being offered both unchosen doors at the start. Since one must contain a goat, Monty revealing which one is simply a courtesy. If you initially picked Door 1 but were then offered whatever was behind Doors 2 and 3 together, you would accept, raising your winning chances to 2/3. Monty’s reveal of the goat behind one of those two doors does not diminish this advantage—he is showing you nothing you did not already know would be there.
Third is an extreme version that clarifies the logic. Imagine there are 100 doors. You pick one (1% chance of being correct). Monty then opens 98 other doors, all revealing goats, leaving only your door and one other closed. The car is almost certainly (99%) behind the remaining door, not your original pick.
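The empirical claim can also be checked with a quick simulation (a sketch; the door numbering and random seed are arbitrary choices of mine):

```python
# Monte Carlo check of the Monty Hall result: switching wins about 2/3
# of the time, sticking with the original door about 1/3.
import random

def play(switch, rng):
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty, who knows where the car is, opens a door that is
    # neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
trials = 100_000
switch_wins = sum(play(True, rng) for _ in range(trials)) / trials
stick_wins = sum(play(False, rng) for _ in range(trials)) / trials
print(f"switch: {switch_wins:.3f}, stick: {stick_wins:.3f}")
```

The simulated frequencies converge on 2/3 and 1/3, matching both Wheelan's family experiment and the show's historical record.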
The broader lesson is that intuition about probability can mislead.
Before the 2008 financial crisis, Wall Street firms relied on the Value at Risk (VaR) model to measure risk. VaR assigned a dollar figure representing the maximum a firm could lose on investments over a specified period, with 99% probability. At the investment bank J.P. Morgan, the daily VaR calculation arrived on executives’ desks at 4:15 pm, offering a seemingly precise snapshot of risk across thousands of trading positions. Former New York Times business writer Joe Nocera explained VaR’s appeal: It expressed risk as a single number, a dollar figure accessible to non-quantitative executives.
However, VaR proved catastrophic. The models created false precision, like a faulty speedometer. Former US Federal Reserve chairman Alan Greenspan testified that the models were calibrated on roughly two decades of unusually buoyant markets; when those patterns broke, the models failed. Commercial banks assigned zero probability to large housing price declines—yet such declines occurred beginning in 2007.
A second critical flaw was that the 99% assurance ignored the devastating potential of the remaining 1%. Hedge fund manager David Einhorn likened VaR to an air bag that works until the moment of impact—precisely when protection is most needed. Former US Treasury Secretary Hank Paulson noted that firms unrealistically assumed they could sell assets during crises, ignoring that all other firms, equally in need of cash, would be trying to sell the same assets at the same time. Nocera, summarizing Nassim Nicholas Taleb’s book The Black Swan (2007), emphasized that the greatest risks, often unforeseen events that lie outside standard models, occur more frequently than expected.
The Wall Street quants made three fundamental errors: confusing precision with accuracy, using flawed underlying probabilities based on atypical historical periods, and neglecting tail risk (or unlikely events). The resulting 2008 crisis destroyed trillions in wealth, drove unemployment to over 10%, and saddled governments with enormous debts.
Common probability errors include assuming events are independent when they are not. On a plane, if one jet engine fails, the second engine’s failure probability rises dramatically, since bird strikes or maintenance issues affect both engines similarly. In the 1990s, British prosecutors made a similar probability mistake about sudden infant death syndrome (SIDS). Assuming that all SIDS (or “cot death”) deaths were independent, Sir Roy Meadow testified that while one cot death in a family was a tragedy, two cot deaths in the same family was a 1-in-73-million event—and thus evidence of abuse. This reasoning sent innocent parents to prison. Later, however, the Royal Statistical Society noted that genetic factors could link SIDS deaths, making a second death more, not less, likely after a first. In 2004, the British government announced it would review 258 cot death trials.
The opposite error—the gambler’s fallacy—occurs when independent events are wrongly treated as connected. A roulette ball landing on black five times does not make red more likely; the probability remains unchanged at 18/38. Research by Thomas Gilovich, Robert Vallone, and Amos Tversky refuted the “hot hand” belief—the idea that basketball players can be on a streak, with each successful shot increasing the odds of making the next. By analyzing Philadelphia 76ers and Boston Celtics data, they found no correlation between successive shots. Most fans perceive patterns where statistics show only randomness.
Another common error is not realizing that clusters happen by chance. While five people contracting rare leukemia at one workplace seems improbable (perhaps 1 in a million), millions of workplaces exist, making such clusters unsurprising. Wheelan often demonstrates this in his classroom. He has all students stand up and flip coins, with those who flip heads sitting down. Finally, one remains who has flipped six tails in a row. That student has no special talent; rather, someone always ends up with an unusual streak when enough people try.
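A simplified model, in which each of n students independently flips six coins (each student's chance of an all-tails streak is the same 1/64 as in the elimination version), shows why some remarkable streak is almost guaranteed in a large class:

```python
# Chance that at least one of n students flips six tails in a row,
# illustrating why clusters appear by chance when enough people try.
def p_some_streak(n, flips=6):
    p_one = 0.5 ** flips          # one student's chance: 1/64
    return 1 - (1 - p_one) ** n   # at least one success among n students

for n in (1, 30, 64, 200):
    print(n, round(p_some_streak(n), 3))
```

For a single student the streak is rare (about 1.6%), but with 64 students the odds of witnessing one climb past 60%, and with 200 they are near certainty.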
The prosecutor’s fallacy involves neglecting statistical context. A DNA match with a one-in-a-million probability seems compelling, but if the suspect was identified by searching a database of millions of people, the match could easily be coincidental. The same DNA evidence holds vastly different meaning depending on whether the suspect was arrested near the crime scene holding the murder weapon or found through database screening.
Reversion to the mean explains why outlier performances return toward average. The so-called Sports Illustrated jinx holds that athletes who appear on the cover of the magazine are cursed with poor performance afterwards. In reality, the magazine features its cover subjects after exceptional stretches; their subsequent performance naturally reverts toward their typical level. Similarly, the Chicago Cubs repeatedly pay premium salaries for free agents after their career-best seasons, then watch performance decline.
Reversion to the mean differs from the gambler’s fallacy: While the probability of the next independent event remains constant, a long series of future events will average out closer to the mean. Research by Ulrike Malmendier and Geoffrey Tate found that when CEOs achieve “superstar” status by winning awards, their companies underperform over the next three years, partly due to the distraction that comes with being lauded: memoirs, board seats, and other commitments.
Statistical discrimination raises ethical questions about using probabilities tied to group characteristics. In 2003, European commissioner Anna Diamantopoulou proposed banning gender-based insurance premiums; the ban took effect in 2012. Insurers argued that men crash more and women live longer, making different rates for car and life insurance statistically justified. However, the EU decided that disparate treatment based on sex was unacceptable regardless of statistical validity.
Predictive analytics can identify drug smugglers or predict crimes with impressive accuracy, but what about the innocent people who fit the profile? The same analysis that reveals birdseed buyers default less on credit cards can be applied everywhere. The ability to analyze data has outpaced ethical consideration of how to use results. Statistics cannot be smarter than the people who use them, and critical thinking about what calculations to perform and why remains essential.
Charles Wheelan uses narrative case studies as a primary teaching tool, prioritizing conceptual understanding over mathematical formulas. In Chapter 4, the concept of the correlation coefficient is framed by the relatable example of Netflix’s recommendation algorithm. This choice grounds the statistic in a familiar application, demonstrating its power to find patterns in complex data in a way that readers can readily grasp. Similarly, Chapter 5 uses the 1981 Schlitz beer Super Bowl taste test to make binomial probability tangible. The marketing campaign shows how probability can transform a seemingly high-stakes gamble into a calculated, low-risk venture. By embedding statistical tools within engaging, real-world stories, Wheelan makes them more accessible, suggesting that Statistical Literacy Is Empowering and within reach regardless of one’s ability to perform the calculations.
A central idea across these chapters is the conflict between human intuition and the logic of probability, underscoring Probability as a Tool for Better Decisions. The Monty Hall problem in Chapter 5½ serves as a key example, isolating a famous instance where gut instinct is demonstrably wrong. Wheelan dismantles the intuitive assumption of which door to pick by providing empirical, logical, and extreme-case explanations, prompting the reader to abandon instinct in favor of a structured, probabilistic framework. This idea is reinforced in Chapter 6 through the deconstruction of common cognitive errors. The analysis of the “hot hand” in basketball reveals that “[p]eople’s intuitive conceptions of randomness depart systematically from the laws of chance” (103), showing a human tendency to perceive patterns where none exist. The gambler’s fallacy and the misinterpretation of cancer clusters further illustrate this cognitive bias. By repeatedly juxtaposing flawed intuition with statistical evidence, Wheelan makes a sustained argument for statistical literacy as a corrective to unreliable perception.
The analysis of the 2008 financial crisis provides a crucial pivot, transitioning the discussion from the utility of statistical tools to their potential for catastrophic misuse. The Value at Risk (VaR) model is presented as an instrument of abstraction, praised for its ability to express “risk as a single number, a dollar figure, no less” (96). This very appeal, however, became its greatest flaw, a way Statistics Can Mislead or Be Manipulated. Wheelan uses the VaR case study to deconstruct the confusion between precision and accuracy, showing how models based on flawed assumptions—in this case, an uncritical reliance on data from an unusually stable market period—create a false sense of security that masks risk. The critique of VaR’s failure to account for “tail risk,” or the likelihood of unlikely events, serves as a cautionary tale about the limitations of statistical modeling. Through this example, Wheelan frames statistical illiteracy as a potential threat to global economic stability.
The section culminates in an examination of the social and ethical dimensions of probability. The wrongful convictions in the British SIDS cases demonstrate the significant human cost of a single statistical error: assuming events are independent when they are not. This example moves the consequences of statistical malpractice from the financial realm to that of individual liberty and justice. The subsequent discussion of statistical discrimination confronts the philosophical challenges posed by predictive analytics. By showing how probabilistic models can lead to profiling, Wheelan presents the tension between statistical validity and social fairness. This progression—from Netflix recommendations to wrongful imprisonment to systemic discrimination—forms a deliberate structural arc. It argues that a complete statistical education involves both understanding the mechanics of probability and developing a critical awareness of its ethical implications.


