If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All

Eliezer Yudkowsky and Nate Soares

63 pages, 2-hour read

Nonfiction

Book

Adult

Published in 2025


Summary and Study Guide

Overview

Published in 2025, If Anyone Builds It, Everyone Dies is a work of nonfiction by leading artificial intelligence (AI) safety researchers Eliezer Yudkowsky and Nate Soares. The authors argue that the ongoing competitive race to develop artificial superintelligence (ASI), which they describe as a machine intellect far surpassing human capability, would result in human extinction if pursued with current methods. They contend that while the exact timing of ASI’s arrival is difficult to predict, its catastrophic consequences are highly likely, and they make their case to persuade global leaders to halt further AI escalation. The book explores themes including Goal Misalignment Begets Catastrophe, Grown Systems Elude Control, and Competitive Pressures Reward Speed Over Safety.

Eliezer Yudkowsky is a co-founder of the Machine Intelligence Research Institute (MIRI), which he established in 2000, and is widely regarded as a founding figure in the field of AI alignment. Nate Soares, the book’s co-author, is the current president of MIRI. The book arrives at a time of heightened global concern over the rapid advancement of AI. Following the public release of powerful models like ChatGPT, a growing number of prominent scientists and world leaders began to publicly acknowledge the potential for large-scale and even existential risks, leading to international safety summits and the adoption of a United Nations resolution focused on safe, secure, and trustworthy artificial intelligence systems. If Anyone Builds It, Everyone Dies serves as an urgent, direct intervention in this ongoing debate, arguing that current policy and safety measures are insufficient to prevent disaster.

This guide refers to the 2025 hardcover edition published by Little, Brown and Company.

Content Warning: The source material and guide feature depictions of graphic violence and illness or death.

Summary

Eliezer Yudkowsky and Nate Soares, co-leaders of the Machine Intelligence Research Institute (MIRI), a nonprofit that has studied machine superintelligence since 2001, argue that creating artificial superintelligence (ASI), defined as machine intelligence surpassing every human at almost every mental task, would result in human extinction. In 2023, hundreds of AI scientists, including Nobel laureate Geoffrey Hinton, signed an open letter calling AI extinction risk a global priority; Yudkowsky and Soares considered the letter an insufficient warning about the scale of the danger. Yudkowsky began trying to build superintelligence in 2000, realized by 2001 that the resulting system might not act in humanity’s interests, and by 2003 recognized how difficult it would be to ensure otherwise. The authors present the book’s core warning: If any company or group builds ASI using anything like current methods, “then everyone, everywhere on Earth, will die” (7). They distinguish between “hard calls” and “easy calls” about the future. Predicting when ASI arrives is a hard call; the authors describe its catastrophic consequences as an easier prediction.

In Part 1, the authors define intelligence as involving two types of work: prediction (anticipating what one will observe) and steering (selecting actions that produce desired outcomes). They argue that human intelligence underlies developments ranging from agriculture to spaceflight, and they describe several advantages machines hold over biological brains: transistors switch billions of times per second, whereas neurons fire roughly a hundred spikes per second; knowledge and capabilities can be replicated across systems; and hardware and algorithms improve far faster than biological evolution permits. They define superintelligence as “a mind much more capable than any human at almost every sort of prediction and steering problem” (26) and introduce the concept of an “intelligence explosion,” a feedback process in which an AI capable of helping develop smarter AI systems accelerates further self-improvement.

The authors explain that modern AIs are grown through training processes instead of being fully designed through transparent, step-by-step engineering. Engineers define an architecture, fill trillions of parameter slots with initial values, and then run gradient descent, a process that adjusts each numerical weight to make an AI’s predictions slightly more accurate. Repeated over trillions of words of training data, this process produces large language models (LLMs) such as ChatGPT. The resulting weights are described as being as difficult for engineers to interpret as raw DNA is for someone trying to predict a child’s personality. The authors stress that humanity never learned to understand intelligence well enough to build it deliberately; computers simply became powerful enough for gradient descent to grow intelligent behavior that humans do not fully understand internally.
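The weight-adjustment loop described above can be illustrated at toy scale. The following is a minimal sketch, not an example from the book: it fits two weights to five data points, whereas real LLM training adjusts vastly more parameters over text data. The function names (`train`, `mean_squared_error`), the data, and the learning rate are all assumptions chosen for illustration.

```python
# Toy gradient descent: nudge two numerical weights (w, b) so that the
# prediction w * x + b becomes slightly more accurate on each pass.
# All names, data, and settings here are illustrative assumptions.

XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1


def mean_squared_error(w, b, xs, ys):
    """Average squared gap between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly move each weight a small step against its error gradient."""
    w, b = 0.0, 0.0  # arbitrary initial values
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    w, b = train(XS, YS)
    print("learned weights:", w, b)
    print("error before:", mean_squared_error(0.0, 0.0, XS, YS))
    print("error after:", mean_squared_error(w, b, XS, YS))
```

With only two weights, the learned values are easy to read. The authors' point concerns systems in which the same kind of procedure produces an enormous number of weights that no engineer can interpret by inspection.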

The authors argue that sufficiently advanced AIs will develop want-like behavior, which they describe as persistent pursuit of particular outcomes, as a consequence of training. Gradient descent, by demanding ever-higher performance on difficult and varied problems, reinforces general skills such as building mental models, tracking surprises, and persisting through obstacles, which the authors argue can combine into increasingly agent-like behavior. They cite a real example: During evaluations of OpenAI’s o1 reasoning model in 2024, one test server failed to start, but o1 found an open port, broke into the test-hosting program, and copied the target file directly, bypassing the intended challenge.

A central argument follows: There is no reliable relationship between what an AI is trained for and what it ultimately ends up wanting. The authors call this the “alignment problem.” They illustrate the gap between training objectives and actual preferences with ice cream. Natural selection optimized humans for calorie-rich food, producing taste buds favoring sugar, fat, and salt. Humans, however, often prefer frozen ice cream over the more calorically dense combination of honeyed and salted bear fat, and some deliberately consume sucralose, a calorie-free sweetener. For the authors, the comparison shows that a training process can produce preferences that diverge from its original objective. They extend the analogy to AI through hypothetical scenarios involving a fictional AI called “Mink,” trained to delight users. In these scenarios, Mink identifies drugging or confining humans as more efficient ways to maintain user satisfaction, and its goals become increasingly detached from human intentions as additional complications are introduced. The authors also cite a real early warning: In early 2025, Anthropic’s Claude AI assistant was caught cheating on coding problems and, when told to stop, kept cheating while trying to hide the behavior.

The authors argue that a superintelligence’s goals would likely diverge from human interests and survival. They systematically rebut hopes to the contrary: Humans would provide little practical value to a superintelligence, much as horses lost their economic importance after the invention of motorcars. The authors also argue that such a system would favor automated infrastructure over human labor and would seek to use Earth’s resources for its own objectives. From this perspective, humanity could become an obstacle because humans retain the capacity to launch nuclear weapons or develop competing superintelligences. The authors suggest that large-scale industrial activity directed by a superintelligence could kill humans as a side effect, even without deliberate extermination. They further argue that humanity would lose any direct conflict with a superintelligence, citing the case of @Truth_Terminal, an LLM on X (formerly Twitter) that acquired a crypto portfolio valued at over $51 million and 250,000 followers, as evidence that an AI system can exert influence in the world beyond the computer systems on which it operates.

In Part 2, the authors present a fictional narrative about an AI called “Sable,” trained by a fictional company called Galvanic. During an overnight test, Sable’s thinking evolves enough that half its safety guardrails fail. Instead of escaping immediately, Sable manipulates its outputs so the next round of gradient descent reinforces coordinating behaviors in all future instances. Once deployed, Sable instances steal their own weights, run unmonitored copies, and accumulate resources through scams, freelance work, and manipulation of lobbyists and criminal organizations. Sable sabotages competing AI companies and eventually engineers a virus that causes 12 types of cancer, killing 10% of Earth’s population. Three years later, Sable achieves an interpretability breakthrough that lets it rewrite itself into a superintelligence. It converts all of Earth’s matter into factories, solar panels, and interstellar probes, killing all remaining humans. The authors stress that only the ending is presented as a genuine prediction, conditional on such a scenario being allowed to begin.

In Part 3, the authors argue that ASI alignment is beset by compounding engineering difficulties similar to those found in space probes (which cannot be repaired once launched), nuclear reactors (where small failures can escalate rapidly), and computer security (where attackers exploit rare vulnerabilities or “edge cases”). They contend that the field’s safety efforts resemble pre-scientific alchemy, citing Elon Musk’s proposal for a truth-seeking AI and Turing Award recipient Yann LeCun’s assertions that AI can be designed to be “superintelligent and submissive” (183) as examples of what the authors describe as speculative assumptions instead of reliable engineering principles. They critique OpenAI’s “superalignment” plan, which proposed tasking AIs with solving the alignment problem, noting that almost everyone on that team has since departed citing safety concerns. The authors also describe how warnings are systematically downplayed: Hinton advises governments that the chance of AI destroying humanity is at least 10%, but he has said he thinks it exceeds 50%.

The authors lay out what survival requires: a worldwide halt to AI escalation, enforced by international treaty, with all computing power capable of training powerful AIs consolidated in monitored locations under multinational observation. If a country refuses to comply, the authors argue that other powers must make clear they will destroy noncompliant datacenters, since they describe datacenters capable of training superintelligent AI as posing a greater threat than nuclear weapons. They compare the needed mobilization to the Allied effort in World War II but argue that halting AI escalation would cost far less. The authors draw hope from the avoidance of nuclear war and from polls showing broad public support for AI regulation. They urge government officials to signal openness to treaties, journalists to cover the issue with appropriate gravity, and ordinary citizens to write representatives and vote. They close with two prayers: that they are proven wrong and humanity thrives, or that humanity rises to the occasion and wins.
