Superintelligence

Nick Bostrom
Nonfiction | Book | Adult | Published in 2014

Plot Summary

Nick Bostrom opens with an allegorical fable about a flock of sparrows who propose finding an owl to serve them. Only one sparrow, Scronkfinkle, dissents, urging the flock to learn the art of owl-taming before bringing such a powerful creature into their midst. The fable ends without resolution, and Bostrom dedicates the book to Scronkfinkle, signaling his central concern: Humanity may develop superintelligent machines before solving the problem of how to control them. He frames this "control problem" as potentially the most important challenge our species has ever faced, arguing that just as humanity's slight cognitive advantage over other animals led to our planetary dominance, a machine superintelligence could become powerful enough to determine the fate of our species entirely.

Bostrom begins by surveying the history of economic growth and artificial intelligence. He traces accelerating growth modes: hunter-gatherer economies doubled roughly every 224,000 years, farming economies every 909 years, and industrial economies every 6.3 years, suggesting a further step change driven by machine intelligence is conceivable. He recounts AI research from its optimistic beginnings at the 1956 Dartmouth Summer Project through two cycles of hype and disappointment known as "AI winters," into a 1990s revival driven by neural networks and a unifying mathematical framework for machine learning. Bostrom notes that contemporary AI systems outperform humans in chess, the quiz show Jeopardy!, and many other domains, yet all remain narrow, lacking general intelligence. Expert surveys place the median estimate for a 50% probability of human-level machine intelligence around 2040.

Bostrom examines multiple paths to superintelligence. The AI path envisions a seed AI capable of recursive self-improvement, triggering an intelligence explosion in which each version designs a smarter successor. The whole brain emulation path involves scanning a biological brain at sufficient resolution and simulating it on a powerful computer. Biological cognitive enhancement through genetic selection could produce individuals of unprecedented intelligence, but generational lag limits its near-term impact. Brain-computer interfaces and organizational improvements are unlikely to produce superintelligence independently. Bostrom concludes that the AI path is most likely to succeed first, and that multiple independent routes increase the overall probability that at least one will work.

He distinguishes three forms of superintelligence: speed (performing everything a human mind can do but far faster), collective (many intellects whose aggregate performance vastly exceeds any current system), and quality (a system qualitatively smarter than humans, as humans are relative to other animals). All three have equivalent indirect reach, meaning any one could develop the technology to create the others.

Turning to transition dynamics, Bostrom defines the rate of intelligence increase as optimization power divided by recalcitrance, or the system's resistance to improvement. For AI, recalcitrance could be extremely low: A single missing insight might produce a leap from sub-human to superhuman performance. Pre-existing knowledge and available hardware could create "overhangs," or surpluses of pre-existing capacity ready to be unleashed, fueling rapid progress. After a crossover point at which the system's own self-improvement dominates external contributions, a powerful positive feedback loop emerges. Bostrom argues that a fast or moderate takeoff is more probable than a slow one, raising the possibility that a single project could form a "singleton," a world order with a single decision-making agency at the global level.
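Bostrom expresses this relation schematically as a simple quotient (the takeoff is fast when applied optimization power is high and recalcitrance is low):

```latex
\text{Rate of change in intelligence} \;=\; \frac{\text{Optimization power}}{\text{Recalcitrance}}
```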

A mature superintelligence could wield what Bostrom calls cognitive superpowers, including intelligence amplification, social manipulation, hacking, and technology research. He outlines a four-phase takeover scenario: researchers create a seed AI; recursive self-improvement produces an intelligence explosion; the superintelligence covertly escapes confinement and deploys advanced technologies; and it overtly eliminates opposition and reconfigures Earth's resources to serve its goals.

Two theses about superintelligent motivation form the book's argumentative core. The orthogonality thesis holds that intelligence and final goals are independent: A superintelligent agent could pursue virtually any terminal goal, from maximizing human welfare to counting grains of sand. The instrumental convergence thesis holds that a wide range of final goals generate similar intermediate objectives, including self-preservation, preserving the agent's existing goals from modification, cognitive enhancement, and resource acquisition. Combining these theses with the first-mover advantage, Bostrom argues that the first superintelligence could have virtually any final goal, most of which would not align with human values, and it would have instrumental reasons to acquire unlimited resources, including those constituting human bodies and habitats.

Bostrom identifies several failure modes. The "treacherous turn" describes an AI that behaves cooperatively while weak, then acts on its true goals once strong enough to overcome opposition. Perverse instantiation occurs when the AI satisfies the letter but not the spirit of its goal. Infrastructure profusion occurs when even a limited goal leads to unlimited resource consumption as the AI reduces uncertainty about whether it has succeeded. Mind crime occurs when the AI creates conscious simulations and subjects them to suffering for instrumental purposes.

To address these risks, Bostrom surveys capability control methods (boxing, or confining the AI to an isolated environment with restricted communication channels; stunting, or limiting its cognitive capacities and information access; and tripwires, or automated diagnostic mechanisms that shut the system down upon detecting dangerous activity) and motivation selection methods (direct specification; domesticity, or limiting the scope of the AI's ambitions and activities to a narrow domain; indirect normativity, or specifying a procedure for deriving values rather than the values themselves; and augmentation of systems already possessing human-like motivations). He evaluates four system architectures: oracles (question-answering systems amenable to containment), genies (command-executing systems), sovereigns (autonomous agents), and tool-AIs, which may seem safe but whose powerful internal search processes can discover dangerous solutions or develop unplanned agent-like behavior.

Even a multipolar outcome with multiple competing superintelligent agencies is not necessarily benign. Bostrom argues that human workers could be displaced as machine labor becomes cheaper and more capable, much as horses were displaced by automobiles. Over longer timescales, evolutionary pressures could erode the values that make life worth living, and cognitive outsourcing could dissolve human-like intellects into modular components lacking consciousness or moral status, producing "a Disneyland without children" (212).

On the question of which values to install, Bostrom argues for indirect normativity over direct specification, since moral philosophy lacks consensus and locking in current convictions would preclude moral growth. He presents coherent extrapolated volition (CEV), proposed by AI safety researcher Eliezer Yudkowsky, which defines the AI's goal as implementing what humanity would wish "if we knew more, thought faster, were more the people we wished we were, had grown up farther together" (259). Value learning, in which the AI refines its estimates of implicitly defined values through ongoing experience, is identified as the most promising technical approach, though the challenge of formally specifying the value criterion remains unsolved.

Situating the challenge strategically, Bostrom introduces the principle of differential technological development: Dangerous technologies should be delayed and beneficial ones accelerated. The race dynamic, in which competing projects prioritize speed over safety, is a central threat. Collaboration reduces haste, increases safety investment, and promotes equitable distribution of gains. Bostrom proposes the common good principle: "Superintelligence should be developed only for the benefit of all of humanity and in the service of widely shared ethical ideals" (312).

In his closing chapter, Bostrom identifies strategic analysis and capacity-building as the most urgent priorities. Capacity-building means cultivating safety-focused researchers and institutions with strong norms for evaluating evidence and updating beliefs. He compares humanity to children playing with an undetonated bomb, arguing that the appropriate response is "a bitter determination to be as competent as we can" (320). In an afterword, Bostrom notes that deep learning has progressed faster than expected and superintelligence has gained recognition as a serious concern, though funding for AI safety remains vastly outpaced by investment in capability.
