Intuitively, we think that rational decision-making means exhaustively enumerating our options, weighing each one carefully, and then selecting the best. But in practice, when the clock—or the ticker—is ticking, few aspects of decision-making (or of thinking more generally) are as important as one: when to stop.
*We use boldface to indicate the algorithms that appear throughout the book.
*With this strategy we have a 33% risk of dismissing the best applicant and a 16% risk of never meeting her. To elaborate, there are exactly six possible orderings of the three applicants: 1-2-3, 1-3-2, 2-1-3, 2-3-1, 3-1-2, and 3-2-1. The strategy of looking at the first applicant and then leaping for whoever surpasses her will succeed in three of the six cases (2-1-3, 2-3-1, 3-1-2) and will fail in the other three—twice by being overly choosy (1-2-3, 1-3-2) and once by not being choosy enough (3-2-1).
*Just a hair under 37%, actually. To be precise, the mathematically optimal proportion of applicants to look at is 1/e—the same mathematical constant e, equivalent to 2.71828…, that shows up in calculations of compound interest. But you don’t need to worry about knowing e to twelve decimal places: anything between 35% and 40% provides a success rate extremely close to the maximum. For more of the mathematical details, see the notes at the end of the book.
*More on the computational perils of game theory in chapter 11.
Your stomach rumbles. Do you go to the Italian restaurant that you know and love, or the new Thai place that just opened up? Do you take your best friend, or reach out to a new acquaintance you’d like to get to know better? This is too hard—maybe you’ll just stay home. Do you cook a recipe that you know is going to work, or scour the Internet for new inspiration? Never mind, how about you just order a pizza? Do you get your “usual,” or ask about the specials? You’re already exhausted before you get to the first bite. And the thought of putting on a record, watching a movie, or reading a book—which one?—no longer seems quite so relaxing.
Every day we are constantly forced to make decisions between options that differ in a very specific dimension: do we try new things or stick with our favorite ones? We intuitively understand that life is a balance between novelty and tradition, between the latest and the greatest, between taking risks and savoring what we know and love. But just as with the look-or-leap dilemma of the apartment hunt, the unanswered question is: what balance?
In the 1974 classic Zen and the Art of Motorcycle Maintenance, Robert Pirsig decries the conversational opener “What’s new?”—arguing that the question, “if pursued exclusively, results only in an endless parade of trivia and fashion, the silt of tomorrow.” He endorses an alternative as vastly superior: “What’s best?”
But the reality is not so simple. Remembering that every “best” song and restaurant among your favorites began humbly as something merely “new” to you is a reminder that there may be yet-unknown bests still out there—and thus that the new is indeed worthy of at least some of our attention.
Age-worn aphorisms acknowledge this tension but don’t solve it. “Make new friends, but keep the old / Those are silver, these are gold,” and “There is no life so rich and rare / But one more friend could enter there” are true enough; certainly their scansion is unimpeachable. But they fail to tell us anything useful about the ratio of, say, “silver” and “gold” that makes the best alloy of a life well lived.
Computer scientists have been working on finding this balance for more than fifty years. They even have a name for it: the explore/exploit tradeoff.
Explore/Exploit
In English, the words “explore” and “exploit” come loaded with completely opposite connotations. But to a computer scientist, these words have much more specific and neutral meanings. Simply put, exploration is gathering information, and exploitation is using the information you have to get a known good result.
It’s fairly intuitive that never exploring is no way to live. But it’s also worth mentioning that never exploiting can be every bit as bad. In the computer science definition, exploitation actually comes to characterize many of what we consider to be life’s best moments. A family gathering together on the holidays is exploitation. So is a bookworm settling into a reading chair with a hot cup of coffee and a beloved favorite, or a band playing their greatest hits to a crowd of adoring fans, or a couple that has stood the test of time dancing to “their song.”
What’s more, exploration can be a curse.
Part of what’s nice about music, for instance, is that there are constantly new things to listen to. Or, if you’re a music journalist, part of what’s terrible about music is that there are constantly new things to listen to. Being a music journalist means turning the exploration dial all the way to 11, where it’s nothing but new things all the time. Music lovers might imagine working in music journalism to be paradise, but when you constantly have to explore the new you can never enjoy the fruits of your connoisseurship—a particular kind of hell. Few people know this experience as deeply as Scott Plagenhoef, the former editor in chief of Pitchfork. “You try to find spaces when you’re working to listen to something that you just want to listen to,” he says of a critic’s life. His desperate urges to stop wading through unheard tunes of dubious quality and just listen to what he loved were so strong that Plagenhoef would put only new music on his iPod, to make himself physically incapable of abandoning his duties in those moments when he just really, really, really wanted to listen to the Smiths. Journalists are martyrs, exploring so that others may exploit.
In computer science, the tension between exploration and exploitation takes its most concrete form in a scenario called the “multi-armed bandit problem.” The odd name comes from the colloquial term for a casino slot machine, the “one-armed bandit.” Imagine walking into a casino full of different slot machines, each one with its own odds of a payoff. The rub, of course, is that you aren’t told those odds in advance: until you start playing, you won’t have any idea which machines are the most lucrative (“loose,” as slot-machine aficionados call it) and which ones are just money sinks.
Naturally, you’re interested in maximizing your total winnings. And it’s clear that this is going to involve some combination of pulling the arms on different machines to test them out (exploring), and favoring the most promising machines you’ve found (exploiting).
To get a sense for the problem’s subtleties, imagine being faced with only two machines. One you’ve played a total of 15 times; 9 times it paid out, and 6 times it didn’t. The other you’ve played only twice, and it once paid out and once did not. Which is more promising?
Simply dividing the wins by the total number of pulls will give you an estimate of the machine’s “expected value,” and by this method the first machine clearly comes out ahead. Its 9–6 record makes for an expected value of 60%, whereas the second machine’s 1–1 record yields an expected value of only 50%. But there’s more to it than that. After all, just two pulls aren’t really very many. So there’s a sense in which we just don’t yet know how good the second machine might actually be.
Choosing a restaurant or an album is, in effect, a matter of deciding which arm to pull in life’s casino. But understanding the explore/exploit tradeoff isn’t just a way to improve decisions about where to eat or what to listen to. It also provides fundamental insights into how our goals should change as we age, and why the most rational course of action isn’t always