Some of the first experiments on human cognition were designed to test non-trivial rules of rationality. It seems we assumed that humans were competent at reasoning and we only needed to worry about harder details. We were considered intelligent, if not perfect. If we failed, it would be at rules that seemed less natural and harder to follow. That was the general spirit of Allais’s (1953) and Ellsberg’s (1961) experiments. Those experiments were designed to show that we were not perfect at obeying a specific rule of Expected Utility Theory (EUT) (von Neumann and Morgenstern 1947). Both Allais and Ellsberg tested whether, when people chose between gambles, the details that were identical in both gambles were ignored. EUT said that rule should be followed, but the experiments showed we did not obey it.
From early on, we were aware that humans make far simpler mistakes. Or, at least, we were aware that others commit those mistakes. We would certainly like to see ourselves as rational, as immune to trivial mistakes, at least when we take the time to think through a problem. The errors of others might, in principle, not be actual errors at all. They might reflect a character problem. Maybe those who disagreed with us knew how to reason but chose to make a few mistakes in order to fool those who are not as well educated. Maybe they were just not competent enough. People of low intelligence exist, after all. Whatever assumptions people actually made, we did see humanity as composed of rational beings. If we assume people are rational when given the opportunity to think through a problem, most mistakes would happen on less important details. Testing easier problems would make sense, for completeness’ sake, at least. But, I suspect, humans were expected to succeed at those. The rules that were not so clear, even controversial, were the cases where we were most likely to find trouble.
That optimistic view of our reasoning abilities was not to last long, however. P. C. Wason and P. Johnson-Laird describe an interesting experiment on our ability to solve a trivial logical problem (Wason and Johnson-Laird 1972). The experiment consisted of showing groups of volunteers four cards laid on a table. The volunteers were told that all the cards came from a deck where each card had a letter on one side and a number on the other. There was also a possible rule those cards might or might not obey. That rule was this: Whenever there was a vowel on the letter side, the number side would show an even number. On the table, the volunteers could see the cards “E,” “K,” “4,” and “7”; and they had to answer a simple question. If you looked at the other side of those cards, which ones could provide proof that the rule was wrong?
The correct answer is, of course, cards “E” and “7.” If you get an odd number behind the letter “E,” the rule is false. And most people get this one card right with no difficulty. Likewise, if you get a vowel behind the number “7,” the rule is false. As the test was done, however, it turned out the majority tended to pick “E” and “4.” But while the card “4” can provide an example of the rule working fine, it cannot provide an example of failure, since the rule says nothing about what must be behind an even number. It is as if people were looking for cases where the rule was confirmed, even when told to look for failures. Indeed, one of the explanations proposed for this experiment’s results is that we have a tendency to look for information that confirms our beliefs. We also avoid information that might show we are wrong. That effect is known as confirmation bias (Nickerson 1998).
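The logic of the card problem can be checked by brute force. The sketch below is only an illustration: the function names, and the assumption that the hidden faces range over the letters A to Z and the digits 0 to 9, are ours and not part of the original experiment. For each visible face, it asks whether any possible hidden face would violate the rule.

```python
# Illustrative sketch of the four-card problem. The helper names and the
# ranges of possible hidden faces (A-Z, 0-9) are assumptions of this example.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
NUMBERS = range(10)
VOWELS = set("AEIOU")


def violates(letter: str, number: int) -> bool:
    """The rule 'vowel implies even' fails only for a vowel with an odd number."""
    return letter in VOWELS and number % 2 == 1


def can_falsify(visible: str) -> bool:
    """True if at least one possible hidden face would make this card break the rule."""
    if visible.isalpha():
        # A letter is showing, so the hidden face is a number.
        return any(violates(visible, n) for n in NUMBERS)
    # A number is showing, so the hidden face is a letter.
    return any(violates(letter, int(visible)) for letter in LETTERS)


cards = ["E", "K", "4", "7"]
to_turn = [card for card in cards if can_falsify(card)]
print(to_turn)  # ['E', '7']
```

No hidden letter can make the card “4” break the rule, which is why turning it over can only ever confirm.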
Many mistakes have been observed since those initial experiments. And the list of our known biases keeps expanding. As we observe that list, a few themes start to show up. We seem to use fast heuristics quite often (Gigerenzer et al. 2000). They often provide correct answers but, as they are tuned for speed and not only for accuracy, they can fail. Still, they make sense from both a practical and an evolutionary point of view. Solving a problem in a way that we can be reasonably sure of the answer can be mentally demanding. That would mean an increased use of energy, requiring more food. It could also mean devoting more time to think through that problem. Depending on the situation, we might have neither the time nor the energy to devote to finding the actual best solution. If it might be a lion behind the bushes, waiting to be sure is a luxury we cannot afford. A fast heuristic that is reliable but not perfect might do a much better job at keeping us alive. The actual problem evolution had to solve involved not only the quality of the answer but also how fast we could have an answer and how much energy that would consume. In situations like those, instead of looking for the optimal solution, it makes sense to adopt “satisficing” strategies (Simon 1956).
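The difference between optimizing and satisficing can be made concrete with a small sketch. Everything in it is hypothetical: the escape options, their survival scores, and the threshold are invented for illustration, and the function names are ours. The point is only that a satisficer stops at the first option that is good enough, while an optimizer must evaluate every option.

```python
# Hypothetical illustration of satisficing versus optimizing.
# The options, scores, and threshold are made up for this example.
def satisfice(options, score, threshold):
    """Return the first option scoring at least `threshold`, and how many were evaluated."""
    for evaluated, option in enumerate(options, start=1):
        if score(option) >= threshold:
            return option, evaluated
    return None, len(options)


def optimize(options, score):
    """Return the best-scoring option; every option has to be evaluated."""
    return max(options, key=score), len(options)


# Invented survival scores for reacting to a rustle in the bushes.
survival = {"climb tree": 0.80, "run to river": 0.95, "freeze": 0.30, "hide in cave": 0.90}
options = list(survival)

print(satisfice(options, survival.get, threshold=0.75))
print(optimize(options, survival.get))
```

With these invented numbers, the satisficer settles on the first acceptable option after a single evaluation, while the optimizer finds a slightly better one only after examining all four.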
The availability heuristic is a classic example of how that works (Tversky and Kahneman 1973). According to that heuristic, when making judgments, we rely on the information that comes to mind most easily, on the assumption that it is associated with more probable events. If you are a physician evaluating a patient, before you gather more information, it is more probable your patient has a common disease than a rare one. It makes sense to assume diseases you meet every day will be fresher in your memory. Those very rare cases will likely be hard to remember. The cases that are more easily available in your memory, thus, are likely to be the most probable ones.
While the ease with which you remember information is associated with how frequent it is, that is not the only factor. Dramatic cases make stronger memories, and they also make better news, which is why dramatic incidents appear far more frequently in newspapers and on TV than they occur in reality. Shark attacks get much more attention than home accidents, despite the fact that sharks kill far fewer people. That does not mean that heuristics are a bad thing, or necessarily wrong. They do provide good initial guesses. Where we go wrong is in trusting those guesses as if our intuitions were always right.
Heuristics are not the whole story though. Heuristics can explain some of the discrepancies between expected results and how we actually reason, but they do not account for everything. A second possible cause for those observed discrepancies is the way many experiments were designed. The intention of several experiments was to check how well we obeyed EUT. So, they were planned with the tools of EUT in mind. That meant probability values and utilities. Money was used as a measure of utility, but no specific utility function was assumed in the experiments. The only suppositions were that people prefer to have more money than less and that they reasoned in a way that was equivalent to assigning some utility to their total assets. Probabilities, on the other hand, were presented in a straightforward way. The volunteers were often told the probability of each result in a gamble.
In the typical experiment, volunteers had to choose between two gambles. Those gambles had distinct chances attached to different amounts of money. By comparing several choices, researchers were able to determine that a person’s set of choices was incompatible with the existence of any utility function. That meant the volunteers did not obey EUT. The natural question was in what ways people were departing from the normative choices.
Kahneman and Tversky realized that they could still use the EUT framework if they made a simple alteration (Kahneman and Tversky 1979). Maybe the subjects were not using the probability value given by the researchers. They assumed people altered the values they heard to less extreme values. And they observed such a correction could explain the differences between the observed behavior and EUT. It was as if when we tell people there is only a one-in-a-million chance they will win a lottery, they behave as if the odds are much better, maybe one in ten thousand, maybe even better than that. Indeed, if we observe the behavior of many people on improbable events, humans bet on the lottery as if it were much easier to win than it actually is.
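A rough sketch can show what such an alteration looks like. The functional form below is a one-parameter weighting function often used in later work on prospect theory; it is not given in the passage above, and the parameter value 0.61 is used here only as an illustrative assumption. The function name is ours.

```python
def weight(p: float, gamma: float = 0.61) -> float:
    """Illustrative probability weighting: w(p) = p^g / (p^g + (1 - p)^g)^(1/g).

    The functional form and the default gamma are assumptions of this sketch.
    With gamma below 1, small probabilities are pushed up and large ones down.
    """
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1 / gamma)


for p in (1e-6, 0.01, 0.5, 0.99):
    w = weight(p)
    print(f"stated p = {p:g}, weighted = {w:.6g}, treated as about 1 in {1 / w:,.0f}")
```

With these assumptions, a stated one-in-a-million chance is treated as something on the order of one in a few thousand, while a 99 percent chance is treated as noticeably less than certain. Both extremes are pulled toward the middle, which is the kind of correction described above.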
That is, a weighting function on the probabilities could explain a good part of the observed behavior. But more recent studies have shown weighting functions cannot account for everything. We sometimes fail even at choosing an obviously better bet. That surprising decision was observed by Michael Birnbaum (2008). He observed that, depending on the details, people might prefer a gamble that was clearly worse. He asked volunteers to choose between similar bets: one of the possibilities, a 10 percent chance of winning $12, was split