The difficulties we face in predicting the future is related to the problem of induction, a classic problem in philosophy. While data can certainly teach us a great deal about the workings of the world, the philosopher and sceptic David Hume made us realize that we cannot arrive at secure knowledge on the basis of empirical observations. The problem of induction says that no matter how many observations you obtain, you cannot know for sure that the observed pattern is going to hold in the future. This inherent limitation is at the heart of the Black Swan concept. Any knowledge obtained through observation, Taleb says, is fragile. It is what the Black Swan metaphor itself is meant to convey. Recall that millions of observations on white swans had seemingly verified the notion that all swans are white, and it only took one observation of a black one to falsify it. Along the same lines, Peter Bernstein (1996) observed in his epic story about risk that: ‘… history repeats itself, but only for the most part’2 (emphasis added). This sentence really sums it all up and explains why induction is treacherous ground for making assumptions about the future.
Once we capitulate to the fact that we cannot predict the future, the next best thing would be to be able to characterize randomness itself, i.e. describe it. In that way, we would have some idea about the scope for deviations from what we expect. A description of randomness would involve some degree of quantification of things like the range within which the values of a variable can be assumed to fall and how the outcomes are distributed within that range (frequencies). We might occasionally find such descriptions of random processes to be practically relevant insofar as they help us make informed decisions and our future wellbeing depends on the outcome of the variable in question. They are potentially helpful, for example, in coming up with a reasonable analysis of the trade‐off between risk and return in different kinds of investment situations.
When characterizing randomness, a useful first distinction is between uncertainty and known odds.3 Uncertainty simply means that the odds are not known, indeed cannot be known. When randomness is of this sort, there is no way of knowing with certainty the range of outcomes and their respective probabilities. Known odds, in contrast, means that we have fixed the range of outcomes and the associated probabilities. The go‐to example is the roll of a dice, in which the six possible outcomes have equal probabilities. Drawing balls with different colours out of an urn is another favourite textbook example of controlled randomness.
Uncertainty, it turns out, is what the world has to offer. In fact, known odds hardly exist outside man‐made games. This is the case for exactly the same reasons that forecasting is generally unsuccessful: there are some hard limits to our theoretical knowledge of the world.4 There is ample data, for sure, which partly makes up for it. But the world generates only one observable outcome at a time, out of an infinite number of possibilities, through mechanisms and interactions that are beyond our grasp. There is nothing to say that we should be able to objectively pinpoint the odds of real‐world phenomena. Whenever a bookie, for example, offers you odds on the outcome of the next presidential election, it is a highly subjective estimate (tweaked in favour of the bookie).
Whenever data exists, it is of course possible to try to use it to come up with descriptions of the randomness in a stochastic process. Chances are that we can ‘fit’ the data to one of the many options available in our library of theoretical probability distributions. Once we have, we have seemingly succeeded in our quest to describe randomness, or to turn it into something resembling known odds. This is the frequentist approach to statistical inference, in which observed frequencies in the data provide the basis for probability approximations. Failure rates for a certain kind of manufacturing process, for example, can serve as a reasonably reliable indication of the probability of failure in the future.
It is important to see, however, that even when we are able to work with large quantities of data, we are still in the realm of uncertainty. The data frequencies typically only approximate one of the theoretical distributions. What is more, the way we collect, structure, and analyse these data points determines how we end up characterizing the random process and therefore the probabilities we assign to different outcomes. To the untrained eye, they might seem like objective and neutral probabilities because they are data‐driven and obtained by ‘scientists’. However, there is always some degree of subjectivity involved in the parameterization. The model used to describe the process could end up looking different depending on who designs it. Hand a large dataset over to ten scientists and ask them what the probability of a certain outcome is, and you may well get ten different answers. Because of the problem of induction, as discussed, there is always the possibility that the dataset, i.e. history, is a completely misleading guide to the future. Whenever we approximate probabilities using data, we assume that the data points we use are representative for describing the future.
THE MOVING TAIL
At this point, we are ready to conclude that the basic nature of randomness is uncertainty. Known odds, probabilities in the purest sense of the word, are an interesting man‐made exception to that rule. If we accept that uncertainty is what we are dealing with, a natural follow‐up question is: What is uncertainty like? A distinction we will make in this regard is between ‘benign’ and ‘wild’ uncertainty.5 Benign uncertainty means that we do not have perfect knowledge of the underlying process that generates the outcomes we observe, but the observations nonetheless behave as if they conform to some statistical process that we are able to recognize. Classic examples of this are the distribution of things like height and IQ in a population, which the normal distribution seems to approximate quite well.
While the normal distribution is often highlighted in discussions about ‘well‐behaved’ stochastic processes, many other theoretical distributions appear to describe real‐world phenomena with some accuracy. There is nothing, therefore, in the concept of benign uncertainty that rules out deviations from the normal distribution, such as fat tails or skews. It merely means that the data largely fits the assumptions of some theoretical distribution and appears to do so consistently over time. It is as if we have a grip on randomness.
Wild uncertainty, in contrast, means that there is scope for a more dramatic type of sea change. Now we are dealing with outcomes that represent a clear break with the past and a violation of our expectations as to what was even supposed to be possible. Imagine long stretches of calm and repetition punctured by some extreme event. In these cases, what happened did not resemble the past in the least. Key words to look out for when identifying wild uncertainty is ‘unprecedented’, ‘unheard of’, and ‘inconceivable’, because they (while overused) signal that we might be dealing with a new situation, something that sends us off on a new path.
The crucial aspect of wild uncertainty is precisely that the tails of the distributions are in flux. In other words, the historically observed minimum and maximum outcomes can be surpassed at any given time. I will refer to the idea of an ever‐changing tail of a distribution as The Moving Tail. With wild uncertainty, an observation may come along that is outside the established range – by a lot. Such an event means that the tail of the distribution just assumed a very different shape. Put another way, there was a qualitative shift in the tail. Everything we thought we knew about the variable in question