While some problems in risk management have explicit analytic solutions, many problems have no exact mathematical solution. In these cases, we can often approximate a solution by creating a Monte Carlo simulation. A Monte Carlo simulation consists of a number of trials. For each trial we feed random inputs into a system of equations. By collecting the outputs from the system of equations for a large number of trials, we can estimate the statistical properties of the output variables.
Even in cases where explicit solutions might exist, a Monte Carlo solution might be preferable in practice if the explicit solution is difficult to derive or extremely complex. In some cases a simple Monte Carlo simulation can be easier to understand, thereby reducing operational risk.
In this chapter we will explore two closely related topics, confidence intervals and hypothesis testing. At the end of the chapter, we will explore applications, including value at risk (VaR).
Part V Hypothesis Testing and Confidence Intervals
The Sample Mean Revisited
Imagine we take the output from a standard random number generator on a computer, and multiply it by 100. The resulting data generating process (DGP) is a uniform random variable, which ranges between 0 and 100, with a mean of 50. If we generate 20 draws from this DGP and calculate the sample mean of those 20 draws, it is unlikely that the sample mean will be exactly 50. The sample mean might round to 50, say 50.03906724, but exactly 50 is next to impossible. In fact, given that we have only 20 data points, the sample mean might not even be close to the true mean.
The sample mean is actually a random variable itself. If we continue to repeat the experiment – generating 20 data points and calculating the sample mean each time – the calculated sample mean will be different every time. As we proved, even though we never get exactly 50, the expected value of each sample mean is in fact 50. It might sound strange to say it, but the mean of our sample mean is the true mean of the distribution. Using our standard notation:
$$E[\hat{\mu}] = \mu \qquad (3.71)$$
Instead of 20 data points, what if we generate 1,000 data points? With 1,000 data points, the expected value of our sample mean is still 50, just as it was with 20 data points. While we still don't expect our sample mean to be exactly 50, we expect our sample mean will tend to be closer when we are using 1,000 data points. The reason is simple: a single outlier won't have nearly the impact in a pool of 1,000 data points that it will in a pool of 20. If we continue to generate sets of 1,000 data points, it stands to reason that the standard deviation of our sample mean will be lower with 1,000 data points than it would be if our sets contained only 20 data points.
It turns out that the variance of our sample mean doesn't just decrease with the sample size; it decreases in a predictable way, in inverse proportion to the sample size. In other words, if our sample size is n and the true variance of our DGP is σ², then the variance of the sample mean is:
$$\sigma_{\hat{\mu}}^2 = \frac{\sigma^2}{n} \qquad (3.72)$$
It follows that the standard deviation of the sample mean decreases with the square root of n. This square root is important. In order to reduce the standard deviation of the mean by a factor of 2, we need four times as many data points. To reduce it by a factor of 10, we need 100 times as much data. This is yet another example of the famous square root rule for independent and identically distributed (i.i.d.) variables.
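The square root rule can be checked with a small simulation. The following is a minimal sketch, assuming the uniform DGP described above (a standard random number generator scaled by 100); the function name `sample_mean_std`, the number of trials, and the seed are illustrative choices, not part of the original text. It repeats the experiment many times for n = 20 and n = 1,000 and compares the simulated standard deviation of the sample mean with σ/√n, where σ = 100/√12 for a uniform variable on [0, 100].

```python
import math
import random
import statistics


def sample_mean_std(n, trials=2000, seed=42):
    """Estimate the standard deviation of the sample mean by simulation.

    Each trial draws n points from a uniform(0, 100) DGP and records
    the sample mean; the spread of those means is returned.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(100 * rng.random() for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


# Standard deviation of a single uniform(0, 100) draw.
sigma = 100 / math.sqrt(12)

results = {}
for n in (20, 1000):
    simulated = sample_mean_std(n)
    predicted = sigma / math.sqrt(n)  # square root rule
    results[n] = (simulated, predicted)
    print(f"n={n}: simulated {simulated:.2f}, predicted {predicted:.2f}")
```

With a finite number of trials the simulated values will only approximate the predicted ones; increasing `trials` should tighten the agreement.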
In our current example, because the DGP follows a uniform distribution, we can easily calculate the variance of each data point. The variance of each data point is (100 − 0)²/12 = 833.33. This is equivalent to a standard deviation of approximately 28.87. For 20 data points, the standard deviation of the mean will then be 28.87/√20 ≈ 6.45, and for 1,000 data points, the standard deviation will be 28.87/√1,000 ≈ 0.91.
We have the mean and the standard deviation of our sample mean, but what about the shape of the distribution? You might think that the shape of the distribution would depend on the shape of the underlying distribution of the DGP. If we recast our formula for the sample mean slightly, though:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \frac{1}{n} x_i \qquad (3.73)$$
and regard each of the (1/n)x_i terms as a random variable in its own right, we see that our sample mean is equivalent to the sum of n i.i.d. random variables, each with a mean of μ/n and a standard deviation of σ/n. Using the central limit theorem, we claim that the distribution of the sample mean converges to a normal distribution. For large values of n, the distribution of the sample mean will be extremely close to a normal distribution. Practitioners will often assume that the sample mean is normally distributed.
Sample Problem
Question:
You are given 10 years of monthly returns for a portfolio manager. The mean monthly return is 2.3 percent, and the standard deviation of the returns series is 3.6 percent. What is the standard deviation of the mean?
The portfolio manager is being compared against a benchmark with a mean monthly return of 1.5 percent. What is the probability that the portfolio manager's mean return exceeds the benchmark? Assume the sample mean is normally distributed.
Answer:
There are a total of 120 data points in the sample (10 years × 12 months per year). The standard deviation of the mean is then 0.33 percent:
$$\sigma_{\hat{\mu}} = \frac{3.6\%}{\sqrt{120}} = 0.33\%$$
The benchmark lies 2.43 standard deviations below the portfolio manager's mean return: (1.50 percent – 2.30 percent)/0.3286 percent = –2.43, using the unrounded standard deviation of the mean. For a normal distribution, 99.25 percent of the distribution lies above –2.43 standard deviations, and only 0.75 percent lies below. The probability that the portfolio manager's mean return exceeds the benchmark is therefore approximately 99.25 percent, and the difference between the portfolio manager and the benchmark is highly significant.
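The arithmetic in this sample problem can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the calculation assumes, as the problem does, that the sample mean is normally distributed.

```python
import math
from statistics import NormalDist

n = 10 * 12          # 10 years of monthly returns
mean_return = 2.3    # percent per month
std_return = 3.6     # percent per month
benchmark = 1.5      # percent per month

# Standard deviation of the sample mean: sigma / sqrt(n).
std_of_mean = std_return / math.sqrt(n)

# Distance from the manager's mean to the benchmark, in standard deviations.
z = (benchmark - mean_return) / std_of_mean

# Share of a standard normal distribution lying above z.
prob_above = 1 - NormalDist().cdf(z)

print(f"standard deviation of the mean: {std_of_mean:.2f} percent")
print(f"z-score of the benchmark: {z:.2f}")
print(f"probability of exceeding the benchmark: {prob_above:.2%}")
```

Carrying the unrounded standard deviation through the calculation is what gives a z-score of –2.43 instead of the –2.42 obtained from the rounded 0.33 percent.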
Sample Variance Revisited
Just as with the sample mean, we can treat the sample variance as a random variable. For a given DGP, if we repeatedly calculate the sample variance, the expected value of the sample variance will equal the true variance, and the variance of the sample variance will equal:
$$E\left[(\hat{\sigma}^2 - \sigma^2)^2\right] = \sigma^4\left(\frac{2}{n-1} + \frac{\kappa}{n}\right) \qquad (3.74)$$
where n is the sample size, and κ is the excess kurtosis.
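As a rough numerical check, the sketch below simulates the sample variance for the uniform DGP used earlier in the chapter and compares its variance with the standard result σ⁴(2/(n − 1) + κ/n). It assumes the uniform(0, 100) DGP, whose excess kurtosis is –1.2; the helper names, trial count, and seed are illustrative choices.

```python
import random
import statistics


def sample_variance(draws):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(draws)
    mean = sum(draws) / n
    return sum((x - mean) ** 2 for x in draws) / (n - 1)


def variance_of_sample_variance(n, trials=20000, seed=7):
    """Simulate many sample variances and return their variance."""
    rng = random.Random(seed)
    sample_vars = [
        sample_variance([100 * rng.random() for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.variance(sample_vars)


n = 20
sigma2 = 100 ** 2 / 12   # true variance of a uniform(0, 100) draw
kappa = -1.2             # excess kurtosis of a uniform distribution

simulated = variance_of_sample_variance(n)
predicted = sigma2 ** 2 * (2 / (n - 1) + kappa / n)

print(f"simulated: {simulated:,.0f}")
print(f"predicted: {predicted:,.0f}")
```

Because the uniform distribution has negative excess kurtosis, its sample variance is less dispersed than that of a normal DGP with the same variance, for which κ = 0.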
If the DGP has a normal distribution, then we can also say something about the shape of the distribution of the sample variance. If we have n sample points and $\hat{\sigma}^2$ is the sample variance, then our estimator will follow a chi-squared distribution with (n – 1) degrees of freedom:
$$(n-1)\frac{\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-1} \qquad (3.75)$$
where σ² is the population variance. Note that this is true only when the DGP has a normal distribution. Unfortunately, unlike the case of the sample mean, we cannot apply the central limit theorem here. Even when the sample size is large, if the underlying distribution is nonnormal, the statistic in Equation 3.75 can vary significantly from a chi-squared distribution.
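One way to see the chi-squared result at work is to simulate the scaled sample variance, (n – 1) times the sample variance divided by σ², for a normal DGP and compare its first two moments with those of a chi-squared distribution with (n – 1) degrees of freedom, which has mean n – 1 and variance 2(n – 1). The following is a minimal sketch; the function name, the parameter values, and the seed are illustrative assumptions.

```python
import random
import statistics


def scaled_sample_variances(n, sigma, trials=20000, seed=11):
    """Simulate (n - 1) * sample_variance / sigma^2 for a normal DGP."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        draws = [rng.gauss(0.0, sigma) for _ in range(n)]
        # statistics.variance uses the n - 1 divisor.
        out.append((n - 1) * statistics.variance(draws) / sigma ** 2)
    return out


n = 10
stats = scaled_sample_variances(n, sigma=2.0)

sim_mean = statistics.fmean(stats)
sim_var = statistics.variance(stats)

print(f"mean: simulated {sim_mean:.2f}, chi-squared {n - 1}")
print(f"variance: simulated {sim_var:.2f}, chi-squared {2 * (n - 1)}")
```

Matching two moments does not prove that the distribution is chi-squared, but replacing `rng.gauss` with a markedly nonnormal generator should make the simulated variance drift away from 2(n – 1).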
Confidence Intervals
In our discussion of the sample mean, we assumed that the standard deviation of the underlying distribution was known. In practice, the true standard deviation is likely to be unknown. At the same time we are measuring our sample mean, we will typically be measuring a sample variance as well.
It turns out that if we first standardize our estimate of the sample mean using the sample standard deviation, the new random variable follows a Student's t-distribution with (n – 1) degrees of freedom:
$$t = \frac{\hat{\mu} - \mu}{\hat{\sigma}/\sqrt{n}} \qquad (3.76)$$
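The practical effect of using the sample standard deviation can be illustrated by simulation. The sketch below assumes a normal DGP with illustrative parameters and computes the statistic (sample mean minus true mean) divided by (sample standard deviation over √n) for many samples of size 10. It then counts how often the statistic falls beyond ±1.96, the two-sided 95 percent cutoff for a normal distribution, and beyond ±2.262, the corresponding cutoff for a t-distribution with 9 degrees of freedom. The function name and all parameter values are hypothetical.

```python
import math
import random
import statistics


def t_statistic(sample, mu):
    """Standardize the sample mean using the sample standard deviation."""
    n = len(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu) / std_error


rng = random.Random(3)
n, mu, sigma, trials = 10, 50.0, 5.0, 20000

t_values = [
    t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)
    for _ in range(trials)
]

# Share of simulated statistics beyond each two-sided 95 percent cutoff.
frac_normal_cutoff = sum(abs(t) > 1.96 for t in t_values) / trials
frac_t_cutoff = sum(abs(t) > 2.262 for t in t_values) / trials

print(f"beyond +/-1.96:  {frac_normal_cutoff:.3f}")
print(f"beyond +/-2.262: {frac_t_cutoff:.3f}")
```

If the statistic follows a t-distribution, roughly 5 percent of the simulated values should fall beyond ±2.262, while noticeably more than 5 percent fall beyond ±1.96, reflecting the fatter tails that come from estimating the standard deviation.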