# 1. Introduction

Insurance companies run capital models to quantify risk, assess capital adequacy, allocate capital, and assist in enterprise risk management. For such purposes, a capital model needs to capture all risk sources that may cause large swings in financial results. One-year capital models calculate changes in financial variables during one calendar year. Although multiyear models should provide a more complete view of capital usage, one-year models are more common because of their ease of implementation. For a property and casualty (P&C) insurance company, catastrophic events and investment volatility are usually the main considerations in a one-year model. Catastrophic events, of which earthquake and hurricane are the worst perils, are unpredictable in timing and may affect many policyholders at once and produce huge losses. A company typically invests its assets in stocks, bonds, and other financial securities. Market values of securities can vary wildly, and bonds can default. Over a short time horizon, catastrophe losses and investment losses are the greatest threats to a company’s net worth. Over a longer term, however, inadequate pricing, adverse reserve development, or strategic and operational failures may cause greater damage. Not being able to fully capture the effects of those risks is a limitation of one-year models. Nonetheless, most of the concepts and mathematical formulas we discuss in this paper are applicable in multiyear models.

We study the probability distributions of catastrophe and investment losses, and their joint impact on the one-year change in net worth. Other sources of risk, including adverse development of loss reserves, rate inadequacy, market cycle, and large fire or other non-catastrophe claims, are also important. During a single year, however, such risks have relatively small volatilities. Their contributions to tail values of company net worth are insignificant. When we calculate the required sample size related to tail risk measures, such risk sources can be ignored.

We are interested in stochastic models as opposed to static, scenario-based models. A one-year stochastic model produces probability distributions of financial variables at the end of a year. To quantify risk, we use such statistical measures as variance, standard deviation, value at risk (VaR), or tail value at risk (TVaR). The latter two are called tail risk measures. Because capital is usually considered a cushion for preventing insolvency, VaR and TVaR are more common risk measures in capital modeling.

Statistical measures can rarely be calculated with mathematical formulas. They are typically estimated through simulation. To run a simulation is to draw random samples simultaneously from all random variables contained in a model. When the sample size is large, the probability distribution of a sample is near the distribution of the original random variable. So a statistical measure of the sample can be used as an estimate of the same measure of the original variable (e.g., a sample VaR as an estimate of the VaR of the original variable). The larger the sample size, the better the approximation.
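As a toy illustration (not from the paper), the sketch below estimates the mean of a hypothetical Exponential(1) loss by simulation; the larger the sample, the closer the estimate tends to land to the true value of 1.0:

```python
import random
import statistics

random.seed(0)  # fix the seed so repeated runs agree

def sample_mean(n):
    """Draw n points from a hypothetical Exponential(mean=1) loss and
    return the sample mean (an estimate of the true mean, 1.0)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

small_estimate = sample_mean(100)
large_estimate = sample_mean(100_000)
# The larger sample's estimate is typically much closer to the true mean.
```

The same pattern applies to any sample measure: draw, compute the statistic on the sample, and treat it as an estimate of the true measure.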

We usually have some expectation regarding the accuracy of an estimation, although it is not always explicitly spelled out. For example, economic capital may be defined by a VaR measure on a modeled surplus loss distribution. We may want to ensure, with a probability close to 1, that a sample VaR is less than $100 million off the true VaR (the true VaR, of course, is unknown). If, instead, a sample VaR can easily deviate from the true VaR by $100 million, then two back-to-back model runs with different seeds can produce estimations with a $200 million difference. Many of us working in capital modeling have been asked to explain why the modeled economic capital can vary so much even though the underlying exposures have changed little. We may attribute it to sampling error and to the limitation of computing resources that does not allow larger samples. However, such explanations are weak if we cannot quantify the error range or the necessary sample size.

In this paper, we study the problem of the required sample size for achieving a given precision with a given (high) probability. Our method is to use mathematical theories about the distributions of sample measures. For the statistical measures we are interested in, namely the mean, VaR, and TVaR, the sample measures are, when the sample size is large, distributed approximately normally around the true measures they estimate. The variances of these normal distributions shrink as the sample size increases, giving us more accurate estimates. A threshold sample size can be found using the normal distributions.

It is well known that capital modeling requires a much larger sample size than many other fields. This is mostly due to the skewness and fat tails of the probability distributions of catastrophe losses, and to the length of the return periods (e.g., 1 in 1,000 years or 1 in 2,000 years) used to quantify economic capital. Capital modelers routinely run 100,000 trials. Our calculation shows that even a sample of that size is sometimes not enough.

We offer some general comments on modeling to put the present study into perspective. Capital modeling, or any stochastic modeling for that matter, is a two-step process. The first step is to construct a model to fit reality, and the second is to generate sample points to approximate the model. The first step is difficult and often results in a mismatched model; this is known as model risk, a topic beyond the scope of this paper. We take a model as given and attempt to quantify and control the sampling error. Constructing a model is equivalent to specifying a set of random variables and the interactions among them. To estimate a statistical measure of the model, we use the corresponding sample measure. For example, a VaR of a catastrophe loss is approximated by a corresponding sample VaR. Sample VaRs computed with various seeds form a probability distribution around the true VaR. The distribution is determined by the shape of the catastrophe loss distribution. It is not, however, affected by the specific way the catastrophe model is constructed (i.e., the random variables and dependencies used). Therefore, this study does not refer to any specific model construction. We also assume that running simulations produces truly random samples, although actual simulation engines may have slight imperfections.

The paper is divided into seven sections. In Section 2 we single out two risk drivers—investment loss and catastrophe loss—for further study. Mathematical theories on the normal approximation of statistical measures are reviewed in Section 3. The theories are applied to sample-size questions related to investment and catastrophe risks in Section 4. Some extensions to simultaneous estimation of two risk measures are given in Section 5. In Section 6 we illustrate the dominating effect of catastrophe losses in tail events. We conclude the paper in Section 7, where we suggest a method of increasing the sample size and discuss problems for further study.

# 2. Sources of Risk in a Capital Model

In this section we break down a capital model into components and look at their respective risk characteristics. A one-year capital model produces sample points for the one-year change in every financial variable. Investment gains and underwriting gains are equally important variables for evaluating economic capital. They drive the volatility in modeled surplus change. (We are not concerned with specific accounting rules. Surplus here may mean the U.S. statutory surplus, U.S. GAAP [generally accepted accounting principles] equity, or other measures of net worth.) So a capital model, at the highest level, should include two components—investments and underwriting. We typically model many classes of assets. For each class, the model generates a random variable representing its one-year change in market value. The random variables are usually correlated, and they sum up to the total investment gain, denoted by IG. On the underwriting side, we model several lines of business. The one-year underwriting gain (earned premiums minus the sum of incurred losses and incurred expenses) of each line is a random variable, and they sum up to the total underwriting gain, denoted by UG. The sum, SG = IG + UG, is the one-year change in surplus. The economic capital is usually defined by applying some risk measure on the probability distribution of surplus change.

Our main endeavor in this paper is to find a lower bound of sample sizes that, in a probabilistic sense, ensures a low sampling error when estimating a statistical measure. Such lower bounds, as will be seen in Section 3, depend on the shape of the probability distribution and the specific measure. In this paper, we discuss estimation of mean, VaR, and TVaR as applied to investment gains and underwriting gains. VaR and TVaR are measures of tail risk, as opposed to variance or standard deviation, which quantify the overall volatility.

Note that when we speak of a probability distribution of investment gain or underwriting gain, or any financial variables, we refer to a distribution generated by a model, not one in the real world. Whether the model fits the real world is out of our scope. For any financial quantity, different model developers may build different models, which produce different distributions. To discuss a feature of a financial variable in general terms, say the tail behavior of catastrophe loss, we imply that it is a common feature regardless of the model. Fortunately, it is often the case that distributions of a financial variable from well-built models all have similar shapes and key features. This is not surprising given that model developers likely use the same theories and the same empirical data. Thus, we may discuss a financial variable without referring to a particular model.

It is common for an insurance company to build some models itself and to purchase others in the software market. Investment models, given the large number of variables with complex dynamics and correlations, are usually purchased from specialized software vendors. A good investment model is capable of simulating interest rates, bond spreads, defaults, and market value changes. Typically, in an insurance company, the investment portfolio is well diversified among bonds and stocks. Most models would produce a near-normal distribution for the one-year total return of such a portfolio. This is consistent with one of the *stylized features* of financial returns called *aggregational Gaussianity* (Cont 2001). Recent empirical studies on stock market returns continue to support the normality (Egan 2007; Hebner 2014). Bond returns have more skewness (to the left), probably because of the value loss at default (Rachev, Menn, and Fabozzi 2005). But they can still be described as nearly normally distributed with slight skewness and a fat tail. Note that daily returns of stocks or bonds are known to be highly skewed and fat-tailed. As the time horizon increases, however, the central limit theorem sets in and distributions of returns become more normal-like. So, for one-year returns, near-normal distributions are conceptually acceptable. The normal distribution has the property that VaRs in the deep tail do not deviate far (measured by multiples of the standard deviation) from the mean. In this sense we say normal distributions do not have great tail risk.
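The aggregation effect can be checked with a small experiment. This is a sketch under an assumed model, not an empirical claim: "daily" returns follow a strongly right-skewed distribution (a centered exponential, skewness 2), and sums of 250 of them come out markedly closer to symmetric:

```python
import random
import statistics

random.seed(1)

def skewness(xs):
    """Sample skewness: the third standardized moment."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def daily_return():
    # Hypothetical skewed daily return: centered exponential, skewness 2.
    return random.expovariate(1.0) - 1.0

trials = 5_000
skew_daily = skewness([daily_return() for _ in range(trials)])
# "Annual" return as a sum of 250 independent daily returns.
skew_annual = skewness([sum(daily_return() for _ in range(250))
                        for _ in range(trials)])
# CLT: skewness of the sum shrinks roughly like 2 / sqrt(250), i.e. ~0.13.
```

Real daily returns are of course dependent and not exponential; the sketch only illustrates how summation pulls a skewed distribution toward normality.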

For P&C insurance companies with a sizable property book, extremely large underwriting losses are usually produced by catastrophic events. Probability distributions of catastrophe losses are defined by catastrophe models. Few insurance companies build their own stochastic catastrophe models, especially for rarer perils, such as hurricane and earthquake, for which historical loss data are sparse. Most employ one or more third-party models. Distributions of catastrophe losses are highly skewed toward large losses. The likelihood of occurrence of large losses decays slowly as loss increases. This property is called the fat tail and is characteristic of high-risk sources.

In contrast, other underwriting components, including reserve change, premiums, expenses, and non-catastrophe losses, vary with much smaller magnitude within one year. They have small standard deviations and do not have fat tails. Since our concern is the tail risk of the underwriting gain, catastrophe loss must be the dominant component. Let CL represent the annual catastrophe loss. Then the one-year underwriting gain is UG = −CL + other variables, where the “other variables” term equals earned premium minus the sum of incurred expenses, non-catastrophe losses, and reserve development. The left tail of UG and the right tail of CL have similar shapes and decay at similar rates. This means that the required sample size for approximating the VaR or TVaR of UG (in its left tail) almost equals that for approximating the corresponding VaR or TVaR of CL (in its right tail). Therefore, the only underwriting component we need to examine is the catastrophe loss.

In parallel with catastrophe loss we define the investment loss as the negative investment gain, IL = −IG. Thus, the right tails of IL and CL both represent large losses. The variable IL + CL equals the negative change in surplus (or surplus loss) minus some variables with relatively small variations. We omit the less important variables and write the “reduced” surplus loss as SL = IL + CL. These three loss distributions are the focus of the rest of the paper.
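A minimal simulation sketch of the reduced surplus loss, with hypothetical distributional choices that are not taken from the paper (a normal investment loss and a lognormal catastrophe loss, both in $ millions):

```python
import random

random.seed(2)

# Hypothetical choices, not the paper's model: investment loss IL normal
# (mean gain of $50M, sd $100M); catastrophe loss CL lognormal (skewed,
# with a fat right tail). The "reduced" surplus loss is SL = IL + CL.
def surplus_loss_trial():
    il = random.gauss(-50.0, 100.0)
    cl = random.lognormvariate(2.0, 2.0)
    return il + cl

n = 100_000
sl = sorted(surplus_loss_trial() for _ in range(n))
k = round(0.999 * n)       # np for p = 0.999
sl_var_999 = sl[k - 1]     # sample VaR of SL at p = 0.999
```

Under these assumptions the deep tail of SL is driven almost entirely by the lognormal CL term, anticipating the dominance result illustrated in Section 6.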

# 3. Mathematical Preliminaries

Our calculation of required sample size is based on the normal approximation—that is, as the sample size approaches infinity, the probability distribution of a sample statistic approaches a normal distribution. There is a large volume of mathematical work on the normal approximation. We review some results in this section and apply them later to solve sample-size problems. As explained in the previous section, random variables in this paper represent losses. A loss random variable may take positive or negative values, as underwriting loss or investment loss does. The right tail of a loss distribution represents large losses, which determine risk. For technical reasons we always assume that losses are continuous random variables, which means they have continuous cumulative distribution functions (CDFs). Additional assumptions are also needed for some results, including that losses have continuous probability density functions and have finite variances.

We use $E(X)$, $V(X)$, and $SD(X)$ to denote the mean, the variance, and the standard deviation of $X$. The value at risk of $X$ at a tail probability $p$ (meaning $p$ is near 1) is denoted by $\mathrm{VaR}_p(X)$. It is more commonly known as the $(100p)$th percentile of $X$. $\mathrm{VaR}_p(X)$ is sometimes simplified as $x_p$. The tail value at risk at $p$, denoted by $\mathrm{TVaR}_p(X)$, is the expected value of $X$ under the condition $X>x_p$, i.e., $\mathrm{TVaR}_p(X)=E(X\mid X>x_p)$. $V(X)$, $SD(X)$, $\mathrm{VaR}_p(X)$, and $\mathrm{TVaR}_p(X)$ are some of the most common risk measures (Venter 2003; Rachev, Menn, and Fabozzi 2005; Hardy 2006).

We now introduce notation for some sample statistics. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$, drawn from the distribution of $X$. The sample mean is $\bar{X}_n=\sum_{i=1}^{n} X_i/n$. To define the sample VaR and sample TVaR we rearrange the sample points in ascending order, $X_{(1)}<X_{(2)}<\cdots<X_{(n)}$. Then for the sample distribution we may define the VaR at a tail probability $p$ to be an $X_{(i)}$ for some $i$ near $np$. Precise definitions of sample VaR vary slightly in publications (Cramer 1957; Hardy 2006). When $n$ is large all the definitions produce almost equal values. For convenience we simply assume $np$ is an integer and the sample VaR is $X_{(np)}$. Accordingly, the sample TVaR at $p$ is defined as $\sum_{i=np+1}^{n} X_{(i)}/\bigl(n(1-p)\bigr)$ and is denoted by $\hat{t}_{p,n}$.
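These definitions translate directly into code. The sketch below follows the text's convention (np an integer, sample VaR equal to the np-th order statistic) and checks it against a distribution with known quantiles, the Exponential(1), for which VaR at p is −ln(1−p) and TVaR at p exceeds it by exactly 1:

```python
import math
import random

random.seed(3)

def sample_var_tvar(xs, p):
    """Sample VaR and TVaR at tail probability p, assuming len(xs) * p
    is an integer, matching the convention in the text."""
    n = len(xs)
    xs = sorted(xs)
    k = round(n * p)                      # np; the sample VaR is X_(np)
    var_p = xs[k - 1]                     # 1-based order statistic
    tvar_p = sum(xs[k:]) / (n * (1 - p))  # mean of the n(1-p) largest points
    return var_p, tvar_p

n, p = 100_000, 0.99
xs = [random.expovariate(1.0) for _ in range(n)]
var_p, tvar_p = sample_var_tvar(xs, p)
# True values: VaR_0.99 = ln(100) ≈ 4.605 and TVaR_0.99 ≈ 5.605.
```

With 100,000 trials both sample measures land within a few hundredths of the true values, which is exactly the kind of accuracy the rest of the paper quantifies.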

A sample statistic is generally a good approximation of the corresponding measure of the original distribution if the sample size is large. Mathematically, we say that the sample measures *converge* to the true measures as the sample size approaches infinity, which has been rigorously proved for the mean, VaR, TVaR, and many other statistical measures. Convergence in a probabilistic sense is a more complex concept than convergence of a number sequence. There are several ways of defining it. A sequence of random variables $Y_n$, $n=1,2,\ldots$, is said to converge *in probability* to a constant $c$ if for any $\epsilon>0$, $\lim_{n\to\infty}P(|Y_n-c|>\epsilon)=0$. Convergence to a constant is easy to understand. But for our purpose a more sophisticated concept is necessary. A sequence of random variables $Y_n$ is said to converge *in distribution* to a random variable $Y$ if $\lim_{n\to\infty}P(Y_n\le y)=P(Y\le y)$ at every point $y$ that is a continuity point of the CDF of $Y$. In this case the CDF of $Y$ is called the limiting distribution. These concepts can be found in standard textbooks (Hogg and Craig 1978).

Consider each sample point $X_i$ as a random variable. Then all the $X_i$ have the same distribution as the original $X$ and are independent. The sample measures $\bar{X}_n$, $X_{(np)}$, and $\hat{t}_{p,n}=\sum_{i=np+1}^{n} X_{(i)}/\bigl(n(1-p)\bigr)$ are also random variables. Below are convergence results about sample means. (We assume all technical conditions required for mathematical proofs hold.)

*Convergence of sample means*. As $n\to\infty$, $\bar{X}_n=\sum_{i=1}^{n} X_i/n$ converges in probability to the mean of $X$, $\mu=E(X)$. In addition, $\sqrt{n}\,(\bar{X}_n-\mu)$ converges in distribution to a normal variable $N(0,\sigma^2)$, where $\sigma^2=V(X)$ is the variance of $X$.

The first statement is the familiar law of large numbers. It says that ¯Xn is a good estimate for μ when n is large. The second statement is the central limit theorem. It is a stronger result as it gives a limiting distribution. Both theorems can be found in many textbooks (Hogg and Craig 1978).
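Both statements can be checked numerically. In this sketch we use exponential draws (so μ = σ = 1): repeated sample means center on μ, and their spread shrinks like σ/√n, as the central limit theorem predicts:

```python
import random
import statistics

random.seed(4)

# Repeat the "draw n points, take the mean" experiment many times and
# look at how the resulting sample means are distributed.
n, runs = 400, 2000
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(runs)]

center = statistics.fmean(means)   # law of large numbers: near mu = 1
spread = statistics.pstdev(means)  # CLT: near sigma / sqrt(n) = 0.05
```

The empirical spread of the 2,000 sample means comes out very close to the theoretical 0.05, even though the underlying exponential is skewed.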

The limiting normal distribution is the basis for calculating the required sample size. In capital modeling we routinely run a large number of trials, 100,000 or more. This is necessary because our models include highly skewed distributions, like catastrophe losses, and we are interested in VaR and TVaR in the deep tail. With a sample size so large, the limiting distribution $N(0,\sigma^2)$ is a good approximation to the distribution of the random variable $\sqrt{n}\,(\bar{X}_n-\mu)$. It implies that the estimation error $\bar{X}_n-\mu$ is normal with variance $\sigma^2/n$. If we increase $n$, then the variance decreases. There is a threshold $N$ so that for any sample size greater than $N$, the error is within, in a probabilistic sense, a given bound. In notation, our task is, for a given error bound $a>0$ and a confidence level $r$ near 1, to find a large integer $N$ so that when $n>N$, $P(|\bar{X}_n-\mu|\le a)\ge r$. As a common practice, we choose $r=0.95$.

It is known that if $Y$ is a standard normal variable then $P(|Y|\le 1.96)=0.95$. If $Y$ is distributed as $N(0,\sigma^2)$, then $P(|Y|\le 1.96\,\sigma)=0.95$. So, for $n$ large enough, the following equation holds with good precision: $P\bigl(|\sqrt{n}\,(\bar{X}_n-\mu)|\le 1.96\,\sigma\bigr)=0.95$. Solving the equation $1.96\,\sigma/\sqrt{n}=a$ for $n$ gives us a threshold $N$,

$$N=\left(\frac{1.96\,\sigma}{a}\right)^2. \qquad (1)$$

This formula is easy to understand intuitively. A greater $\sigma^2$ means sample points are more dispersed, so the sample mean is less accurate. To achieve a given accuracy level, more sample points are needed. The formula is identical to the full credibility standard for severity; see formula (2.3.1) in Mahler and Dean (2009). Note that, in addition to the variance, other characteristics of the distribution $F(x)$ may potentially influence the threshold, like higher moments or tail thickness. They do not appear in equation (1) but may affect its validity. If the distribution of $X$ is skewed and fat-tailed, a larger $n$ may be needed for the distribution of $\sqrt{n}\,(\bar{X}_n-\mu)$ to be approximately normal. This seems to be a minor issue, as the sample size in capital modeling is generally very large.
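Equation (1) is a one-liner in code. The numbers below are hypothetical, chosen only for illustration: a loss with a standard deviation of $500 million, estimated to within $10 million:

```python
import math

def required_sample_size_mean(sigma, a, z=1.96):
    """Equation (1): smallest n for which the sample mean falls within
    +/- a of the true mean with roughly 95% probability (z = 1.96)."""
    return math.ceil((z * sigma / a) ** 2)

# Hypothetical figures: sd of $500M, tolerance a of $10M.
n_needed = required_sample_size_mean(sigma=500.0, a=10.0)
# On the order of 9,600 trials; a tighter tolerance raises n quadratically.
```

Halving the tolerance `a` quadruples the required sample size, which is the usual square-root-law behavior of Monte Carlo error.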

The following is a similar result for VaR.

*Convergence of sample VaRs*. Let $x_p=\mathrm{VaR}_p(X)$. As $n\to\infty$, $\sqrt{n}\,(X_{(np)}-x_p)$ converges in distribution to $N(0,\sigma^2)$, where $\sigma^2=p(1-p)/f(x_p)^2$ and $f(x)$ is the probability density function of $X$.

This is a classic result, whose proof can be found in Chapter 28 of Cramer (1957). For a recent discussion, see Manistre and Hancock (2005). It is interesting to note that the statement is very similar to the central limit theorem, although $X_{(np)}$ is not a sum of independent variables.

We now apply the same argument as that used in deriving equation (1). First, substitute the limiting distribution $N(0,\sigma^2)$ for the distribution of $\sqrt{n}\,(X_{(np)}-x_p)$. Then, for a given error bound $a$ and a tail probability $p$, we solve for the threshold $N$. The result is as if we had plugged the expression for $\sigma^2$ into equation (1):

$$N=\left(\frac{1.96}{a\,f(x_p)}\right)^2 p(1-p). \qquad (2)$$

It is no surprise that the density $f(x_p)$ appears in equation (2). Recall that the quantile function is defined as $G(p)=F^{-1}(p)$, where $F^{-1}$ is the inverse function of $F(x)$. In other words, $G(p)=x_p$. A sketch of the graph $x=G(p)$ is given in Figure 1. A random sample $X_1, X_2, \ldots, X_n$ drawn from $F(x)$ can be considered obtained in two steps. First draw a random sample $q_1, q_2, \ldots, q_n$ from a standard uniform distribution $U[0,1]$, and then transform the points into $X_1, X_2, \ldots, X_n$ using the function $X_i=G(q_i)$. In Figure 1, the true VaR $x_p$ and the sample VaR $X_{(np)}$ are plotted on the vertical axis. $X_{(np)}$ equals $G(q_{(np)})$, where $q_{(np)}$ is the $np$th value when the $q_i$ are ordered from smallest to largest. The error $|q_{(np)}-p|$ on the horizontal axis is transformed to an error $|X_{(np)}-x_p|$ on the vertical axis. The ratio $|X_{(np)}-x_p|/|q_{(np)}-p|$ is approximately the slope $G'(p)=1/f(x_p)$. This explains why the size of the error $|X_{(np)}-x_p|$, and thus the $N$ in equation (2), is directly related to $1/f(x_p)$.
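Equation (2) is equally mechanical once the density at the quantile is known. As an illustration with a closed form, take an Exponential(1) loss, where the density at the p-th quantile is simply 1−p; at p = 0.999 and a tolerance a = 0.5, the required n is far larger than a naive guess would suggest:

```python
import math

def required_sample_size_var(p, f_xp, a, z=1.96):
    """Equation (2): N = (z / (a * f(x_p)))^2 * p * (1 - p)."""
    return math.ceil((z / (a * f_xp)) ** 2 * p * (1 - p))

# Exponential(1): x_p = -ln(1-p) and f(x_p) = 1 - p.
p, a = 0.999, 0.5
n_needed = required_sample_size_var(p, f_xp=1 - p, a=a)
# Tens of thousands of trials: the density in the deep tail is tiny,
# so 1 / f(x_p) blows up even though p(1-p) is small.
```

For fat-tailed catastrophe distributions the density at deep-tail quantiles is smaller still, which is why capital models need such large samples.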

To understand the meaning of the factor $p(1-p)$ in (2), we define $K$ = the number of $q_i$ in the interval $[0,p]$. Then $K$ is a binomial variable with $E(K)=np$ and $V(K)=np(1-p)$. For a fixed $n$, if $p(1-p)$ is large then $K$ has a large chance of deviating far from $np$. ($p(1-p)$ is large when $p$ is near $1/2$ and small when $p$ is near 1.) A larger $|K-np|$ implies that $q_{(np)}$ is a less precise estimate of $p$. Thus, the required sample size should increase with $p(1-p)$.
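The binomial argument can be verified numerically. This sketch (with hypothetical parameters) compares the spread of K at p = 0.5 and p = 0.999:

```python
import random
import statistics

random.seed(5)

def spread_of_k(n, p, runs=2000):
    """Sd of K = #{q_i <= p} over repeated samples of n uniforms;
    theory says sqrt(n * p * (1 - p))."""
    ks = [sum(random.random() <= p for _ in range(n)) for _ in range(runs)]
    return statistics.pstdev(ks)

n = 1000
sd_mid = spread_of_k(n, 0.5)     # theory: sqrt(1000 * 0.25)    ~ 15.8
sd_tail = spread_of_k(n, 0.999)  # theory: sqrt(1000 * 0.000999) ~ 1.0
```

The count K is far noisier at p = 0.5 than at p = 0.999, matching the p(1−p) factor in equation (2).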