1. Introduction
The concept of excess loss is widely used and appreciated within the casualty-actuarial world, particularly in reinsurance, but also in retrospective and increased-limits rating. This paper will reveal some of its hidden mysteries. In the next section we will define the excess-loss function and state its occasional advantages over standard probability concepts. This will prepare us, in the third section, to find new applications of the function, applications extending as far as moment generation. Excess losses naturally imply loss layers, whose probability distributions, as we will show in the fourth section, are more amenable to an excess-loss treatment than to standard probability theory. Here also we will introduce an example involving the first two moments of a mixed exponential distribution. Then, in the fifth section, we will round out the second moment of the example by considering the covariances among loss layers, and we will conclude in the sixth section.
2. The excess-loss function
Let X be a nonnegative random variable, i.e., a random variable suitable for representing an amount of loss. Its cumulative distribution function F_{X}(a) = Prob[X≤a] has the following four properties:

If a < b, then F_{X}(a) ≤ F_{X}(b) (nondecreasing)

\lim_{a\to\infty} F_{X}(a) = 1 (total probability)

\lim_{b\to a^{+}} F_{X}(b) = F_{X}(a) (continuity from the right)

\lim_{a\to 0^{-}} F_{X}(a) = 0 (nonnegativity)
These properties allow for points of probability mass, since
\operatorname{Prob}[X=a]=\lim_{b\to a^{-}}\operatorname{Prob}[b<X\le a]=F_{X}(a)-\lim_{b\to a^{-}}F_{X}(b)\ge 0 .
Of particular note, the probability for X to equal 0 may be positive. Moreover, let G_{X} be the survival function, the complement of F_{X}: G_{X}(a) = 1 − F_{X}(a) = Prob[X > a]. Hence, dG_{X}(a) = −dF_{X}(a). G_{X} is also continuous from the right, and Prob[X = a] = \lim_{b\to a^{-}} G_{X}(b) − G_{X}(a). However, it is nonincreasing, its limit at infinity is zero, and G_{X}(a) = 1 for a < 0.
For r ≥ 0, the expected portion of loss X in excess of “retention” r is defined as:
\operatorname{Excess}_{X}(r) \equiv \int_{x=-\infty}^{\infty} \max (0, x-r)\, d F_{X}(x)=\int_{x=r}^{\infty}(x-r)\, d F_{X}(x)
In particular, Excess_{X}(0) = \int_{x=0}^{\infty} x\, dF_{X}(x) = E[X]. From integration by parts, we reformulate:
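\begin{aligned}
\operatorname{Excess}_{X}(r) & =\int_{x=r}^{\infty}(x-r) d F_{X}(x)=-\int_{x=r}^{\infty}(x-r) d G_{X}(x) \\
& =-\left.(x-r) G_{X}(x)\right|_{r}^{\infty}+\int_{x=r}^{\infty} G_{X}(x) d(x-r) \\
& =\left.(x-r) G_{X}(x)\right|_{\infty}^{r}+\int_{x=r}^{\infty} G_{X}(x) d x \\
& =0-0+\int_{x=r}^{\infty} G_{X}(x) d x .
\end{aligned}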
From their studies, casualty actuaries are familiar with this expression.
The excess-loss function is especially useful in reinsurance: if 0 ≤ a ≤ b, the pure premium for the portion of loss X in the layer [a, b] equals Excess_{X}(a) − Excess_{X}(b). Note that since G is dimensionless, the unit of the excess-loss function is the unit of dx, which is the unit of the loss amount and the retention. Two appealing properties of the excess-loss function are (1) that it is everywhere continuous, even where the distribution has points of probability mass, and (2) that wherever it is positive, it strictly decreases. Moreover, its derivative at r, if it exists, equals −G_{X}(r). Even where the derivative does not exist, the left and right derivatives do, and the right derivative minus the left derivative equals the probability mass at r. It is helpful to extend the definition of the excess-loss function to negative retentions, at which G_{X}(r) = 1. For r < 0:
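\begin{aligned}
\operatorname{Excess}_{X}(r) & =\int_{x=r}^{\infty} G_{X}(x) d x=\int_{x=r}^{0} G_{X}(x) d x+\int_{x=0}^{\infty} G_{X}(x) d x \\
& =\int_{x=r}^{0} 1 \cdot d x+\operatorname{Excess}_{X}(0)=-r+E[X] .
\end{aligned}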
Often useful is the form, valid for all r, Excess_{X}(r) = −min(0, r) + \int_{x=\max(0,r)}^{\infty} G_{X}(x)\, dx.
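The following Python sketch (an illustration added here, not part of the original analysis) checks both forms numerically for a small hypothetical loss with a point mass at zero. The three-point distribution and the helper names `excess_direct`, `survival`, and `excess_from_survival` are assumptions for the example; because G_X is a step function for a discrete loss, the survival-function integral can be summed exactly.

```python
import math

def excess_direct(dist, r):
    """Excess_X(r) from the definition E[max(0, X - r)]; dist is a list of (value, prob)."""
    return sum(p * max(0.0, x - r) for x, p in dist)

def survival(dist, x):
    """G_X(x) = Prob[X > x], which is continuous from the right."""
    return sum(p for v, p in dist if v > x)

def excess_from_survival(dist, r):
    """-min(0, r) + integral of G_X from max(0, r) to infinity, valid for all r.

    G_X is constant between consecutive mass points, so the integral is summed
    exactly interval by interval (G_X is zero beyond the largest mass point)."""
    lo = max(0.0, r)
    points = sorted({lo} | {x for x, _ in dist if x > lo})
    area = sum(survival(dist, left) * (right - left)
               for left, right in zip(points, points[1:]))
    return -min(0.0, r) + area

# hypothetical loss: Prob[X=0] = 0.2, Prob[X=1] = 0.5, Prob[X=3] = 0.3
dist = [(0.0, 0.2), (1.0, 0.5), (3.0, 0.3)]
for r in (-2.0, 0.0, 0.5, 1.0, 2.0, 5.0):
    assert math.isclose(excess_direct(dist, r), excess_from_survival(dist, r), abs_tol=1e-12)

# pure premium for the layer [1, 3]: Excess_X(1) - Excess_X(3)
layer_premium = excess_direct(dist, 1.0) - excess_direct(dist, 3.0)
```

For negative retentions the sketch reproduces −r + E[X]; here E[X] = 1.4.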
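It is not difficult to prove two basic theorems. If c > 0, then for all r:
\begin{aligned}
\operatorname{Excess}_{X+c}(r) & =\operatorname{Excess}_{X}(r-c) \\
\operatorname{Excess}_{c X}(r) & =c \operatorname{Excess}_{X}(r / c) .
\end{aligned}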
The first holds true for c = 0; the second holds true as well in the limit as c → 0^{+}, from which it follows that Excess_{0}(r) = −min(0, r), where the subscript 0 denotes the loss that is identically zero. Furthermore, for the loss fixed at c, Excess_{c}(r) = −min(0, r − c) = max(0, c − r).
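A quick numerical check of the two theorems and of the constant-loss case, again on a hypothetical three-point loss, is sketched below. The helpers `shifted` and `scaled` are illustrative names; the definition E[max(0, X − r)] is used directly because it agrees with the extended excess-loss function for every r.

```python
import math

def excess(dist, r):
    """Excess_X(r) = E[max(0, X - r)] for a discrete loss given as (value, prob) pairs."""
    return sum(p * max(0.0, x - r) for x, p in dist)

def shifted(dist, c):
    """Distribution of X + c."""
    return [(x + c, p) for x, p in dist]

def scaled(dist, c):
    """Distribution of cX."""
    return [(c * x, p) for x, p in dist]

dist = [(0.0, 0.2), (1.0, 0.5), (3.0, 0.3)]   # hypothetical loss
c = 2.5
for r in (-4.0, -1.0, 0.0, 0.7, 2.0, 6.0):
    # Excess_{X+c}(r) = Excess_X(r - c)
    assert math.isclose(excess(shifted(dist, c), r), excess(dist, r - c), abs_tol=1e-12)
    # Excess_{cX}(r) = c * Excess_X(r / c)
    assert math.isclose(excess(scaled(dist, c), r), c * excess(dist, r / c), abs_tol=1e-12)

# a loss fixed at c: Excess_c(r) = max(0, c - r)
constant = [(c, 1.0)]
assert all(excess(constant, r) == max(0.0, c - r) for r in (-1.0, 0.0, 1.0, 3.0))
```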
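As an example, let X be exponentially distributed with mean θ. Hence,
G_{X}(x)=\left\{\begin{array}{cc}
e^{-x / \theta} & 0 \leq x \\
1 & x<0
\end{array}\right.
So for r ≥ 0, Excess_{X}(r) = \int_{x=r}^{\infty} G_{X}(x) dx = \int_{x=r}^{\infty} e^{-x/\theta} dx = \left.\theta e^{-x/\theta}\right|_{\infty}^{r} = \theta e^{-r/\theta}, and with negative retentions:
\operatorname{Excess}_{X}(r)=\left\{\begin{array}{cc}
\theta e^{-r / \theta} & 0 \leq r \\
-r+\theta & r<0
\end{array}\right.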
Figure 1 graphs this function over the domain [−θ, 4θ]. In preparation for the next section, we’ve also extended as a dotted line the negative-retention line, i.e., f(x) = −x + θ.
Figure 1. Excess-loss function of an Exponential(θ) loss
The straight line itself is the graph of Excess_{θ}(x), the excess-loss function of a loss fixed at θ. That the excess-loss function “pulls up and comes in for a landing” to the right of the dotted line indicates positive variance. This is the clue for extracting more information from the excess-loss function.
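As a sketch, assuming θ = 1 and using composite Simpson's rule (our choice of quadrature, with the upper limit truncated at 50θ), the closed form can be compared with a direct numerical integration of G_X over the domain of Figure 1:

```python
import math

def excess_exponential(r, theta):
    """Closed form: theta*exp(-r/theta) for r >= 0, and theta - r for r < 0."""
    return theta * math.exp(-r / theta) if r >= 0 else theta - r

def excess_numeric(r, theta, n=4000):
    """-min(0, r) + integral of G_X(x) = exp(-x/theta) from max(0, r) upward.

    Composite Simpson's rule (n must be even); the upper limit is truncated at
    max(0, r) + 50*theta, beyond which the integrand is below exp(-50)."""
    lo = max(0.0, r)
    hi = lo + 50.0 * theta
    h = (hi - lo) / n
    total = math.exp(-lo / theta) + math.exp(-hi / theta)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * math.exp(-(lo + k * h) / theta)
    return -min(0.0, r) + total * h / 3

theta = 1.0
for r in (-1.0, -0.5, 0.0, 1.0, 2.5, 4.0):   # within [-theta, 4*theta]
    assert math.isclose(excess_exponential(r, theta), excess_numeric(r, theta), rel_tol=1e-6)
```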
3. Higher moments and the excess-loss function
The excess-loss function Excess_{X}(r) = \int_{x=r}^{\infty} G_{X}(x)\, dx is not only elegant and useful, it is “full-informational” in the sense that one can derive from it all the moments of X. Above, we saw that the mean of X equals Excess_{X}(0). That the dimension of the area under the excess-loss function is the square of the loss unit suggests that the area has something to do with the second moment. The following derivation relies on the validity of inverting the order of integration over the region A, which is the part of the first quadrant of the Cartesian plane above the line y = x:
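\begin{aligned}
\int_{x=0}^{\infty} \operatorname{Excess}_{X}(x) d x & =\int_{x=0}^{\infty} \int_{y=x}^{\infty} G_{X}(y) d y\, d x=\iint_{A} G_{X}(y)\, d A \\
& =\int_{y=0}^{\infty} \int_{x=0}^{y} G_{X}(y) d x\, d y=\int_{y=0}^{\infty} y\, G_{X}(y) d y \\
& =\int_{y=0}^{\infty} G_{X}(y)\, d\left(\frac{y^{2}}{2}\right)=\frac{1}{2} E\left[X^{2}\right] .
\end{aligned}
The last step applies the formula E[h(X)] = h(0) + \int_{x=0}^{\infty} G_{X}(x) dh(x) of Appendix A with h(y) = y^{2}/2.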
So the area under the excess-loss function in the first quadrant equals half the second moment. Therefore, the area under the excess-loss function but above the dotted line and the x-axis is
\begin{aligned}
\frac{1}{2} E\left[X^{2}\right]-\frac{1}{2} E[X] E[X] & =\frac{1}{2}\left(E\left[X^{2}\right]-E[X]^{2}\right) \\
& =\frac{1}{2} \operatorname{Var}[X] .
\end{aligned}
Hence, if the excess-loss function “lands” to the right of E[X], the variance is nontrivial.
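For a discrete loss the excess-loss function is piecewise linear between mass points, so its area can be computed exactly with the trapezoid rule. The minimal sketch below, assuming the same hypothetical three-point loss as before, confirms that the area equals E[X^{2}]/2 and that twice the area minus E[X]^{2} reproduces the variance computed directly from the definition.

```python
import math

def excess(dist, r):
    """Excess_X(r) = E[max(0, X - r)] for a discrete loss given as (value, prob) pairs."""
    return sum(p * max(0.0, x - r) for x, p in dist)

def area_under_excess(dist):
    """Integral of Excess_X over [0, infinity).

    Excess_X is linear between consecutive mass points and zero beyond the largest,
    so the trapezoid rule on those breakpoints is exact."""
    pts = sorted({0.0} | {x for x, _ in dist if x > 0.0})
    return sum((excess(dist, a) + excess(dist, b)) / 2.0 * (b - a)
               for a, b in zip(pts, pts[1:]))

dist = [(0.0, 0.2), (1.0, 0.5), (3.0, 0.3)]     # hypothetical loss
mean = excess(dist, 0.0)                         # E[X] = Excess_X(0)
second = sum(p * x * x for x, p in dist)         # E[X^2] from the definition
area = area_under_excess(dist)
assert math.isclose(area, second / 2.0)          # area = E[X^2]/2

variance = 2.0 * area - mean ** 2                # Var[X] from the area
direct_var = sum(p * (x - mean) ** 2 for x, p in dist)
assert math.isclose(variance, direct_var)

# area above the dotted line -x + E[X], whose triangle has area E[X]^2/2
area_above_line = area - mean ** 2 / 2.0         # equals Var[X]/2
```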
In general, for h(x) continuous on [0, ∞),
\small{
\begin{aligned}
\int_{x=0}^{\infty} \operatorname{Excess}_{X}(x) d h(x) & =\int_{x=0}^{\infty} \int_{y=x}^{\infty} G_{X}(y) d y\, d h(x) \\
& =\int_{y=0}^{\infty} \int_{x=0}^{y} G_{X}(y) d h(x) d y \\
& =\int_{y=0}^{\infty} G_{X}(y)\{h(y)-h(0)\} d y \\
& =\int_{y=0}^{\infty} G_{X}(y) h(y) d y-\int_{y=0}^{\infty} G_{X}(y) h(0) d y \\
& =\int_{y=0}^{\infty} G_{X}(y) h(y) d y-E[X] h(0) .
\end{aligned}
}
Instead of the double integration, we may use integration by parts:
\small{
\begin{aligned}
\int_{x=0}^{\infty} \operatorname{Excess}_{X}(x) d h(x)= & \left.\operatorname{Excess}_{X}(x) h(x)\right|_{0}^{\infty} \\
& -\int_{x=0}^{\infty} h(x) d \operatorname{Excess}_{X}(x) \\
= & 0-\operatorname{Excess}_{X}(0) h(0) \\
& -\int_{x=0}^{\infty} h(x)\left(-G_{X}(x) d x\right) \\
= & \int_{x=0}^{\infty} G_{X}(x) h(x) d x-E[X] h(0) .
\end{aligned}
}
Now let H′(x) = h(x), or dH(x) = h(x)dx. The following derivation relies on a formula from Appendix A, viz., E[h(X)] = h(0) + \int_{x=0}^{\infty} G_{X}(x)\, dh(x):
\begin{aligned}
\int_{x=0}^{\infty} \operatorname{Excess}_{X}(x) d h(x)= & \int_{x=0}^{\infty} G_{X}(x) h(x) d x-E[X] h(0) \\
= & \int_{x=0}^{\infty} G_{X}(x) d H(x)-E[X] H^{\prime}(0) \\
= & H(0)+\int_{x=0}^{\infty} G_{X}(x) d H(x) \\
& -H(0)-E[X] H^{\prime}(0) \\
= & E[H(X)]-H(0)-E[X] H^{\prime}(0) .
\end{aligned}
The invariance of this formula under the addition of a linear function to H serves as a check on its correctness. For if H(x) ← H(x) + cx + d, then h(x) ← h(x) + c and
\begin{aligned}
\int_{x=0}^{\infty} \operatorname{Excess}_{X}(x) d(h(x)+c)= & E[H(X)+c X+d] \\
& -(H(0)+c \cdot 0+d) \\
& -E[X]\left(H^{\prime}(0)+c\right) \\
= & E[H(X)]+c E[X] \\
& +d-H(0)-d \\
& -E[X] H^{\prime}(0)-c E[X] \\
= & \int_{x=0}^{\infty} \operatorname{Excess}_{X}(x) d h(x) .
\end{aligned}
So, for example, if h(x) = x^{k}, where k must be positive for continuity at zero, then H(x)=\frac{x^{k+1}}{k+1} and
\begin{aligned}
\int_{x=0}^{\infty} \operatorname{Excess}_{X}(x) d x^{k}= & E\left[\frac{X^{k+1}}{k+1}\right]-\frac{0^{k+1}}{k+1} \\
& -E[X] \cdot 0^{k}=\frac{1}{k+1} E\left[X^{k+1}\right] .
\end{aligned}
So, the second and higher moments result from integrals of the excess-loss function. Of course, the first moment is just Excess_{X}(0).
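A numerical check of this identity for an Exponential(θ) loss, for which E[X^{n}] = n! θ^{n}, is sketched below. The value θ = 2, the truncation of the integral at 60θ, and the use of Simpson's rule are assumptions of the example, not choices made in the paper; since d(x^{k}) = k x^{k−1} dx, the Stieltjes integral reduces to an ordinary one.

```python
import math

def simpson(f, a, b, n=6000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

theta = 2.0   # any positive mean will do

def excess(x):
    """Excess-loss function of an Exponential(theta) loss for x >= 0."""
    return theta * math.exp(-x / theta)

for k in (1, 2, 3):
    # integral of Excess_X(x) d(x^k), with d(x^k) = k x^(k-1) dx
    lhs = simpson(lambda x: excess(x) * k * x ** (k - 1), 0.0, 60.0 * theta)
    # E[X^(k+1)] / (k+1), using E[X^n] = n! theta^n
    rhs = math.factorial(k + 1) * theta ** (k + 1) / (k + 1)
    assert math.isclose(lhs, rhs, rel_tol=1e-6)
```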
Especially interesting is the integral \int_{x=0}^{\infty} \operatorname{Excess}_{X}(x)\, d\left(t e^{t x}\right), for which h(x) = te^{tx} and H(x) = e^{tx}:
\begin{aligned}
\int_{x=0}^{\infty} \operatorname{Excess}_{X}(x) d\left(t e^{t x}\right) & =E\left[e^{t X}\right]-e^{t \cdot 0}-E[X] \cdot t e^{t \cdot 0} \\
& =E\left[e^{t X}\right]-1-E[X] \cdot t \\
& =M_{X}(t)-1-M_{X}^{\prime}(0) \frac{t^{1}}{1 !} \\
& =\sum_{i=2}^{\infty} M_{X}^{[i]}(0) \frac{t^{i}}{i !}
\end{aligned}
This integral of the excess-loss function reproduces the moment-generating function of X, except for the removal of the first two terms of its Maclaurin series. The whole moment-generating function of X is recovered as M_{X}(t) = 1 + Excess_{X}(0) \cdot t + \int_{x=0}^{\infty} \operatorname{Excess}_{X}(x)\, d\left(t e^{t x}\right). Since d(te^{tx}) = t\, d(e^{tx}), this implies that for t ≠ 0,
\int_{x=0}^{\infty} \operatorname{Excess}_{X}(x)\, d e^{t x}=\sum_{i=2}^{\infty} M_{X}^{[i]}(0) \frac{t^{i-1}}{i !}=\sum_{i=1}^{\infty} \frac{M_{X}^{[i+1]}(0)}{i+1} \frac{t^{i}}{i !},
which holds true even for t = 0. Therefore, 1 + \int_{x=0}^{\infty} \operatorname{Excess}_{X}(x)\, d e^{t x} is, formally, the moment-generating function of a random variable Y whose moments relate to those of X as E[Y^{i}] = E[X^{i+1}]/(i+1); it describes a proper probability distribution only when E[X] ≤ 1 in the chosen unit of loss. To use a musical analogy, \int_{x=0}^{\infty} \operatorname{Excess}_{X}(x)\, d e^{t x} is a transformation that “plays” X one octave lower.
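The moment-generating relation can be checked the same way for an Exponential(θ) loss, whose moment-generating function is 1/(1 − θt) for t < 1/θ. This sketch assumes θ = 1 and t = 0.3 and uses Simpson's rule; with E[X] = 1 the “octave-lower” variable has E[Y^{i}] = i!, so it is again Exponential(1).

```python
import math

def simpson(f, lo, hi, n=8000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

theta, t = 1.0, 0.3          # the MGF of an Exponential(theta) loss needs t < 1/theta

def excess(x):
    return theta * math.exp(-x / theta)

def mgf(s):
    return 1.0 / (1.0 - theta * s)

# d(t e^{tx}) = t^2 e^{tx} dx; the integrand decays like exp(-(1/theta - t) x)
lhs = simpson(lambda x: excess(x) * t * t * math.exp(t * x), 0.0, 100.0 * theta)
assert math.isclose(lhs, mgf(t) - 1.0 - theta * t, rel_tol=1e-6)   # E[X] = theta

# 1 + integral of Excess_X d(e^{tx}) = 1 + lhs / t; with theta = 1 this is again 1/(1 - t)
octave_lower = 1.0 + lhs / t
assert math.isclose(octave_lower, mgf(t), rel_tol=1e-6)
```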
4. Excess-loss functions of layered losses
If X is a nonnegative random variable and 0 ≤ a ≤ b, then define the portion of loss X in the layer [a, b] as Layer(X; a, b) ≡ min(b − a, max(0, X − a)). The graph in Figure 2 shows that the layer function is flat except in the interval [a, b], in which Layer(x; a, b) = x − a.
Figure 2. Layer function
Writing Y = Layer(X; a, b), our purpose in this section is to express Excess_{Y} in terms of Excess_{X}, which requires expressing G_{Y} in terms of G_{X}.
Of course, if r < 0, then G_{Y}(r) = 1. For b − a ≤ r, G_{Y}(r) = Prob[Y > r] = Prob[Y > b − a] = 0. And in the middle, for 0 < r < b − a:
\begin{aligned}
G_{Y}(r) & =\operatorname{Prob}[Y>r]=\operatorname{Prob}[X-a>r] \\
& =\operatorname{Prob}[X>r+a]=G_{X}(r+a) .
\end{aligned}
By continuity from the right, G_{Y}(0) = G_{Y}(0^{+}) = G_{X}(0^{+} + a) = G_{X}(a). So the final conditional formula is
G_{Y}(y)=\left\{\begin{array}{rc}
1 & y<0 \\
G_{X}(y+a) & 0 \leq y<b-a \\
0 & b-a \leq y
\end{array}\right.
Hence, for 0 ≤ r < b − a:
\begin{aligned}
\operatorname{Excess}_{Y}(r) & =\int_{y=r}^{\infty} G_{Y}(y) d y \\
& =\int_{y=r}^{b-a} G_{Y}(y) d y \\
& =\int_{y=r}^{b-a} G_{X}(y+a) d y \\
& =\int_{y+a=r+a}^{b-a+a} G_{X}(y+a) d(y+a) \\
& =\int_{x=r+a}^{b} G_{X}(x) d x \\
& =\operatorname{Excess}_{X}(r+a)-\operatorname{Excess}_{X}(b) .
\end{aligned}
The form Excess_{Y}(r) = Excess_{X}(min(b,r + a)) − Excess_{X}(b) is valid for all r ≥ 0; it also accommodates the three limiting cases a = 0, b = a, and b = ∞. Furthermore, E[Y] = Excess_{Y}(0) = Excess_{X}(a) − Excess_{X}(b), as mentioned in Section 2.
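The layer formula is easy to apply to the mixed exponential distribution used in the tables that follow. In this sketch the weights and means are those of Table 1; `excess_mx`, `excess_layer`, and `excess_layer_numeric` are hypothetical helper names, and the numerical version integrates G_{Y}(y) = G_{X}(y + a) directly with Simpson's rule as a cross-check.

```python
import math

# Mixed exponential of Table 1: weights w_i and means theta_i
W = [0.500, 0.250, 0.125, 0.125]
THETA = [500_000.0, 1_000_000.0, 2_000_000.0, 5_000_000.0]

def survival_mx(x):
    """G_MX(x) = sum_i w_i exp(-x/theta_i) for x >= 0."""
    return sum(w * math.exp(-x / t) for w, t in zip(W, THETA))

def excess_mx(r):
    """Excess_MX(r) = sum_i w_i theta_i exp(-r/theta_i); returns 0.0 for r = math.inf."""
    return sum(w * t * math.exp(-r / t) for w, t in zip(W, THETA))

def excess_layer(r, a, b):
    """Excess_Y(r) = Excess_MX(min(b, r + a)) - Excess_MX(b) for Y = Layer(MX; a, b), r >= 0."""
    return excess_mx(min(b, r + a)) - excess_mx(b)

def excess_layer_numeric(r, a, b, n=2000):
    """Integral of G_Y(y) = G_MX(y + a) over [r, b - a] by Simpson's rule (finite b, r < b - a)."""
    h = (b - a - r) / n
    total = survival_mx(r + a) + survival_mx(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * survival_mx(r + a + k * h)
    return total * h / 3

layers = [(0.0, 5e6), (5e6, 10e6), (10e6, 20e6), (20e6, math.inf)]
means = [excess_layer(0.0, a, b) for a, b in layers]   # E[Y_i] = Excess_{Y_i}(0)
print([round(m) for m in means])                       # compare with the E[Y] row of Table 2
```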
The tables provide an example. First, we mixed four exponential distributions (with weights w and means θ at the top of Table 1). The excess-loss function of the mixed exponential distribution (the “MxdExp Excess” column) is \operatorname{Excess}_{MX}(r)=\sum_{i=1}^{4} w_{i} \theta_{i} e^{-r / \theta_{i}}. Its values are shown for retentions from zero to 50 million (consider the unit of loss as USD) in steps of one million; the values are also equal to the values of the four exponential columns, weighted according to 0.500, 0.250, 0.125, and 0.125. The mean loss is 1,375,000. The final column of Table 1 shows the area under the Excess_{MX} curve from r to infinity. Its formula is
\small{
\begin{aligned}
\int_{x=r}^{\infty} \operatorname{Excess}_{MX}(x) d x & =\int_{x=r}^{\infty} \sum_{i=1}^{4} w_{i} \theta_{i} e^{-x / \theta_{i}} d x \\
& =\sum_{i=1}^{4} w_{i} \theta_{i} \int_{x=r}^{\infty} e^{-x / \theta_{i}} d x=\sum_{i=1}^{4} w_{i} \theta_{i}^{2} e^{-r / \theta_{i}} .
\end{aligned}
}
Table 1. Mixed-exponential excess losses

| Retention (r) | Exponential excess, θ = 500,000 (w = 0.500) | Exponential excess, θ = 1,000,000 (w = 0.250) | Exponential excess, θ = 2,000,000 (w = 0.125) | Exponential excess, θ = 5,000,000 (w = 0.125) | MxdExp Excess (w = 1.000; mean 1,375,000 ± 2,471,715) | ∫_r^∞ Excess dx |
|---|---|---|---|---|---|---|
| 0 | 500,000 | 1,000,000 | 2,000,000 | 5,000,000 | 1,375,000 | 4.000E+12 |
| 1,000,000 | 67,668 | 367,879 | 1,213,061 | 4,093,654 | 789,143 | 2.971E+12 |
| 2,000,000 | 9,158 | 135,335 | 735,759 | 3,351,600 | 549,333 | 2.315E+12 |
| 3,000,000 | 1,239 | 49,787 | 446,260 | 2,744,058 | 411,856 | 1.839E+12 |
| 4,000,000 | 168 | 18,316 | 270,671 | 2,246,645 | 319,327 | 1.476E+12 |
| 5,000,000 | 23 | 6,738 | 164,170 | 1,839,397 | 252,142 | 1.192E+12 |
| 6,000,000 | 3 | 2,479 | 99,574 | 1,505,971 | 201,314 | 9.667E+11 |
| 7,000,000 | 0 | 912 | 60,395 | 1,232,985 | 161,901 | 7.859E+11 |
| 8,000,000 | 0 | 335 | 36,631 | 1,009,483 | 130,848 | 6.402E+11 |
| 9,000,000 | 0 | 123 | 22,218 | 826,494 | 106,120 | 5.221E+11 |
| 10,000,000 | 0 | 45 | 13,476 | 676,676 | 86,280 | 4.263E+11 |
| 11,000,000 | 0 | 17 | 8,174 | 554,016 | 70,278 | 3.483E+11 |
| 12,000,000 | 0 | 6 | 4,958 | 453,590 | 57,320 | 2.847E+11 |
| 13,000,000 | 0 | 2 | 3,007 | 371,368 | 46,797 | 2.329E+11 |
| 14,000,000 | 0 | 1 | 1,824 | 304,050 | 38,234 | 1.905E+11 |
| 15,000,000 | 0 | 0 | 1,106 | 248,935 | 31,255 | 1.559E+11 |
| 16,000,000 | 0 | 0 | 671 | 203,811 | 25,560 | 1.275E+11 |
| 17,000,000 | 0 | 0 | 407 | 166,866 | 20,909 | 1.044E+11 |
| 18,000,000 | 0 | 0 | 247 | 136,619 | 17,108 | 8.545E+10 |
| 19,000,000 | 0 | 0 | 150 | 111,854 | 14,000 | 6.995E+10 |
| 20,000,000 | 0 | 0 | 91 | 91,578 | 11,459 | 5.726E+10 |
| 21,000,000 | 0 | 0 | 55 | 74,978 | 9,379 | 4.687E+10 |
| 22,000,000 | 0 | 0 | 33 | 61,387 | 7,678 | 3.838E+10 |
| 23,000,000 | 0 | 0 | 20 | 50,259 | 6,285 | 3.142E+10 |
| 24,000,000 | 0 | 0 | 12 | 41,149 | 5,145 | 2.572E+10 |
| 25,000,000 | 0 | 0 | 7 | 33,690 | 4,212 | 2.106E+10 |
| 26,000,000 | 0 | 0 | 5 | 27,583 | 3,448 | 1.724E+10 |
| 27,000,000 | 0 | 0 | 3 | 22,583 | 2,823 | 1.412E+10 |
| 28,000,000 | 0 | 0 | 2 | 18,489 | 2,311 | 1.156E+10 |
| 29,000,000 | 0 | 0 | 1 | 15,138 | 1,892 | 9.461E+09 |
| 30,000,000 | 0 | 0 | 1 | 12,394 | 1,549 | 7.746E+09 |
| 31,000,000 | 0 | 0 | 0 | 10,147 | 1,268 | 6.342E+09 |
| 32,000,000 | 0 | 0 | 0 | 8,308 | 1,039 | 5.192E+09 |
| 33,000,000 | 0 | 0 | 0 | 6,802 | 850 | 4.251E+09 |
| 34,000,000 | 0 | 0 | 0 | 5,569 | 696 | 3.481E+09 |
| 35,000,000 | 0 | 0 | 0 | 4,559 | 570 | 2.850E+09 |
| 36,000,000 | 0 | 0 | 0 | 3,733 | 467 | 2.333E+09 |
| 37,000,000 | 0 | 0 | 0 | 3,056 | 382 | 1.910E+09 |
| 39,000,000 | 0 | 0 | 0 | 2,049 | 256 | 1.280E+09 |
| 40,000,000 | 0 | 0 | 0 | 1,677 | 210 | 1.048E+09 |
| 41,000,000 | 0 | 0 | 0 | 1,373 | 172 | 8.583E+08 |
| 42,000,000 | 0 | 0 | 0 | 1,124 | 141 | 7.027E+08 |
| 43,000,000 | 0 | 0 | 0 | 921 | 115 | 5.753E+08 |
| 44,000,000 | 0 | 0 | 0 | 754 | 94 | 4.710E+08 |
| 45,000,000 | 0 | 0 | 0 | 617 | 77 | 3.857E+08 |
| 46,000,000 | 0 | 0 | 0 | 505 | 63 | 3.157E+08 |
| 47,000,000 | 0 | 0 | 0 | 414 | 52 | 2.585E+08 |
| 48,000,000 | 0 | 0 | 0 | 339 | 42 | 2.117E+08 |
| 49,000,000 | 0 | 0 | 0 | 277 | 35 | 1.733E+08 |
| 50,000,000 | 0 | 0 | 0 | 227 | 28 | 1.419E+08 |
The total area, which according to Section 3 is E[MX^{2}]/2, is 4.000 × 10^{12} (USD squared). Therefore the variance of the mixed exponential loss is 2 × 4.000 × 10^{12} − 1,375,000^{2} = 6.109 × 10^{12}, for a standard deviation of 2,471,715.
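The closed forms above reproduce the summary numbers of Table 1. The sketch below (helper names are illustrative) computes the “MxdExp Excess” and area columns for a few retentions, together with the mean and the standard deviation:

```python
import math

W = [0.500, 0.250, 0.125, 0.125]
THETA = [500_000.0, 1_000_000.0, 2_000_000.0, 5_000_000.0]

def excess_mx(r):
    """'MxdExp Excess' column: sum_i w_i theta_i exp(-r/theta_i)."""
    return sum(w * t * math.exp(-r / t) for w, t in zip(W, THETA))

def area_mx(r):
    """Last column of Table 1: sum_i w_i theta_i^2 exp(-r/theta_i)."""
    return sum(w * t * t * math.exp(-r / t) for w, t in zip(W, THETA))

mean = excess_mx(0.0)                 # E[MX]
second_moment = 2.0 * area_mx(0.0)    # E[MX^2] is twice the total area
variance = second_moment - mean ** 2
std = math.sqrt(variance)

for r in (0.0, 1e6, 5e6, 20e6):
    print(f"{r:>12,.0f}  {excess_mx(r):>10,.0f}  {area_mx(r):.3E}")
```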
Table 2. Layered losses and moments

| Retention (r) | Excess in layer [0, 5M] | Excess in layer [5M, 10M] | Excess in layer [10M, 20M] | Excess in layer [20M, ∞) |
|---|---|---|---|---|
| 0 | 1,122,858 | 165,861 | 74,822 | 11,459 |
| 1,000,000 | 537,001 | 115,034 | 58,819 | 9,379 |
| 2,000,000 | 297,191 | 75,620 | 45,861 | 7,678 |
| 3,000,000 | 159,715 | 44,568 | 35,339 | 6,285 |
| 4,000,000 | 67,185 | 19,840 | 26,776 | 5,145 |
| 5,000,000 | 0 | 0 | 19,797 | 4,212 |
| 6,000,000 | | | 14,102 | 3,448 |
| 7,000,000 | | | 9,451 | 2,823 |
| 8,000,000 | | | 5,650 | 2,311 |
| 9,000,000 | | | 2,542 | 1,892 |
| 10,000,000 | | | 0 | 1,549 |
| 11,000,000 | | | | 1,268 |
| 12,000,000 | | | | 1,039 |
| 13,000,000 | | | | 850 |
| 14,000,000 | | | | 696 |
| 15,000,000 | | | | 570 |
| 16,000,000 | | | | 467 |
| 17,000,000 | | | | 382 |
| 18,000,000 | | | | 313 |
| 19,000,000 | | | | 256 |
| 20,000,000 | | | | 210 |
| 21,000,000 | | | | 172 |
| 22,000,000 | | | | 141 |
| 23,000,000 | | | | 115 |
| 24,000,000 | | | | 94 |
| 25,000,000 | | | | 77 |
| 26,000,000 | | | | 63 |
| 27,000,000 | | | | 52 |
| 28,000,000 | | | | 42 |
| 29,000,000 | | | | 35 |
| 30,000,000 | | | | 28 |
| E[Y] | 1,122,858 | 165,861 | 74,822 | 11,459 |
| Area | 1.547E+12 | 3.347E+11 | 2.545E+11 | 5.726E+10 |
| Var[Y] | 1.833E+12 | 6.418E+11 | 5.033E+11 | 1.144E+11 |
| Std[Y] | ± 1,353,906 | ± 801,119 | ± 709,449 | ± 338,211 |
| CV[Y] | 1.21 | 4.83 | 9.48 | 29.52 |
Table 2 partitions the support of MX into four nonoverlapping layers: [0, 5M], [5M, 10M], [10M, 20M], and [20M, ∞). Let Y_{i} denote the portion of loss MX in the i^{th} layer. Because the partition is nonoverlapping and complete, MX = \sum_{i} Y_{i}. The excess losses are calculated according to the formula Excess_{Y}(r) = Excess_{X}(min(b, r + a)) − Excess_{X}(b). The formula for the infinite top layer is simpler: Excess_{Y}(r) = Excess_{X}(min(∞, r + 20M)) − Excess_{X}(∞) = Excess_{X}(r + 20M). As expected, the sum of the means of the layered losses, E[Y_{i}] = Excess_{Y_{i}}(0), equals 1,375,000; the partitioning conserves the first moment of the loss.
Table 2, like Table 1, derives the second moment of each layered loss from the area under its excess-loss curve. But, as one of the advantages of the excess-loss function, it is not necessary to compute these areas anew; they are implicit in Table 1. For, algebraically:
\begin{array}{l}
\frac{E\left[Y_{i}^{2}\right]}{2}=\int_{y=0}^{\infty} \operatorname{Excess}_{Y_{i}}(y) d y \\
\quad =\int_{y=0}^{\infty}\left\{\operatorname{Excess}_{MX}\left(\min \left(b_{i}, y+a_{i}\right)\right)-\operatorname{Excess}_{MX}\left(b_{i}\right)\right\} d y \\
\quad =\int_{y=0}^{b_{i}-a_{i}}\left\{\operatorname{Excess}_{MX}\left(y+a_{i}\right)-\operatorname{Excess}_{MX}\left(b_{i}\right)\right\} d y \\
\quad =\int_{y=0}^{b_{i}-a_{i}} \operatorname{Excess}_{MX}\left(y+a_{i}\right) d y-\left(b_{i}-a_{i}\right) \operatorname{Excess}_{MX}\left(b_{i}\right) \\
\quad =\int_{x=a_{i}}^{b_{i}} \operatorname{Excess}_{MX}(x) d x-\left(b_{i}-a_{i}\right) \operatorname{Excess}_{MX}\left(b_{i}\right) \\
\quad =\int_{x=a_{i}}^{\infty} \operatorname{Excess}_{MX}(x) d x-\int_{x=b_{i}}^{\infty} \operatorname{Excess}_{MX}(x) d x \\
\qquad -\left(b_{i}-a_{i}\right) \operatorname{Excess}_{MX}\left(b_{i}\right) .
\end{array}
The values of the last two integrals are those of the last column of Table 1 at retentions a_{i} and b_{i}. The “Area” row at the bottom of Table 2 contains these E[Y^{2}_{i}]/2 values, from which follow the variances and standard deviations. It is well known among reinsurance actuaries that the coefficients of variation, CV = Std/E, increase as the layers ascend. The sum of the four areas, 2.193 × 10^{12}, does not equal the 4.000 × 10^{12} of the MX area; nor is the sum of the four variances, 3.093 × 10^{12}, equal to the variance of MX, 6.109 × 10^{12}. What is lacking in the conservation of the second moment is the covariance among the layered losses, to which we now turn.
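The “Area,” variance, standard-deviation, and CV rows of Table 2 can be regenerated from the two Table 1 columns alone, following the formula above. In this sketch the term (b_{i} − a_{i})Excess_{MX}(b_{i}) is taken as zero for the unlimited top layer, which is its limit as b_{i} → ∞; the function names are assumptions.

```python
import math

W = [0.500, 0.250, 0.125, 0.125]
THETA = [500_000.0, 1_000_000.0, 2_000_000.0, 5_000_000.0]
LAYERS = [(0.0, 5e6), (5e6, 10e6), (10e6, 20e6), (20e6, math.inf)]

def excess_mx(r):
    return sum(w * t * math.exp(-r / t) for w, t in zip(W, THETA))

def area_mx(r):
    return sum(w * t * t * math.exp(-r / t) for w, t in zip(W, THETA))

def layer_summary(a, b):
    """E[Y], E[Y^2]/2 ('Area'), Var, Std and CV of Y = Layer(MX; a, b) from Table 1 quantities."""
    mean = excess_mx(a) - excess_mx(b)
    # (b - a) * Excess_MX(b) tends to 0 as b grows, so it is dropped for the unlimited layer
    width_term = 0.0 if math.isinf(b) else (b - a) * excess_mx(b)
    area = area_mx(a) - area_mx(b) - width_term
    var = 2.0 * area - mean ** 2
    std = math.sqrt(var)
    return mean, area, var, std, std / mean

rows = [layer_summary(a, b) for a, b in LAYERS]
total_area = sum(row[1] for row in rows)      # compare with 2.193E+12
total_var = sum(row[2] for row in rows)       # compare with 3.093E+12
cvs = [row[4] for row in rows]                # coefficients of variation, bottom to top
```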
5. Covariances among nonoverlapping layers
Since Cov[Y_{i}, Y_{j}] = E[Y_{i}Y_{j}] − E[Y_{i}]E[Y_{j}], and the means are known, we need a formula for the product moments E[Y_{i}Y_{j}], where i ≠ j. The actual formula, based on the loss variable of which they are layers, is E[Y_{i}Y_{j}] = \int_{x=-\infty}^{\infty} y_{i}(x) y_{j}(x)\, dF(x), where y_{i}(x) = Layer(x; a_{i}, b_{i}). A formal derivation is not necessary; the following argument will suffice. Since the layers are different and nonoverlapping, one is above the other. The range of integration may be restricted to the range over which the integrand y_{i}(x)y_{j}(x) is nonzero, which is the intersection of the nonzero ranges of y_{i}(x) and y_{j}(x) separately. However, due to the nonoverlapping layering, the nonzero range of the higher layer is a subset of that of the lower. Therefore, the range of integration may be restricted to the range over which the higher layer is nonzero. But over this range the lower layer is exhausted, i.e., equal to its width. Hence, the product moment of two different layers equals the product of the width of the lower layer and the mean of the higher.
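The lower-width-times-higher-mean rule can be checked against a direct numerical integration of y_{i}(x)y_{j}(x) against the mixed-exponential density. This is a sketch under stated assumptions: Simpson's rule on pieces split at the layer bounds, truncation of the integral at 300 million (far beyond the largest mean), and illustrative helper names.

```python
import math

W = [0.500, 0.250, 0.125, 0.125]
THETA = [500_000.0, 1_000_000.0, 2_000_000.0, 5_000_000.0]

def density_mx(x):
    """Mixed-exponential density f(x) = sum_i (w_i/theta_i) exp(-x/theta_i)."""
    return sum(w / t * math.exp(-x / t) for w, t in zip(W, THETA))

def excess_mx(r):
    return sum(w * t * math.exp(-r / t) for w, t in zip(W, THETA))

def layer(x, a, b):
    """Layer(x; a, b) = min(b - a, max(0, x - a))."""
    return min(b - a, max(0.0, x - a))

def simpson(f, lo, hi, n=4000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

def product_moment_numeric(l1, l2, upper=300e6):
    """E[Y_1 Y_2] for layers l1 = (a1, b1), l2 = (a2, b2), integrating against the density.

    The range is split at the finite layer bounds so each piece is smooth; the
    negligible tail beyond `upper` is ignored."""
    def g(x):
        return layer(x, *l1) * layer(x, *l2) * density_mx(x)
    cuts = sorted({0.0, upper, *(p for p in (*l1, *l2) if 0.0 < p < upper)})
    return sum(simpson(g, lo, hi) for lo, hi in zip(cuts, cuts[1:]))

lower, higher = (0.0, 5e6), (5e6, 10e6)
numeric = product_moment_numeric(lower, higher)
# width of the lower layer times the mean of the higher layer
shortcut = (lower[1] - lower[0]) * (excess_mx(higher[0]) - excess_mx(higher[1]))
```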
The 4×4 block of layer-by-layer entries at the bottom of Table 3, under E[ZZ′], contains the product moments. Down its diagonal are E[Y^{2}_{i}], or twice the values of the “Area” row of Table 2. Off the diagonal are the lower-width-and-higher-mean products. The augmenting margin (the Ground Up row and column) pertains to the loss from the ground up, or MX; it contains the row and column sums of the 4×4 block, since
\small{
\begin{aligned}
E\left[Y_{i} M X\right] & =E\left[Y_{i}\left(\sum_{j=1}^{4} Y_{j}\right)\right]=E\left[\sum_{j=1}^{4} Y_{i} Y_{j}\right]=\sum_{j=1}^{4} E\left[Y_{i} Y_{j}\right] \\
E\left[M X^{2}\right] & =E\left[\left(\sum_{i=1}^{4} Y_{i}\right) M X\right]=E\left[\left(\sum_{i=1}^{4} Y_{i} M X\right)\right] \\
& =\sum_{i=1}^{4} E\left[Y_{i} M X\right]=\sum_{i=1}^{4} \sum_{j=1}^{4} E\left[Y_{i} Y_{j}\right] .
\end{aligned}
}
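Table 3. Two-moment summary

| Loss layer | Mean | Std | Ground Up | 5M xs 0 | 5M xs 5M | 10M xs 10M | ∞ xs 20M |
|---|---|---|---|---|---|---|---|
| Variance | | | | | | | |
| Ground Up | 1,375,000 | ± 2,471,715 | 6.109E+12 | 2.811E+12 | 1.702E+12 | 1.269E+12 | 3.279E+11 |
| 5M xs 0 | 1,122,858 | ± 1,353,906 | 2.811E+12 | 1.833E+12 | 6.431E+11 | 2.901E+11 | 4.443E+10 |
| 5M xs 5M | 165,861 | ± 801,119 | 1.702E+12 | 6.431E+11 | 6.418E+11 | 3.617E+11 | 5.539E+10 |
| 10M xs 10M | 74,822 | ± 709,449 | 1.269E+12 | 2.901E+11 | 3.617E+11 | 5.033E+11 | 1.137E+11 |
| ∞ xs 20M | 11,459 | ± 338,211 | 3.279E+11 | 4.443E+10 | 5.539E+10 | 1.137E+11 | 1.144E+11 |
| Correlation | | | | | | | |
| Ground Up | | | 100% | 84% | 86% | 72% | 39% |
| 5M xs 0 | | | 84% | 100% | 59% | 30% | 10% |
| 5M xs 5M | | | 86% | 59% | 100% | 64% | 20% |
| 10M xs 10M | | | 72% | 30% | 64% | 100% | 47% |
| ∞ xs 20M | | | 39% | 10% | 20% | 47% | 100% |
| E[Z] E[Z]′ | | | | | | | |
| Ground Up | | | 1.89E+12 | 1.54E+12 | 2.28E+11 | 1.03E+11 | 1.58E+10 |
| 5M xs 0 | | | 1.54E+12 | 1.26E+12 | 1.86E+11 | 8.40E+10 | 1.29E+10 |
| 5M xs 5M | | | 2.28E+11 | 1.86E+11 | 2.75E+10 | 1.24E+10 | 1.90E+09 |
| 10M xs 10M | | | 1.03E+11 | 8.40E+10 | 1.24E+10 | 5.60E+09 | 8.57E+08 |
| ∞ xs 20M | | | 1.58E+10 | 1.29E+10 | 1.90E+09 | 8.57E+08 | 1.31E+08 |
| E[ZZ′] | | | | | | | |
| Ground Up | | | 8.000E+12 | 4.355E+12 | 1.930E+12 | 1.372E+12 | 3.437E+11 |
| 5M xs 0 | | | 4.355E+12 | 3.094E+12 | 8.293E+11 | 3.741E+11 | 5.729E+10 |
| 5M xs 5M | | | 1.930E+12 | 8.293E+11 | 6.693E+11 | 3.741E+11 | 5.729E+10 |
| 10M xs 10M | | | 1.372E+12 | 3.741E+11 | 3.741E+11 | 5.089E+11 | 1.146E+11 |
| ∞ xs 20M | | | 3.437E+11 | 5.729E+10 | 5.729E+10 | 1.146E+11 | 1.145E+11 |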
The soundness of our logic is confirmed inasmuch as E[MX^{2}] = 8.000 × 10^{12}, which is twice the area under the Excess_{MX} curve, according to Table 1.
Table 3 gives the label ‘Z’ to the 5 × 1 vector whose first element is MX and whose remaining four elements are the Y_{i}. So the augmented product-moment matrix is E[ZZ′], as labeled. The block above it is the outer product of the means, E[Z]E[Z]′. The variance matrix, Var[Z] = E[ZZ′] − E[Z]E[Z]′, is shown under the heading “Variance.” The “Std” column contains the square roots of the diagonal elements of the variance matrix, which values agree with those of Tables 1 and 2. When covariance is taken into account, the second moment of the loss is conserved. Finally, removing the standard-deviation scale from the variance matrix results in the “Correlation” matrix. It bears out something else well known to reinsurance actuaries, namely, that layered losses are positively correlated, although the correlation diminishes as the distance between the layers increases.
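The whole of Table 3 follows from the layer means, the layer second moments, and the lower-width-and-higher-mean rule. A minimal sketch in plain Python lists, assuming the four layers are listed in ascending order (so the layer with the smaller index is the lower one); the variable names are illustrative.

```python
import math

W = [0.500, 0.250, 0.125, 0.125]
THETA = [500_000.0, 1_000_000.0, 2_000_000.0, 5_000_000.0]
LAYERS = [(0.0, 5e6), (5e6, 10e6), (10e6, 20e6), (20e6, math.inf)]  # ascending order

def excess_mx(r):
    return sum(w * t * math.exp(-r / t) for w, t in zip(W, THETA))

def area_mx(r):
    return sum(w * t * t * math.exp(-r / t) for w, t in zip(W, THETA))

means = [excess_mx(a) - excess_mx(b) for a, b in LAYERS]
second = [2.0 * (area_mx(a) - area_mx(b) - (0.0 if math.isinf(b) else (b - a) * excess_mx(b)))
          for a, b in LAYERS]

n = len(LAYERS)
# E[Y_i Y_j]: E[Y_i^2] on the diagonal, width of the lower layer times mean of the higher off it
prod = [[second[i] if i == j
         else (LAYERS[min(i, j)][1] - LAYERS[min(i, j)][0]) * means[max(i, j)]
         for j in range(n)] for i in range(n)]

# augment with the ground-up loss MX = sum of the layers: Z = (MX, Y_1, ..., Y_4)
row_sums = [sum(r) for r in prod]                          # E[Y_i MX]
ezz = [[sum(row_sums)] + row_sums] + [[row_sums[i]] + prod[i] for i in range(n)]
ez = [sum(means)] + means
var = [[ezz[i][j] - ez[i] * ez[j] for j in range(n + 1)] for i in range(n + 1)]
std = [math.sqrt(var[i][i]) for i in range(n + 1)]
corr = [[var[i][j] / (std[i] * std[j]) for j in range(n + 1)] for i in range(n + 1)]
```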
6. Conclusion
The mathematics of excess losses is not only beautiful; it is powerful. The excess-loss function impounds all the information of the probability distribution of its loss. Therefore, although from the beginning actuaries and underwriters have found it convenient for the calculation of the pure premiums of layered losses, it is just as serviceable for the calculation of higher moments, whether the integrals involved be calculated analytically (as done in our example) or approximated numerically. The versatile mixed exponential distribution lessens the difficulty of such calculations.
Appendix
Appendix A
Stieltjes integrals: Watch your step!
The expectation of h(X) is \sum_{i=1}^{\infty} h(x_{i}) ⋅ Prob[X = x_{i}] if X is discrete, and \int_{x=-\infty}^{\infty} h(x) f_{X}(x)\, dx if X is continuous. But random variables may be mixed, i.e., continuous with discrete steps. For the sake of generality we can employ the Stieltjes integral [1, p. 12]: E[h(X)] = \int_{x=-\infty}^{\infty} h(x)\, dF_{X}(x), where F_{X}(x) is the cumulative distribution function of X. Of course, if F is differentiable, dF_{X}(x) = f_{X}(x)dx, and the Stieltjes integral reverts to the familiar (Cauchy-Riemann) integral \int_{x=-\infty}^{\infty} h(x) f_{X}(x)\, dx. The Stieltjes integral is defined as
\int_{x=a}^{b} h(x) d F(x)=\lim _{\max \left\langle\Delta x_{i}\right\rangle \rightarrow 0} \sum_{i=1}^{n} h\left(\xi_{i}\right) \Delta F_{i}
where the interval is partitioned as a = x_{0} < x_{1} < ⋯ < x_{n} = b, Δx_{i} = x_{i} − x_{i−1}, and ΔF_{i} = F(x_{i}) − F(x_{i−1}). Each ξ_{i} is arbitrarily chosen from the subinterval x_{i−1} ≤ ξ_{i} ≤ x_{i}.
If h is continuous over the interval, nothing is problematic about this definition. Now for our purposes X is a nonnegative random variable; hence, for x < 0, dF_{X}(x) = 0, and E[h(X)] = \lim_{\varepsilon \rightarrow 0^{-}} \int_{x=\varepsilon}^{\infty} h(x)\, dF_{X}(x). One is tempted to simplify this to \int_{x=0}^{\infty} h(x)\, dF_{X}(x), but doing so would miss any discrete step at zero, since the cumulative distribution function is continuous from the right. In general, the Stieltjes integral \int_{x=a}^{b} h(x)\, dF_{X}(x) counts probability mass at the upper limit, but not at the lower. This asymmetry ensures that \int_{x=a}^{b} h(x)\, dF_{X}(x) + \int_{x=b}^{c} h(x)\, dF_{X}(x) = \int_{x=a}^{c} h(x)\, dF_{X}(x); otherwise, probability mass at the endpoints might be either ignored or double-counted.
Therefore, the correct formulation for a nonnegative random variable X is E[h(X)] = h(0) Prob[X = 0] + \int_{x=0}^{\infty}h(x)dF_{X}(x). From integration by parts, we derive the form for the survival function G_{X}(x) = 1 − F_{X}(x), which also is continuous from the right:
\begin{aligned}
E[h(X)]= & h(0) \operatorname{Prob}[X=0]+\int_{x=0}^{\infty} h(x) d F_{X}(x) \\
= & h(0) \operatorname{Prob}[X=0]-\int_{x=0}^{\infty} h(x) d G_{X}(x) \\
= & h(0) \operatorname{Prob}[X=0]+\left.h(x) G_{X}(x)\right|_{\infty}^{0} \\
& +\int_{x=0}^{\infty} G_{X}(x) d h(x) \\
= & h(0) \operatorname{Prob}[X=0]+h(0) G_{X}(0)-0 \\
& +\int_{x=0}^{\infty} G_{X}(x) d h(x) \\
= & h(0) \operatorname{Prob}[X \geq 0]+\int_{x=0}^{\infty} G_{X}(x) d h(x) \\
= & h(0)+\int_{x=0}^{\infty} G_{X}(x) d h(x) .
\end{aligned}
For this reason we subtitled this appendix “Watch your step!”: the [0, ∞) Stieltjes integrals in this paper do not count probability mass at zero. For two reasons it is easy to overlook this subtlety. First, if Prob[X = 0] = 0, E[h(X)] = \int_{x=0}^{\infty} h(x)\, dF_{X}(x), although the other form is still E[h(X)] = h(0) + \int_{x=0}^{\infty} G_{X}(x)\, dh(x). And second, it is common for h(x) to be a positive power of x, in which case h(0) = 0.
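The effect of the step at zero shows up clearly with a hypothetical discrete loss and a function h with h(0) ≠ 0. In this sketch the three-point distribution and h(x) = (x + 1)^{2} are assumptions; the [0, ∞) Stieltjes integral of h dF_{X} is taken to count only mass strictly above zero, as described above.

```python
import math

dist = [(0.0, 0.2), (1.0, 0.5), (3.0, 0.3)]   # hypothetical loss with a step at zero

def h(x):
    return (x + 1.0) ** 2                    # h(0) = 1, so the step at zero matters

def survival(x):
    """G_X(x) = Prob[X > x]."""
    return sum(p for v, p in dist if v > x)

expected = sum(p * h(x) for x, p in dist)                  # E[h(X)] by definition
stieltjes_0_inf = sum(p * h(x) for x, p in dist if x > 0)  # [0, inf) integral of h dF_X
p0 = sum(p for x, p in dist if x == 0.0)                   # Prob[X = 0]

# integral over [0, inf) of G_X dh: G_X is constant between mass points, so sum exactly
pts = sorted({0.0} | {x for x, _ in dist})
g_dh = sum(survival(a) * (h(b) - h(a)) for a, b in zip(pts, pts[1:]))

assert math.isclose(expected, h(0.0) * p0 + stieltjes_0_inf)
assert math.isclose(expected, h(0.0) + g_dh)
assert not math.isclose(expected, stieltjes_0_inf)        # forgetting the step at zero
```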
Watching one’s step at zero also consistently handles a constant shift in h(x):
\begin{aligned}
E[h(X)+c]= & E[h(X)]+c \\
= & h(0) \ Prob[X=0]+\int_{x=0}^{\infty} h(x) d F_{X}(x)+c \\
= & h(0) \ Prob[X=0]+\int_{x=0}^{\infty} h(x) d F_{X}(x) \\
& +c\left\{Prob[X=0]+\int_{x=0}^{\infty} d F_{X}(x)\right\} \\
= & (h(0)+c) \ Prob[X=0] \\
& +\int_{x=0}^{\infty}(h(x)+c) d F_{X}(x) .
\end{aligned}
And
\begin{aligned}
E[h(X)+c] & =E[h(X)]+c \\
& =h(0)+\int_{x=0}^{\infty} G_{X}(x) d h(x)+c \\
& =(h(0)+c)+\int_{x=0}^{\infty} G_{X}(x) d(h(x)+c) .
\end{aligned}
Appendix B
Two theorems about reinsurance layers
Here we will give proofs of the two facts which this paper claims to be “well known to reinsurance actuaries”:

that the coefficients of variation, CV = Std/E, increase as the layers ascend, and

that layered losses are positively correlated, although the correlation diminishes as the distance between the layers increases.
Our proofs will begin with “differential” layers, i.e., layers whose width is dx. But, as we shall show, one can integrate such layers into layers of any width.
Let X be a nonnegative random variable, whose survival function (the complement of the cumulative distribution function) is G_{X}. The probability that X > x is G_{X}(x); therefore, the probability of a nonzero loss in the interval [x, x + Δx] is G_{X}(x). In the limit, as Δx → 0^{+}, the expected loss in the layer, E[Layer(X; x, x + Δx)], approaches G_{X}(x)Δx. Defining dY(x) as the portion of X in the differential layer [x, x + dx], we may say that dY(x) ∼ Bernoulli(G_{X}(x)) ⋅ dx. Accordingly, E[dY(x)] = G_{X}(x)dx and E[(dY(x))^{2}] = G_{X}(x)(dx)^{2}. Arguing as we did in Section 5, we have E[dY(x_{1})dY(x_{2})] = min(G_{X}(x_{1}), G_{X}(x_{2}))dx_{1}dx_{2}, of which E[(dY(x))^{2}] = G_{X}(x)(dx)^{2} is a special instance in which x_{1} = x_{2} = x.^{12}
Before we prove the two theorems, it will be instructive to see how a layer can be integrated from differential layers. If Y is the portion of X in layer [a,b], then Y = Layer(X;a,b) = \int_{x=a}^{b}dY(x). Hence,
E[Y]=E\left[\int_{x=a}^{b} d Y(x)\right]=\int_{x=a}^{b} E[d Y(x)]=\int_{x=a}^{b} G_{X}(x) d x
Moreover, the second moment is
\begin{array}{l}
E\left[Y^{2}\right]=E\left[\left(\int_{x=a}^{b} d Y(x)\right)^{2}\right] \\
=E\left[\int_{x_{1}=a}^{b} d Y\left(x_{1}\right) \int_{x_{2}=a}^{b} d Y\left(x_{2}\right)\right] \\
=E\left[\int_{x_{1}=a}^{b} \int_{x_{2}=a}^{b} d Y\left(x_{2}\right) d Y\left(x_{1}\right)\right] \\
=\int_{x_{1}=a}^{b} \int_{x_{2}=a}^{b} E\left[d Y\left(x_{2}\right) d Y\left(x_{1}\right)\right] \\
=\int_{x_{1}=a}^{b} \int_{x_{2}=a}^{b} \min \left(G_{X}\left(x_{1}\right), G_{X}\left(x_{2}\right)\right) d x_{2} d x_{1} \\
=\int_{x_{1}=a}^{b}\left\{\int_{x_{2}=a}^{x_{1}} \min \left(G_{X}\left(x_{1}\right), G_{X}\left(x_{2}\right)\right) d x_{2}\right. \\
\left.+\int_{x_{2}=x_{1}}^{b} \min \left(G_{X}\left(x_{1}\right), G_{X}\left(x_{2}\right)\right) d x_{2}\right\} d x_{1} \\
=\int_{x_{1}=a}^{b}\left\{\int_{x_{2}=a}^{x_{1}} G_{X}\left(x_{1}\right) d x_{2}+\int_{x_{2}=x_{1}}^{b} G_{X}\left(x_{2}\right) d x_{2}\right\} d x_{1} \\
=\int_{x_{1}=a}^{b} \int_{x_{2}=a}^{x_{1}} G_{X}\left(x_{1}\right) d x_{2} d x_{1}+\int_{x_{1}=a}^{b} \int_{x_{2}=x_{1}}^{b} G_{X}\left(x_{2}\right) d x_{2} d x_{1} \\
=\int_{x_{1}=a}^{b} \int_{x_{2}=a}^{x_{1}} G_{X}\left(x_{1}\right) d x_{2} d x_{1}+\int_{x_{2}=a}^{b} \int_{x_{1}=a}^{x_{2}} G_{X}\left(x_{2}\right) d x_{1} d x_{2} \\
=2 \int_{x_{1}=a}^{b} \int_{x_{2}=a}^{x_{1}} G_{X}\left(x_{1}\right) d x_{2} d x_{1} . \\
\end{array}
\begin{array}{l}
=2 \int_{x=a}^{b} G_{X}(x)(x-a) d x \\
=\int_{x=a}^{b} G_{X}(x) d(x-a)^{2} .
\end{array}
For ease of understanding, the derivation proceeded in many small steps; nevertheless, line seven deserves an explanation. Since G_{X} is nonincreasing, min(G_{X}(x_{1}), G_{X}(x_{2})) = G_{X}(max(x_{1}, x_{2})). So by dividing the inner integral into the two regions, we can identify the minimum.^{13} The reproduction of the moments of Y confirms the legitimacy of the formula Y = \int_{x=a}^{b} dY(x).
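The differential-layer formulas E[Y] = \int_{a}^{b} G_{X}(x) dx and E[Y^{2}] = 2\int_{a}^{b} G_{X}(x)(x − a) dx can be compared with the Section 4 formulas for the mixed exponential of Table 1. The sketch below assumes the layer [5M, 10M] and uses Simpson's rule for the integrals of G_{X}; helper names are illustrative.

```python
import math

W = [0.500, 0.250, 0.125, 0.125]
THETA = [500_000.0, 1_000_000.0, 2_000_000.0, 5_000_000.0]

def survival_mx(x):
    return sum(w * math.exp(-x / t) for w, t in zip(W, THETA))

def excess_mx(r):
    return sum(w * t * math.exp(-r / t) for w, t in zip(W, THETA))

def area_mx(r):
    return sum(w * t * t * math.exp(-r / t) for w, t in zip(W, THETA))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

a, b = 5e6, 10e6
# differential-layer formulas
mean_diff = simpson(survival_mx, a, b)
second_diff = simpson(lambda x: 2.0 * survival_mx(x) * (x - a), a, b)

# Section 4 formulas, built from Table 1 quantities
mean_tab = excess_mx(a) - excess_mx(b)
second_tab = 2.0 * (area_mx(a) - area_mx(b) - (b - a) * excess_mx(b))
```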
One more notion is required for our proofs, a notion which we will call the coefficient of covariance. It is the covariance between two random variables whose means have been normalized to unity, i.e.,
\begin{aligned}
\operatorname{CoefCov}[X, Y] & =\operatorname{Cov}\left[\frac{X}{E[X]}, \frac{Y}{E[Y]}\right] \\
& =\frac{\operatorname{Cov}[X, Y]}{E[X] E[Y]}=\frac{E[X Y]-E[X] E[Y]}{E[X] E[Y]} \\
& =\frac{E[X Y]}{E[X] E[Y]}-1 .
\end{aligned}
Of course, CoefCov[X, X] = CV^{2}[X]. The coefficient of covariance between two differential layers is
\begin{array}{l}
\operatorname{CoefCov}\left[d Y\left(x_{1}\right), d Y\left(x_{2}\right)\right]=\frac{E\left[d Y\left(x_{1}\right) d Y\left(x_{2}\right)\right]}{E\left[d Y\left(x_{1}\right)\right] E\left[d Y\left(x_{2}\right)\right]}-1 \\
\quad =\frac{\min \left(G_{X}\left(x_{1}\right), G_{X}\left(x_{2}\right)\right) d x_{1} d x_{2}}{G_{X}\left(x_{1}\right) d x_{1} G_{X}\left(x_{2}\right) d x_{2}}-1 \\
\quad =\frac{\min \left(G_{X}\left(x_{1}\right), G_{X}\left(x_{2}\right)\right)}{G_{X}\left(x_{1}\right) G_{X}\left(x_{2}\right)}-1 \\
\quad =\frac{\min \left(G_{X}\left(x_{1}\right), G_{X}\left(x_{2}\right)\right)}{\min \left(G_{X}\left(x_{1}\right), G_{X}\left(x_{2}\right)\right) \cdot \max \left(G_{X}\left(x_{1}\right), G_{X}\left(x_{2}\right)\right)}-1 \\
\quad =\frac{1}{\max \left(G_{X}\left(x_{1}\right), G_{X}\left(x_{2}\right)\right)}-1 \\
\quad =\frac{1}{G_{X}\left(\min \left(x_{1}, x_{2}\right)\right)}-1 .
\end{array}
This coefficient is well defined when G_{X}(x_{1}) and G_{X}(x_{2}) are nonzero; loss in the differential layers must be possible. Furthermore, CV^{2}[dY(x)] = CoefCov[dY(x), dY(x)] = \frac{1}{G_{X}(x)} − 1. Due to the properties of G_{X}, CV^{2}[dY(x)] is a nondecreasing function of x.
With this preparation, the first fact is easily proven. If Y = Layer(X; a, b) = \int_{x=a}^{b} dY(x) and G_{X}(a) ≥ G_{X}(b) > 0, then
\begin{array}{l}
C V^{2}[Y]=\operatorname{Var}[Y] / E[Y]^{2} \\
=\frac{\int_{x_{1}=a}^{b} \int_{x_{2}=a}^{b} \operatorname{Cov}\left[d Y\left(x_{1}\right), d Y\left(x_{2}\right)\right]}{\left(\int_{x=a}^{b} E[d Y(x)]\right)^{2}} \\
=\frac{\int_{x_{1}=a}^{b} \int_{x_{2}=a}^{b} E\left[d Y\left(x_{1}\right)\right] E\left[d Y\left(x_{2}\right)\right] \operatorname{CoefCov}\left[d Y\left(x_{1}\right), d Y\left(x_{2}\right)\right]}{\int_{x_{1}=a}^{b} \int_{x_{2}=a}^{b} E\left[d Y\left(x_{1}\right)\right] E\left[d Y\left(x_{2}\right)\right]} .
\end{array}
Hence, CV^{2}[Y] is a weighted average (weighted over two dimensions) of the coefficients of covariance among the layer’s differential layers. And since the weights are nonnegative, the weighted average must be bounded by the minimum and maximum coefficients, which occur at the endpoints:
\begin{aligned}
C V^{2}[d Y(a)]= & \ CoefCov[d Y(a), d Y(a)] \leq C V^{2}[Y] \\
& \leq \ CoefCov[d Y(b), d Y(b)] \\
& =C V^{2}[d Y(b)] .
\end{aligned}
Therefore, of two layers, [a, b] and [c, d], where a < b ≤ c < d, the CV^{2} of the lower will be less than or equal to that of the higher. And if probability is consumed anywhere in these layers,^{14} the inequality will be strict. Even if the two layers overlap (i.e., a ≤ c < b ≤ d, but not both a = c and b = d), one can consider three intervals, A = [a, c], B = [c, b], and C = [b, d], the middle interval being the overlap. Then, as above, CV^{2}(A) ≤ CV^{2}(B) ≤ CV^{2}(C). Because the unions involve weighted averaging, CV^{2}(A) ≤ CV^{2}(A ∪ B) ≤ CV^{2}(B) ≤ CV^{2}(B ∪ C) ≤ CV^{2}(C). Therefore, the CV^{2} of a higher layer is greater than or equal to that of a lower layer, even if there is some overlap; the inequality is strict if probability is consumed. Finally, since CV ≥ 0, the inequalities are as valid for CV as for CV^{2}. Note that the widths of the layers do not need to be equal.
Second, as to correlation, let Y_{1} = Layer(X; a, b) = \int_{x=a}^{b} dY(x) and Y_{2} = Layer(X; c, d) = \int_{x=c}^{d} dY(x), for a < b ≤ c < d. Therefore, for x_{1} in [a, b] and x_{2} in [c, d], the minimum of G_{X}(x_{1}) and G_{X}(x_{2}) is G_{X}(x_{2}), the value in the higher interval.
Under these conditions,
\small{
\begin{aligned}
& \operatorname{Cov}\left[Y_1, Y_2\right]=E\left[Y_1 Y_2\right]-E\left[Y_1\right] E\left[Y_2\right] \\
& \quad =\int_{x_1=a}^b \int_{x_2=c}^d E\left[d Y\left(x_1\right) d Y\left(x_2\right)\right]-E\left[Y_1\right] E\left[Y_2\right] \\
& \quad =\int_{x_1=a}^b \int_{x_2=c}^d \min \left(G_X\left(x_1\right), G_X\left(x_2\right)\right) d x_2 d x_1-E\left[Y_1\right] E\left[Y_2\right] \\
& \quad =\int_{x_1=a}^b \int_{x_2=c}^d G_X\left(x_2\right) d x_2 d x_1-E\left[Y_1\right] E\left[Y_2\right] \\
& \quad =(b-a) \int_{x_2=c}^d G_X\left(x_2\right) d x_2-E\left[Y_1\right] E\left[Y_2\right] \\
& \quad =(b-a) E\left[Y_2\right]-E\left[Y_1\right] E\left[Y_2\right] \\
& \quad =\left(b-a-E\left[Y_1\right]\right) E\left[Y_2\right] .
\end{aligned}
}
Because E[Y_{1}] = \int_{x=a}^{b} G_{X}(x)\, dx ≤ \int_{x=a}^{b} 1 ⋅ dx = b − a, b − a − E[Y_{1}] ≥ 0; hence, Cov[Y_{1}, Y_{2}] ≥ 0. So the correlation coefficient between the portions of X in the two layers is
\begin{aligned}
\operatorname{Corr}\left[Y_{1}, Y_{2}\right] & =\frac{\operatorname{Cov}\left[Y_{1}, Y_{2}\right]}{\sigma_{Y_{1}} \sigma_{Y_{2}}} \\
& =\frac{b-a-E\left[Y_{1}\right]}{\sigma_{Y_{1}}} \cdot \frac{E\left[Y_{2}\right]}{\sigma_{Y_{2}}} \\
& =\frac{\left(\frac{b-a-E\left[Y_{1}\right]}{\sigma_{Y_{1}}}\right)}{C V\left[Y_{2}\right]} .
\end{aligned}
Now consider shifting [c, d] to the right, i.e., to [c + ξ, d + ξ], where ξ ≥ 0. And let Y_{2}(ξ) = Layer(X; c + ξ, d + ξ) = \int_{x=c+\xi}^{d+\xi} dY(x). From the first proof we know that CV[Y_{2}(ξ)] is nondecreasing in ξ. Since the numerator of Corr does not depend on ξ, Corr[Y_{1}, Y_{2}(ξ)] is nonincreasing in ξ, and strictly decreasing if probability is consumed. Therefore, as the upper layer moves away from the lower far enough to consume probability, the correlation decreases. This implies that in the absence of compensating risk premiums, a reinsurer should not underwrite neighboring layers.
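To illustrate the second theorem, the sketch below computes Corr[Y_{1}, Y_{2}(ξ)] for the mixed exponential of Table 1, holding Y_{1} at [0, 5M] and shifting a 5M-wide upper layer that starts at 5M. The shifts are arbitrary choices for the example, and the correlation is evaluated with the closed form above rather than by simulation.

```python
import math

W = [0.500, 0.250, 0.125, 0.125]
THETA = [500_000.0, 1_000_000.0, 2_000_000.0, 5_000_000.0]

def excess_mx(r):
    return sum(w * t * math.exp(-r / t) for w, t in zip(W, THETA))

def area_mx(r):
    return sum(w * t * t * math.exp(-r / t) for w, t in zip(W, THETA))

def mean_and_std(a, b):
    """Mean and standard deviation of Layer(MX; a, b) for a finite layer."""
    mean = excess_mx(a) - excess_mx(b)
    second = 2.0 * (area_mx(a) - area_mx(b) - (b - a) * excess_mx(b))
    return mean, math.sqrt(second - mean ** 2)

def corr_layers(lower, upper):
    """Corr[Y_1, Y_2] = (b - a - E[Y_1]) E[Y_2] / (sd_1 sd_2), with lower = (a, b) below upper."""
    (a, b), (c, d) = lower, upper
    m1, s1 = mean_and_std(a, b)
    m2, s2 = mean_and_std(c, d)
    return (b - a - m1) * m2 / (s1 * s2)

lower = (0.0, 5e6)
shifts = [0.0, 5e6, 10e6, 20e6]
corrs = [corr_layers(lower, (5e6 + xi, 10e6 + xi)) for xi in shifts]
```

With ξ = 0 the result reproduces the 59% correlation between the two lowest layers in Table 3.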