1. Introduction
After the famous chain ladder method and the BornhuetterFerguson method (Bornhuetter and Ferguson 1972), the Cape Cod reserving method (hereinafter “CC method”; Bühlmann and Straub 1983) is one of the methods most used by practicing actuaries for the projection of nonlife paid or incurred triangles. The CC method relies on very few parameters and is hence favored by practicing actuaries.
Following the development of the prediction error estimate for the chain ladder method (Mack 1993, 1999) and for the BornhuetterFerguson method (Mack 2008), Saluz (2015) proposed the estimate of the prediction error for the CC method. The estimate of the prediction error for the CC method by Saluz (2015) should be put in the context of many attempts to better understand the nonlife reserving distributions. For example, in relation to the CC method, Clark (2008) already proposed a reserving system that allows the actuary to use exposure information, such as onlevel premium, even if that information is only available for a limited number of years. Such a reserving system is related to CC and BornhuetterFerguson methods. Among the different attempts to provide characteristics of the reserving distributions, three areas of research can be observed:

The Bayesian models: Based on external information, these models attempt to provide information on the final reserve amount and its distribution (Taylor 2015; England, Verrall, and Wüthrich 2012). These models often relate to the Bornhuetter Ferguson model.

The stochastic models: These models rely on an underlying simulation framework in which parameters are fitted to the existing data. The most common models are the bootstrapping model (England and Verrall 2002), the GLM models (Merz and Wuthrich 2008b) or making assumptions on chain ladder coefficients (Barnett and Zehnwirth 2000).

In recent years, the necessity to understand the claim development result for the first year within the Solvency 2 framework has led to major breakthroughs around this topic (Merz and Wuthrich 2008a; Siegenthaler 2017). By extension, the claim development result for the full runoff in the chain ladder framework was also developed (Merz and Wuthrich 2014).
In all three areas mentioned above, the focus has always been on the expected value of the reserve amount and its standard deviation. Very few papers have focused on the third or fourth moment of the reserve distribution. Such papers include the skewness estimates for the chain ladder method (Salzmann, Wuthrich, and Merz 2012; Dal Moro 2013) and for the BornhuetterFerguson method (Dal Moro 2021).
In order to complete our knowledge of the CC method in a distributionfree framework, a missing piece is the skewness of the CC method. Based on the work of Saluz (2015), this paper will propose a first approach to estimate this skewness.
2. Notation and data structure
We denote the cumulative claims (cumulative payments or incurred losses) in accident year
at the end of development year by and we assume J ≤ I. Let denote the incremental claims, where we set The summation over an index starting from 0 is denoted with a square bracket, for example:\[C_{\left\lfloor k \right\rfloor,j} = \sum_{i = 0}^{k}C_{i,j},\ 0 \leq k \leq I,\ 0 \leq j \leq J.\]
We assume that all claims are settled after development year J and therefore the total ultimate claim of accident year i is given by
At time I, we have information in the upper left trapezoid/triangle\[D_{I} = \left\{ C_{i,j}:i + j \leq I,\ j \leq J \right\}\]
and our goal is to predict the lower right triangle
\[D_{I}^{c} = \left\{ C_{i,j}:i + j > I,\ i \leq I,j \leq J \right\}.\]
The chain ladder prediction of the ultimate claim
of accident year i > I − J is given by\[{\widehat{C}}_{i,J}^{CL} = C_{i,\iota(i)}\prod_{j = \iota(i)}^{J  1}{\widehat{f}}_{j}\]
where
\[{\widehat{f}}_{j} = \frac{C_{\left\lfloor I  j  1 \right\rfloor,j + 1}}{C_{\left\lfloor I  j  1 \right\rfloor,j}}\\ \text{and} \ \iota(i) = min(J,I  i).\]
The chain ladder development pattern is defined as
\[{\widehat{\beta}}_{j}^{CL} = \prod_{k = j}^{J  1}{\widehat{f}}_{k}^{ 1},0 \leq j \leq J  1,\ {\widehat{\beta}}_{J}^{CL} = 1\ . \tag{1}\]
3. The Cape Cod method
The Cape Cod predictor (Bühlmann and Straub 1983) for the ultimate claim is given by
\[{\widehat{C}}_{i,J}^{CC} = C_{i,\iota(i)} + \upsilon_{i}\widehat{q}\left( 1  {\widehat{\beta}}_{\iota(i)} \right)\]
where the earned premium for accident year i is denoted by
and\[\widehat{q} = \frac{\sum_{i = 0}^{I}C_{i,\iota(i)}}{\sum_{i = 0}^{I}{\upsilon_{i}{\widehat{\beta}}_{\iota(i)}}}\ \ .\]
is an estimate of and describes the percentage of claims emerging up to development year The incremental development pattern is estimated by
\[{\widehat{\gamma}}_{0} = {\widehat{\beta}}_{0}\\{\widehat{\gamma}}_{j + 1} = {\widehat{\beta}}_{j + 1}  {\widehat{\beta}}_{j},\ \ \ 0 \leq j \leq J  1.\]
In BühlmannStraub (1983), it is mentioned that the estimation of the development pattern is an unsolved problem. In practice, the development pattern is often estimated by the chain ladder (CL) development pattern given in (1).
Finally, we define the outstanding loss liabilities for accident year i at time I as
\[R_{i}^{CC} = C_{i,J}  C_{i,I  i}\]
4. The stochastic model underlying the Cape Cod method
Model assumptions
Incremental claims
are independent and there exist positive parameters and a development pattern with such that\[E\left\lbrack X_{i,j} \right\rbrack = \upsilon_{i}q\gamma_{j}\] \[SK\left\lbrack X_{i,j} \right\rbrack = \left( \upsilon_{i}q \right)^{\frac{3}{2}}\ t_{j}^{3}\]
where
denotes the third moment of the random variableFor the estimation of the skewness, we need estimates for
Note that\[\widehat{q^{\frac{3}{2}}\ t_{j}^{3}} = \frac{1}{I  j}\sum_{i = 0}^{I  j}\frac{1}{\upsilon_{i}^{\frac{3}{2}}}\left( X_{i,j}  \upsilon_{i}\widehat{\gamma_{j}} \right)^{3},\ 0 \leq j \leq J,\ j \neq I\]
is an unbiased estimator for
Note also that the above model assumptions assume that the expected loss ratio q is the same for all accident years.
Parameter estimation
In Saluz (2015), the normalized development pattern is replaced by the raw development pattern below:
\[{\widehat{\gamma}}_{j}^{raw} = \frac{X_{\left\lfloor I  j \right\rfloor,j}}{\upsilon_{\left\lfloor I  j \right\rfloor}} \tag{2}\]
And the cumulative development pattern is estimated by
\[{\widehat{\beta}}_{j}^{raw} = \sum_{k = 0}^{j}{\widehat{\gamma}}_{j}^{raw}\]
5. Skewness of the CC method per accident year
The mean skewness of the estimate of the ultimate loss
is defined as\[ \begin{align} {SK}_{C_{i,J}D_{I}}\left( {\widehat{C}}_{i,J}^{CC} \right) &= E\left\lbrack \left( C_{i,J}  {\widehat{C}}_{i,J}^{CC} \right)^{3}D_{I} \right\rbrack \\ &= E\biggl\lbrack \bigl( \sum_{j = I  i + 1}^{J}X_{i,j}  \upsilon_{i}\bigl( {\widehat{\beta}}_{J}^{raw} \\ &\quad \quad {\widehat{\beta}}_{I  i}^{raw} \bigr) \bigr)^{3} D_{I} \biggr\rbrack \end{align}\]
\[ \begin{align} {SK}_{C_{i,J}D_{I}}\biggl( {\widehat{C}}_{i,J}^{CC} \biggr) &= E\biggl\lbrack \biggl( \sum_{j = I  i + 1}^{J}X_{i,j}  \upsilon_{i}q\bigl( 1  \beta_{I  i} \bigr) \\ &\quad \quad + \upsilon_{i}q\bigl( 1  \beta_{I  i} \bigr)  \upsilon_{i}\bigl( {\widehat{\beta}}_{J}^{raw} \\ &\quad \quad  {\widehat{\beta}}_{I  i}^{raw} \bigr) \biggr)^{3} D_{I} \biggr\rbrack \end{align} \]
As we have
\[\sum_{j = I  i + 1}^{J}\gamma_{j}^{raw} = q\left( 1  \beta_{I  i} \right)\]
we get
\[ \begin{align} {SK}_{C_{i,J}D_{I}}\left( {\widehat{C}}_{i,J}^{CC} \right) &= E\biggl\lbrack \bigl( \sum_{j = I  i + 1}^{J}\bigl( X_{i,j}  \upsilon_{i}\gamma_{j}^{raw} \bigr) \\ &\quad \quad  \upsilon_{i}\bigl( {\widehat{\beta}}_{J}^{raw}  {\widehat{\beta}}_{I  i}^{raw} \\ &\quad \quad  q\bigl( 1  \beta_{I  i} \bigr) \bigr) \bigr)^{3} D_{I} \biggr\rbrack \end{align}\]
Due to the independence of the
we have\[ \begin{align} {SK}_{C_{i,J}D_{I}}\left( {\widehat{C}}_{i,J}^{CC} \right) &= E\biggl\lbrack \biggl( \sum_{j = I  i + 1}^{J}\bigl( X_{i,j}  \upsilon_{i}\gamma_{j}^{raw} \bigr) \biggr)^{3}D_{I} \biggr\rbrack \\ &\quad  \nu_{i}^{3}\ \\ &\quad E\biggl\lbrack \biggl( \bigl( {\widehat{\beta}}_{J}^{raw}  {\widehat{\beta}}_{I  i}^{raw}  q\bigl( 1  \beta_{I  i} \bigr) \bigr) \biggr)^{3}\\ &\quad \quad D_{I} \biggr\rbrack \end{align} \]
Hence
\[ \begin{align} {SK}_{C_{i,J}D_{I}}\left( {\widehat{C}}_{i,J}^{CC} \right) &= \sum_{j = I  i + 1}^{J}{SK\left( X_{i,j} \right)} \\ &\quad  \nu_{i}^{3}\left( \sum_{j = I  i + 1}^{J}\left( {\widehat{\gamma}}_{j}^{raw}  q\gamma_{j} \right) \right)^{3} \end{align}\]
By definition,
\[SK\left( X_{i,j} \right) = \left( \upsilon_{i}q \right)^{\frac{3}{2}}t_{j}^{3}\]
As for the second element of the equation, we have
\[ \begin{align} \biggl( \sum_{j = I  i + 1}^{J}\bigl( {\widehat{\gamma}}_{j}^{raw}  q\gamma_{j} \bigr) \biggr)^{3} &= SK\biggl( \sum_{j = I  i + 1}^{J}{\widehat{\gamma}}_{j}^{raw} \biggr) \\ &= \sum_{j = I  i + 1}^{J}{\frac{1}{\bigl( \upsilon_{\bigl\lfloor I  j \bigr\rfloor} \bigr)^{3}}\sum_{l = 0}^{I  j}{SK\bigl( X_{l,j} \bigr)}} \\ &= \sum_{j = I  i + 1}^{J}{\frac{1}{\bigl( \upsilon_{\bigl\lfloor I  j \bigr\rfloor} \bigr)^{3}}\sum_{l = 0}^{I  j}{\bigl( \upsilon_{l}q \bigr)^{\frac{3}{2}}t_{j}^{3}}} \\ &= \sum_{j = I  i + 1}^{J}\frac{q^{\frac{3}{2}}t_{j}^{3}}{\bigl( \upsilon_{\bigl\lfloor I  j \bigr\rfloor} \bigr)^{\frac{3}{2}}} \end{align}\]
Finally, we get
\[ \begin{align} {SK}_{C_{i,J}D_{I}}\bigl( {\widehat{C}}_{i,J}^{CC} \bigr) &= \bigl( \upsilon_{i}q \bigr)^{\frac{3}{2}}\sum_{j = I  i + 1}^{J}\\ &\quad \quad {t_{j}^{3}\biggl( 1  \frac{\upsilon_{i}^{\frac{3}{2}}}{\bigl( \upsilon_{\bigl\lfloor I  j \bigr\rfloor} \bigr)^{\frac{3}{2}}} \biggr)} \end{align} \tag{3}\]
6. Skewness of the CC method over all accident years
Having estimated the skewness of the CC method by accident year, we will now aggregate these elements over all accident years. In order to do so, we will use the Fleishman polynomials (Fleishman 1978). First, we assume that the centralized and normalized copy of the risk value X_{i} of the ith class, (where CoV denotes the coefficient of variation), is estimated by the Fleishman polynomial structure of a standard normal random variable. In particular, we consider the following case:
– where denotes the standard normal distribution – Such a case is suitable for estimating the skewness of a risk portfolio profile when the confidence level is approximated using skewness only.
The coefficients of the polynomial P_{2} are calibrated using the method of moments by matching the second and third moments of P_{2}(Z_{i}) to 1 (standard deviation of
λ_{i} (skewness of X_{i} ) respectively.The coefficients of P_{2} can be analytically expressed by solving the following system of equations:
\[ \left\{\begin{array}{c} 1=a_{i}^{2}+2 b_{i}^{2} \\ \lambda_{i}=6 a_{i}^{2} b_{i}+8 b_{i}^{3} \end{array}\right. \tag{4} \]
The system (4) is reduced to:
\[ \left\{\begin{array}{l} a_{i}=\sqrt{12 b_{i}^{2}} \\ \lambda_{i}=6 b_{i}4 b_{i}^{3} \end{array}\right. \tag{5} \]
Such system is easily solved. The roots of the cubic equation (5) can be found using the Cardano’s formula (e.g., Abramowitz and Stegun 1972). If we denote
\[\varphi = arccos\left(  \frac{\lambda_{i}}{\sqrt{8}} \right)\]
then the only real root of equation (5) is
\[b_{i} = \sqrt{2}\ cos\left( \frac{\varphi}{3} + 4\frac{\pi}{3} \right).\]
Having estimated the above parameters of the Fleishman polynomial, we define the total reserve value across the portfolio of m risks as
\[X_{\Sigma} = \sum_{i = 1}^{m}X_{i}\]
where each ith risk value is approximated by Fleishman polynomial of a standard normal random variable
\[X_{i} \approx {CE}_{i}\ \left( 1 + {CoV}_{i}\ P_{2}\left( Z_{i} \right) \right),\]
where CE denotes the central best estimate of X.
It is clear that
As in Mack (2008), all the standalone risks interact between each other according to a Gaussian dependence structure which linear correlations ρ_{ij} (coefficients of a Gaussian copula) are given by
\[\rho_{ij} = \frac{{\widehat{z}}_{n + 1  j}\left( 1  {\widehat{z}}_{n + 1  i} \right)}{{\widehat{z}}_{n + 1  i}\left( 1  {\widehat{z}}_{n + 1  j} \right)}.\]
SKEWNESS
We compute the third central moment of X_{Σ}:
\[E\left\lbrack \left( X_{\Sigma}  {CE}_{\Sigma} \right)^{3} \right\rbrack = E\left\lbrack \left( \sum_{i = 1}^{m}{\sigma_{i}\ P_{2}\left( Z_{i} \right)} \right)^{3} \right\rbrack\]
\[ \begin{aligned} E\bigl[\bigl(X_{\Sigma}C E_{\Sigma}\bigr)^{3}\bigr] &=\sum_{i=1}^{m} \sigma_{i}^{3} \lambda_{i}\\ &\quad +3 \sum_{i j} \sigma_{i}^{2} \sigma_{j} \\ &\quad \quad E\bigl[P_{2}\bigl(Z_{i}\bigr)^{2} P_{2}\bigl(Z_{j}\bigr)\bigr] \\ &\quad +6 \sum_{i j k} \sigma_{i} \sigma_{j} \sigma_{k} \\ &\quad \quad E \bigl[P_{2}\bigl(Z_{i}\bigr) P_{2}\bigl(Z_{j}\bigr) P_{2}\bigl(Z_{k}\bigr)\bigr] \end{aligned} \tag{6} \]
where
as the Fleishman polynomial coefficients are calibrated so that the polynomial has skewness λ_{i} for ith standalone risk profile. In formula (6), the summation term with multiple 3 has different subterms, and the summation term with multiple 6 is relevant if and has different subterms.The following components of formula (6) above are
\[ \begin{align} E\bigl\lbrack {P_{2}\bigl( Z_{i} \bigr)}^{2}P_{2}\bigl( Z_{j} \bigr) \bigr\rbrack &= 2\rho_{ij}\\ &\quad \bigl( 2a_{i}a_{j}b_{i} + \bigl( a_{i}^{2} + 4b_{i}^{2} \bigr)b_{j}\rho_{ij} \bigr) \end{align} \tag{7} \]
\[ \begin{align} E\bigl\lbrack P_{2}\bigl( Z_{i} \bigr)P_{2}\bigl( Z_{j} \bigr)P_{2}\bigl( Z_{k} \bigr) \bigr\rbrack &= 2\bigl( a_{j}a_{k}b_{i}\rho_{ij}\rho_{ik} \\ &\quad \quad + a_{j}a_{i}b_{k}\rho_{jk}\rho_{ik} \\ &\quad \quad + a_{i}a_{k}b_{j}\rho_{ij}\rho_{jk} \bigr) \\ &\quad + 8b_{i}b_{k}b_{j}\rho_{ij}\rho_{ik}\rho_{jk} \end{align} \tag{8} \]
The skewness λ_{Σ} is then calculated as follows:
\[\lambda_{\Sigma} = \frac{E\left\lbrack \left( X_{\Sigma}  {CE}_{\Sigma} \right)^{3} \right\rbrack}{\left( {CE}_{\Sigma}\ {CoV}_{\Sigma} \right)^{3}}. \tag{9}\]
In Table 1, the parameters a_{i}, b_{i} are provided, the correlation matrices are provided in Appendix A and the values for σ_{i} correspond to the msep(R_{i}) are estimated using Saluz (2015) formulae. Details of calculations related to the numerical example can be found on the excel sheets and in the R program in the folder available at:
https://drive.google.com/drive/folders/14UNUPb1a0A_YNe0Y4gTDdhEsPnOaIxS?usp=sharing
7. Numerical examples
Equations (3) and (9) were tested on one triangle and are shown in Table 1. The correlation matrices used for applying the Fleishman polynomials are also provided in Appendix A.
As expected, the skewness is positive due to the fact that there is a lower probability for the reserves to be underestimated, i.e., the right tail of the reserving risk distribution is heavier than the left tail. However, with such a distribution, in the case where the reserves are underestimated, the underestimated reserve amount will be bigger when compared to the case of a distribution having a lower skewness.
In order to benchmark the above skewness, it is compared against a few models used in practice. The first benchmark will consist in the use of parametric distributions where means and standard deviations are matched to the calculated means and standard deviations. The second benchmark will use the usual stochastic reserving method: bootstrapping. The third benchmark will consist in estimating the skewness on the basis of the Mack chain ladder model available in the R package ChainLadder (Gesman et al. 2022).
Benchmark 1: Parametric distributions
As often in practice, the distribution of reserves is approximated with a gamma or lognormal distribution, matching to the mean and standard deviation. For a gamma distribution, we would have
\[Skewness = 2\ \times Coefficient\ of\ variation\]
and for lognormal,
\[\begin{align} Skewness &= \left( 3 + \ {Coefficient\ of\ variation}^{2} \right)\ \\ &\quad \times Coefficient\ of\ variation. \end{align}\]
With such approximation, it is possible to compare the resulting shape of the distribution resulting from the Cape Cod model with gamma or lognormal distributions. And, in the example above, the estimated skewness is 0.484, which is almost seven times the coefficient of variation of 7.3%, indicating a distribution much more skewed than a lognormal distribution. In the triangle, it can be identified that this higher skewness comes from the accident year 0 which seems to have a different behavior from the other accident years—in particular, on development year 1, where the payment amount of 3 721 237 (Appendix A) would seem to be unusually high. As for any reserving work, further investigation on this particular cell would need to be done to finalize the distribution model.
Benchmark 2: Bootstrapping
For this benchmark, the function BootChainLadder of the R package ChainLadder (Gesman et al. 2022 and the R program provided) is used on the triangle shown in Appendix A (note, however, that the triangle in Appendix A is first converted into a cumulative payment triangle). The function BootChainLadder corresponds to the implementation of the bootstrapping technique described in England and Verrall (2002). For comparison purposes, two process distributions are tested: gamma and overdispersed Poisson. Results of the bootstrapping are shown in Table 2.
The bootstrapping results indicate that the overall coefficient of variation and the overall level of IBNR seem consistent with the results of the Cape Cod method even though the level of IBNR is smaller in the bootstrapping method. However, the coefficients of variation per accident year are increasing significantly from accident year 9 to accident year 1, even leading to some counterintuitive results; for example, the standard deviation on accident year 1 is higher than the IBNR level, whereas the reserve level on older accident years is usually almost certain and the standard deviation should be small. The same increase across accident years is observed on the skewness.
Such behavior comes from the way in which bootstrapping simulations are done:
 First, the assumptions below are taken.
For each accident year i and development year j, we have:
(OverDispersed Poisson)
Or (Gamma)
 Second, residuals are estimated as:
Some adjustments on these residuals are done (see England and Verrall 2002 for details).
 Third, after resampling the adjusted residuals (resulting in a new triangle is created as per the formula:
 Fourth, on the basis of this new triangle, the overall IBNRs are estimated, giving a full distribution.
Considering the process applied in the bootstrapping method, the following conclusions can be drawn:

The resampling method uses the residuals of younger (and more uncertain) accident years and applies them to older accident years. This results in the counterintuitive results mentioned above for older accident years.

For the overall IBNR level and the overall standard deviation, the method could be valid, as assumptions on standard deviations and expected value are set at the beginning of the process.

However, for skewness, there is no guarantee that the bootstrapping method can provide reliable results, as there is no assumption set at the beginning of the process for the third moment of the distributions. Interestingly, it could be a nice way to refine the bootstrapping method.
As a conclusion, the bootstrapping benchmark may not be a reliable source for comparison.
Benchmark: Mack chain ladder
The final benchmark is the Mack chain ladder as implemented in the function Quantile of the R package ChainLadder (Gesman et al. 2022 and the R program provided). This method is also applied on the cumulative triangle. Results are shown in Table 3.
The Mack chain ladder results indicate that the overall coefficient of variation and the overall level of IBNR seem consistent with the results of the Cape Cod method, even though the level of IBNR is smaller. In addition, the skewness is higher than the Cape Cod skewness. Contrary to the bootstrapping method, the skewness per accident year lessens as accident year becomes older like in the Cape Cod method and also as anticipated. In terms of comparison, the lower skewness and coefficient of variations of the Cape Cod method is certainly explained by the smoothing effect of the Cape Cod method compared to the chain ladder method. In fact, the Cape Cod method as well as the BornhuetterFerguson method take into account premium information so as to smooth the effect of payments being higher than anticipated or lower than anticipated.
As an overall conclusion, the results provided by the Cape Cod skewness seem to fit well within the different benchmarks:

It is above the skewness coming from the parametric distributions gamma and lognormal and it is possible to identify the reason for this difference.

It is below the Mack chain ladder skewness due to the smoothing nature of the Cape Cod method.
Finally, it may be worth mentioning that the ultimate loss ratios resulting from the application of the Cape Cod method reduce from the low 70s in the older accident years to the mid 60s in the younger accident years. When applying the skewness formulae, it is important to check that the overall ultimate loss ratios are stable enough so that the method can be applied. If the resulting Cape Cod loss ratios are too volatile, the skewness estimator will certainly not provide a reliable information.
8. Conclusion
This paper is a first attempt to estimate the skewness of the CC method. It does not rely on assumptions derived from external knowledge of the modeled reserving risk. As a result, the estimation is automatic and may not work on any triangle. In particular, the skewness estimation method assumes a constant loss ratio across the accident years. Therefore, triangles reflecting a high volatility of the loss ratio in the past may not be fit for this method. It must be born in mind that, in such cases, the Cape Cod reserving method is also usually not applicable for reserve estimation.
However, the advantages of this method are:
1 – It is distributionfree: there is no need to assume any distribution of the reserving risk to get an estimation of the skewness.
2 – It is easy to implement and the proposed formulae are simple and elegant.
As a next step, we must recognize also that the CC method is usually used together with the chain ladder method and the BornhuetterFerguson method. Therefore, as for the hybrid chain ladder method (Arbenz and Salzmann 2014), a skewness formula for the mixed CC / BornhuetterFerguson / chain ladder method should be developed in a subsequent paper.