1. Multiplicative and additive reserving methods
Halliwell (2007) noticed that the age-to-age development on real incurred data mostly yields a regression line with a positive intercept, which he interprets as a “bias” of the chain-ladder projection. In this paper, we propose that this positive intercept is caused by newly reported claims, i.e., claims which were not reported in the previous year, the so-called IBNYR claims (incurred but not yet reported). These newly reported claims amounts, also denoted as “true IBNR,” are usually considered together with the changes in the amounts of the already reported claims, denoted as IBNER (incurred but not enough reported). It seems obvious that the amounts of newly reported claims depend on the volume of the business, e.g., the premium, rather than on the previously reported claims. The volume of the considered business is usually rather stable, at least compared with the reported claims. Assuming this volume to be constant, the observed positive intercept may well be considered an estimate of the average amount of newly reported claims, because within the database, i.e., the loss triangles of incurred claims, the newly reported claims amounts are, as already mentioned, commonly summed up with the newly estimated amounts of the previously reported claims. Therefore, an affine model, i.e., a model with an additive as well as a multiplicative part for the age-to-age development, corresponds better to the real circumstances of this development than, for instance, the purely multiplicative chain-ladder model.
In general, one distinguishes between additive or multiplicative reserving models: additive models refer the estimated claims development to the volume of the considered business, which in practice mostly is considered to be proportional to the premium volume, whereas multiplicative methods refer to the reported claims amounts.
Multiplicative methods are usually much more unstable, in particular if claims are reported rather late, such that for some smaller portfolios there may in fact be no reported claims at all for the latest accident years, as is the case in the example of Brosius (1993). In this case, the multiplicative chain-ladder model yields, for these accident years, an estimate of zero for all development years. A positive reserve estimate requires some reported claims; therefore, the estimated reserve may change dramatically as soon as some claims are reported. In the Brosius case, the prediction error for the chain-ladder model even becomes infinitely large: indeed, the multiplicative model cannot cope with the age-to-age development of zero claims amounts into any positive claims amounts, and the usual formulae for the prediction error are not defined in this case. However, for a series of preceding-year claims amounts converging to zero with a given positive next-year claims amount, the corresponding prediction errors will be arbitrarily large. This can be interpreted as an infinite prediction error for the chain-ladder model.
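The zero-claims problem described above can be illustrated with a minimal sketch. The numbers are made up for illustration and are not the Brosius data; the helper name `chain_ladder_factor` is likewise only an assumption of this sketch.

```python
def chain_ladder_factor(prev, nxt):
    """Volume-weighted chain-ladder factor: sum of next-year claims
    divided by the sum of previous-year claims."""
    return sum(nxt) / sum(prev)

# Hypothetical incurred claims of three older accident years at ages 1 and 2.
age1 = [100.0, 120.0, 80.0]
age2 = [150.0, 170.0, 130.0]

f1 = chain_ladder_factor(age1, age2)   # 450 / 300

# Latest accident year with no reported claims so far.
latest_reported = 0.0
projected = f1 * latest_reported       # stays zero whatever the factor is
```

Whatever development factors are estimated from the older accident years, a purely multiplicative projection of a zero amount remains zero.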
Compared to multiplicative methods, additive methods are commonly much more stable and may also be applied in the case of triangles with a rather unsmooth, inhomogeneous pattern such as the Brosius example. As mentioned above, this is essentially due to the fact that the premium volume for the different accident years is much more stable than the reported claims amounts. With the proposed affine method, which considers a multiplicative as well as an additive relation, one gains the advantages of both model types: affine models are more stable than the chain-ladder models and may therefore also be applied to more fragmented triangle patterns. Moreover, some highly developed theoretical tools such as the famous Mack formulae for the prediction error of chain-ladder reserve estimates may be generalized to the proposed affine model. And, above all, the affine age-to-age development corresponds rather well with the reality of the data collected. Finally, the additive part of the affine models enables us to explain the aforementioned upward bias of the chain-ladder projection. By ignoring this bias and by using purely multiplicative methods, one overestimates the leverage of the reported claims on the estimated claims reserve instead of considering a supplementary additive component depending on a constant, the development of premiums, or any other appropriate notion of exposure.
In this paper, we propose various Gauss-Markov predictors for the reserves, including the chain-ladder model. Our models may be understood as generalizations of the chain-ladder model, rendering reserve estimation more stable in practice, especially for recent accident years with little or no experience. We develop these predictors as well as their standard error within one single framework known from multivariate statistics and obtain the prediction error derived by Mack (1993) for the chain-ladder model. Although, as in the well-known purely multiplicative chain-ladder method, no a priori assumptions such as a presumed claim ratio are required, these affine methods are considerably more stable with regard to fluctuations in recent accident years.
Venter and Zehnwirth (1998) and Barnett and Zehnwirth (2000) proposed similar affine methods to the ones discussed in this paper. They realized the importance of the additive component in comparison with pure chain-ladder projections by analyzing real incurred data taken from several business sectors. Ludwig and Schmidt (2010) suggested several Gauss-Markov predictors, with their so-called “combined model” integrating additive and multiplicative components as we do here with our affine model. For this reason some technical aspects may seem similar to Ludwig and Schmidt (2010). However, there is a fundamental difference between these approaches. The combined models described in Ludwig and Schmidt (2010) do not specifically include the chain-ladder method, because their multiplicative age-to-age development always depends on the claims in the first year and not—as in the chain-ladder as well as in our models—on the actual previous year.
2. Model structure
2.1. Age-to-age development
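Let the vectors Xj and X′j+1 denote the incurred claims of development years j and j + 1, respectively, for the accident years 1 to n − j, and let vj be a volume function, e.g., the written premium, in the corresponding accident years 1 to n − j:
X_{j}=\left(\begin{array}{c} X_{1, j} \\ \vdots \\ X_{n-j, j} \end{array}\right), \quad X_{j+1}^{\prime}=\left(\begin{array}{c} X_{1, j+1} \\ \vdots \\ X_{n-j, j+1} \end{array}\right), \quad v_{j}=\left(\begin{array}{c} V_{1} \\ \vdots \\ V_{n-j} \end{array}\right) .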
We assume the volume function vj to be given and we consider two different types of age-to-age developments. In the first case, the development only depends on the incurred claims of the previous year, whereas in the second case it simultaneously depends on both previous-year claims and the volume function. For the description of the latter case, we introduce the matrices
X_{j}^{*}=\left(\begin{array}{cc} V_{1} & X_{1, j} \\ \vdots & \vdots \\ V_{n-j} & X_{n-j, j} \end{array}\right),
composed of the volume function and the previous-year claims.
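Let the age-to-age development be defined by a deterministic part fj* and a stochastic part εj, where
f_{j}^{*}=\left(\begin{array}{c} c_{j} \\ f_{j} \end{array}\right)
is the deterministic part and
\varepsilon_{j}=\left(\begin{array}{c} e_{1, j} \\ \vdots \\ e_{n-j, j} \end{array}\right) \quad \text { with } \quad\left(\begin{array}{c} E\left[e_{1, j}\right] \\ \vdots \\ E\left[e_{n-j, j}\right] \end{array}\right)=\left(\begin{array}{c} 0 \\ \vdots \\ 0 \end{array}\right)
is the random part.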
Depending on the different assumptions for the relevant parameters, we will consider three different types of models for the age-to-age development:
First, we have the affine model
X_{j+1}^{\prime}=X_{j}^{*} \cdot f_{j}^{*}+\varepsilon_{j},
with the two development parameters of fj*, where the additive parameter cj defines a development proportional to the volume of the respective accident year. The parameter fj describes a development proportional to the previous-year claims and is therefore conceived as a multiplicative component of the age-to-age development. Within the theory of general linear models, Xj* is called the design matrix. In the affine model, the design matrix has two columns, corresponding to the additive parameter cj and the multiplicative parameter fj.
In addition, we also consider the cases with only one parameter:
-
the multiplicative model, and
-
the additive model, hereinafter referred to as “incremental loss ratio method.”
Since the expected values of the random part are supposed to be 0, the non-random part defines the age-to-age development of the conditional expected value of claims:
-
affine model
-
multiplicative model
-
incremental loss ratio method
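As a minimal sketch of how the design matrix of the affine model produces the conditional expected values, consider the following NumPy example. All numbers and the parameter values `c_j` and `f_j` are hypothetical, and only the affine form (design matrix times parameter vector) and its purely multiplicative special case are shown.

```python
import numpy as np

# Hypothetical data for n - j = 4 accident years (illustrative numbers only).
V = np.array([1000.0, 1100.0, 1200.0, 1300.0])   # volumes V_i, e.g. premiums
X_j = np.array([400.0, 450.0, 500.0, 520.0])     # incurred claims at age j

c_j, f_j = 0.05, 1.2   # assumed additive and multiplicative parameters

# Affine model: design matrix with columns (V_i, X_{i,j}), parameters (c_j, f_j).
X_star = np.column_stack([V, X_j])
expected_affine = X_star @ np.array([c_j, f_j])

# The two contributions to the affine expectation, shown separately.
additive_part = c_j * V            # proportional to the volume
multiplicative_part = f_j * X_j    # proportional to previous-year claims

# Purely multiplicative (chain-ladder type) expectation: no additive part.
expected_multiplicative = f_j * X_j
```

The difference between `expected_affine` and `expected_multiplicative` is exactly the volume-driven part `c_j * V`, which is the component the text attributes to newly reported claims.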
2.2. The random part of the age-to-age development
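The random part εj defines the covariance matrix, which is composed of a scalar component σj² and a matrix W specifying the structure,
\operatorname{cov}\left(\varepsilon_{j} \cdot \varepsilon_{j}^{t}\right)=\sigma_{j}^{2} \cdot W=\sigma_{j}^{2} \cdot\left(\begin{array}{ccc} W_{11} & \ldots & W_{1, n-j} \\ \vdots & & \vdots \\ W_{n-j, 1} & \ldots & W_{n-j, n-j} \end{array}\right) .
Here, we only consider diagonal matrices W, i.e.,
W=\left(\begin{array}{cccc} W_{11} & 0 & \cdots & 0 \\ 0 & W_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W_{n-j, n-j} \end{array}\right) .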
In addition, we assume two different types of diagonals:
- the constant diagonal with Wii = 1, meaning that W is the unit matrix of size n − j, and
- the diagonal proportional to the previous-year claims, Wii = Xi,j, i.e., the claims of development year j, for accident years 1 ≤ i ≤ n − j.
The diagonality of the matrix W means that the claims developments of different accident years are uncorrelated.
The variance of the conditional random variable of the claims in development year j + 1, given the amount of development year j, is then
- Var(Xi,j+1 | Xi,j) = σj² if W has a constant diagonal and
- Var(Xi,j+1 | Xi,j) = σj² · Xi,j if W has a diagonal proportional to the previous-year claims.
The different assumptions about the covariance matrix correspond to different risk-measure models and result in different estimations even beyond the non-random age-to-age development parameters. With the diagonal of W being constant, all variations of the subsequent year will be valued equally. Alternatively, if the diagonal is proportional to previous-year claims amounts, variations in the subsequent year are more probable for higher previous-year claims, and these data have therefore less weight than the data for smaller claims amounts. This is justified by risk-theoretical reflections. Higher previous-year claims are expected to vary more in their subsequent year development than smaller ones. By assuming that the variance is proportional to the previous-year amount itself and not to the square of it, one even takes a diversification effect into account. This diversification is based on the assumption that higher claims amounts for a given accident year are expected to be composed of a higher number of single claims. Assuming the single claims composing the claims amounts to be equally distributed for all accident years, the number of single claims is expected to be proportional to the claims amounts. In most common models for the distribution of the number of claims, such as in the Poisson distribution, the variance is proportional to the expected value itself. This justifies the assumption that the variance in the age-to-age development is proportional to the previous-year claims.
2.3. The “memorylessness” assumption for the age-to-age development process
We assume that the age-to-age development process is “memoryless” in the following sense: the probability distribution of the claims development at age j + 1, Xk,j+1, only depends on the claims amounts at the preceding age j, Xk,j, and does not depend on former claims developments Xk,i at age i, i < j. Within the additive and affine models, Xk,j+1 may of course also depend on the volume Vk. Typical examples of such “memorylessness” are Markov processes, which for instance may be applied to construct life contingencies. We need this assumption in order to be able to link in a feasible way the insights into the single-year developments with the required multi-year development and thus to complete the triangle for the entire period. The memorylessness assumption ensures that the expected value and the prediction error of the ultimate claims development Xk,n behave as we would expect them to do:
- The expected value of Xk,n corresponds to the iterated application of the estimated affine age-to-age development, starting from Xk,n−k+1, the latest of the known claims amounts Xk,j, j ≤ n − k + 1.
- The prediction error M̂SEP of the estimated reserve corresponds to the sum of the prediction errors M̂SEPj of the estimated claims amounts of the development year j + 1, given the claims amounts at development year j, multiplied by f̂j:n², where f̂j:n denotes the product of the remaining estimated multiplicative projection factors from year j + 1 up to year n. Thus the prediction error for the entire development corresponds to the sum of the prediction errors of the age-to-age developments, scaled up to the level of the claims amounts of the final development year.
Thus, this paper discusses two rather different topics:
-
First, the age-to-age development based on the widely used techniques of multivariate statistics. These methods are very useful to gain all kinds of insights into some given data; they rely on a powerful mathematical theory and may be expressed in a concise way using matrix calculus. Up to and including Section 6, only this one-year case will be considered.
-
Second, we need to link these age-to-age developments to the required multi-year development. The kind of reflection used here is much less common than the multivariate statistics discussed under the first topic. As mentioned before, these considerations essentially only prove that this linking may be done in a rather obvious way. Nevertheless, some subtle reflections based among others on the law of total expectations as introduced by Mack (1993) are needed. Sections 7 and 8 study this conclusion from the one-year to the multi-year case.
3. Consideration of the different models
We will not treat all six possible models in detail in this paper. As an additive development we will only consider the case where the constant variance assumption holds, i.e., Wii = 1, and as a multiplicative development the chain-ladder case with the proportional assumption, i.e., Wii = Xi,j. Instead we will look more closely at the affine models. In doing so, we will also consider affine models with constant volume, e.g., V ≡ 1 for all accident years. This model is particularly interesting if no volume function is available within the data. In such a case, we strongly recommend comparing the results of the affine models under a constant volume assumption with those of a purely multiplicative method such as chain ladder. If the results of these two models differ significantly, the multiplicative method, that is, the chain-ladder method, might not be appropriate for the given triangles. Table 1 shows the main assumptions in the different models.
4. The weighted least squares estimators of the parameters
Usually, one distinguishes between two cases. In either case, W is a diagonal matrix, i.e., all off-diagonal entries of W are 0. In the special case corresponding to our constant-risk-measure assumption, all diagonal elements are equal to 1, whereas in the general case, there is no such restriction.
First of all, the special case is treated within regression analysis and the parameters are estimated by the least squares estimator. Then the theorem of Gauss-Markov shows that these estimators are the best linear unbiased estimators (BLUE). The special case presuming the diagonal entries of W to be constant is called the homoscedastic model, as opposed to the more general heteroscedastic model with different diagonal entries. The diagonal entries of the heteroscedastic model can be understood as different weights. Therefore, one takes into account the weights 1/Wii = 1/Xi,j within the least squares estimators and evaluates a weighted least squares estimator. Hence the general heteroscedastic case may be reduced to the special case, meaning that in the general case the theorem of Gauss-Markov still holds. This is why the weighted least squares estimator is also the best linear unbiased estimator.
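Let us introduce the notation
\hat{X}_{j+1}^{\prime}=X_{j}^{*} \cdot \hat{f}_{j}^{*}
for the estimated claims amounts of the subsequent year. We seek to minimize the weighted squared differences WSQ between the estimated and the observed claims development. WSQ may be expressed by matrix calculation, taking into account the weights by the inverse matrix of W. Thus the components with high diagonal entries Wii = Xi,j will have less impact on parameter estimation than those with low diagonal entries:
\begin{aligned} \operatorname{WSQ} & =\left(\hat{X}_{j+1}^{\prime}-X_{j+1}^{\prime}\right)^{t} W^{-1}\left(\hat{X}_{j+1}^{\prime}-X_{j+1}^{\prime}\right) \\ & =\left(X_{j}^{*} \cdot \hat{f}_{j}^{*}-X_{j+1}^{\prime}\right)^{t} W^{-1}\left(X_{j}^{*} \cdot \hat{f}_{j}^{*}-X_{j+1}^{\prime}\right) . \end{aligned}
We now determine f̂j* such that WSQ becomes minimal. Hence the two partial derivatives of WSQ with respect to the two parameters must be set equal to zero. Since WSQ is a quadratic expression, the derivatives have a linear and a constant term, which, in matrix terms, leads to the following equation:
\left(X_{j}^{*}\right)^{t} \cdot W^{-1} \cdot X_{j}^{*} \cdot \hat{f}_{j}^{*}=\left(X_{j}^{*}\right)^{t} \cdot W^{-1} \cdot X_{j+1}^{\prime} .
This results in the weighted least squares estimators
\hat{f}_{j}^{*}=\left(\left(X_{j}^{*}\right)^{t} \cdot W^{-1} \cdot X_{j}^{*}\right)^{-1} \cdot\left(X_{j}^{*}\right)^{t} \cdot W^{-1} \cdot X_{j+1}^{\prime},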
where the matrices in question are presumed to be invertible, as is assumed for the remainder of this paper, even if not specifically mentioned. The relations used in this article are mostly well known in multivariate statistics and have been described in textbooks (Fahrmeir, Hammerle, and Tutz 1996, for instance, or Halliwell 2007).
A table with the specific formulae for the weighted least squares estimators for the parameters in the different models, as well as the derivation of these formulae, can be found in Appendix A.
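The weighted least squares estimator of this section can be sketched in a few lines of NumPy. The function name `wls` and all data are assumptions made for illustration; the sketch solves the normal equations for a diagonal W and shows that the one-parameter case with Wii = Xi,j reduces to the familiar chain-ladder factor (sum of next-year claims over sum of previous-year claims).

```python
import numpy as np

def wls(design, y, w_diag):
    """Solve the normal equations (D^t W^-1 D) f = D^t W^-1 y for diagonal W."""
    design = np.asarray(design, dtype=float)
    if design.ndim == 1:              # one-parameter model: make it a column
        design = design[:, None]
    w_inv = 1.0 / np.asarray(w_diag, dtype=float)
    lhs = design.T @ (w_inv[:, None] * design)
    rhs = design.T @ (w_inv * np.asarray(y, dtype=float))
    return np.linalg.solve(lhs, rhs)

# Hypothetical data for four accident years (illustrative numbers only).
V = np.array([1000.0, 1100.0, 1200.0, 1300.0])      # volumes
X_j = np.array([400.0, 450.0, 500.0, 520.0])        # claims at age j
X_next = np.array([545.0, 601.0, 673.0, 700.0])     # claims at age j + 1

# Affine model with proportional weights W_ii = X_{i,j}.
c_hat, f_hat = wls(np.column_stack([V, X_j]), X_next, X_j)

# Multiplicative model with W_ii = X_{i,j}: the chain-ladder factor.
f_cl = wls(X_j, X_next, X_j)[0]
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is a numerical design choice of this sketch rather than part of the method itself.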
5. The error of the weighted least squares estimator
In Sections 5 and 6, as in the two previous sections, we only consider one single development year, i.e., the development from year j to year j + 1. We suppose the claims up to the development year j to be known and therefore consider the conditional probabilities given the history
H_{j}=\left\{X_{i, l}, \; i+l \leq n+1, \; l \leq j\right\} .
The volumes Vj are not relevant for the history considered, since they are not regarded as stochastic variables in this model. Since in these sections we always study conditional probabilities given the history Hj, this may be omitted to improve readability, especially for the matrix formulae.
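To simplify the notation even further, we set A = ((Xj*)t · W−1 · Xj*)−1. Since f̂j* = fj* + A · (Xj*)t · W−1 · εj, we then obtain
\begin{aligned} \operatorname{cov}\left(\hat{f}_{j}^{*}, \hat{f}_{j}^{*}\right) & =\operatorname{cov}\left(A\left(X_{j}^{*}\right)^{t} W^{-1} \varepsilon_{j}, A\left(X_{j}^{*}\right)^{t} W^{-1} \varepsilon_{j}\right) \\ & =\operatorname{cov}\left(A\left(X_{j}^{*}\right)^{t} W^{-1} \varepsilon_{j}\left(A\left(X_{j}^{*}\right)^{t} W^{-1} \varepsilon_{j}\right)^{t}\right) \\ & =\operatorname{cov}\left(A \cdot\left(X_{j}^{*}\right)^{t} W^{-1} \varepsilon_{j} \cdot \varepsilon_{j}^{t}\left(W^{-1}\right)^{t} \cdot X_{j}^{*} A^{t}\right) \\ & =\sigma_{j}^{2} \cdot A \cdot\left(X_{j}^{*}\right)^{t}\left(W^{-1}\right)^{t} \cdot X_{j}^{*} \cdot A^{t} \\ & =\sigma_{j}^{2} \cdot A \cdot\left(A^{-1}\right)^{t} \cdot A^{t}=\sigma_{j}^{2} \cdot A \\ & =\sigma_{j}^{2} \cdot\left(\left(X_{j}^{*}\right)^{t} \cdot W^{-1} \cdot X_{j}^{*}\right)^{-1} . \end{aligned}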
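The collapse of the covariance expression to σj² · A can be checked numerically. The following sketch uses hypothetical data and an assumed value for σj²; it compares the full "sandwich" product with the short form.

```python
import numpy as np

# Hypothetical data (illustrative numbers only).
V = np.array([1000.0, 1100.0, 1200.0, 1300.0])
X_j = np.array([400.0, 450.0, 500.0, 520.0])
X_star = np.column_stack([V, X_j])

W = np.diag(X_j)                  # proportional risk measure, W_ii = X_{i,j}
W_inv = np.linalg.inv(W)
sigma2 = 2.5                      # assumed value of sigma_j^2

A = np.linalg.inv(X_star.T @ W_inv @ X_star)
B = A @ X_star.T @ W_inv          # estimation error equals B @ eps

# cov(B eps) = B (sigma^2 W) B^t, which should reduce to sigma^2 A.
cov_sandwich = B @ (sigma2 * W) @ B.T
cov_short = sigma2 * A
```

Both matrices agree up to rounding, which mirrors the last steps of the derivation above.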
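We are now interested in the impact of the parameter estimation error on the estimation of the age-to-age claims development. This development starts with the sum
\sum_{k=n-j+1}^{n} \hat{X}_{k, j}
of the estimated previous-year claims for all accident years which have not yet been developed and therefore have to be estimated. This sum corresponds to the sum of the rows beyond the diagonal and includes the diagonal itself, i.e., X̂n−j+1,j = Xn−j+1,j, because these claims are already included in the data. We set
Z_{j}=j \cdot\left(\begin{array}{c} \overline{\hat{V}}_{j} \\ \overline{\hat{X}}_{j} \end{array}\right)=\left(\begin{array}{c} \sum_{k=n-j+1}^{n} V_{k} \\ X_{n-j+1, j}+\sum_{k=n-j+2}^{n} \hat{X}_{k, j} \end{array}\right)=\left(\begin{array}{c} \sum_{k=n-j+1}^{n} V_{k} \\ \sum_{k=n-j+1}^{n} \hat{X}_{k, j} \end{array}\right),
thus defining the averages
\overline{\hat{V}}_{j}=\frac{1}{j} \sum_{k=n-j+1}^{n} V_{k} \quad \text { and } \quad \overline{\hat{X}}_{j}=\frac{1}{j} \sum_{k=n-j+1}^{n} \hat{X}_{k, j} .
The notation with the hat may be misinterpreted: we do not estimate new volumes here; this notation merely suggests that we take the average over the same remaining accident years as for the claims.
The so-called parameter error
\operatorname{Err}_{j}^{\text {Param }}=E\left[\left(\sum_{k=n-j+1}^{n} \hat{X}_{k, j+1}-E\left[\sum_{k=n-j+1}^{n} X_{k, j+1}\right]\right)^{2}\right]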
may be estimated using the theory of generalized linear models. This estimator is calculable as a product of the matrices in question, that is,
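\hat{\operatorname{Err}}_{j}^{\text {Param }}=\operatorname{cov}\left(Z_{j}^{t} \cdot \hat{f}_{j}^{*}, Z_{j}^{t} \cdot \hat{f}_{j}^{*}\right)=Z_{j}^{t} \cdot \operatorname{cov}\left(\hat{f}_{j}^{*}, \hat{f}_{j}^{*}\right) \cdot Z_{j}=\hat{\sigma}_{j}^{2} \cdot Z_{j}^{t} \cdot\left(\left(X_{j}^{*}\right)^{t} \cdot W^{-1} \cdot X_{j}^{*}\right)^{-1} \cdot Z_{j} .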
The process variance describes the randomness of the process itself, and here we look directly at the effect of this process randomness on the development of the sum of claims of the accident years not yet given in the claims triangle. Therefore, we capture the effect of the process randomness of the development from year j to j + 1 on the reserve
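\operatorname{Var}_{j}^{\text {Process }}=E\left[\left(\sum_{k=n-j+1}^{n} X_{k, j+1}-E\left[\sum_{k=n-j+1}^{n} X_{k, j+1}\right]\right)^{2}\right]
with the estimation
\hat{\operatorname{Var}}_{j}^{\text {Process }}=\hat{\sigma}_{j}^{2} \cdot\left\{\begin{array}{ll} j & \text { for constant risk measure } \\ j \cdot \overline{\hat{X}}_{j} & \text { for proportional risk measure, } \end{array}\right.
since the above sum is composed of the claims of j accident years which are supposed to be independent as the covariance matrix is diagonal. The history Hj up to development year j defines the estimated sum
j \cdot \overline{\hat{X}}_{j}=\sum_{k=n-j+1}^{n} \hat{X}_{k, j} .
Putting both together, we obtain the mean squared error of the predicted sum of claims for the development from year j to year j + 1, given the history Hj, which again defines the required estimated sum:
\operatorname{MSEP}_{j}=E\left[\left(\sum_{k=n-j+1}^{n} \hat{X}_{k, j+1}-\sum_{k=n-j+1}^{n} X_{k, j+1}\right)^{2}\right]=\operatorname{Var}_{j}^{\text {Process }}+\operatorname{Err}_{j}^{\text {Param }}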
with estimator
\hat{M} S E P_{j}=\hat{\operatorname{Var}}_{j}^{\text {Process }}+\hat{\operatorname{Err}}_{j}^{\text {Param }}=\hat{\tau}_{j} \cdot \hat{\sigma}_{j}^{2},
where
\begin{aligned} \hat{\tau}_{j}= & j \cdot\left\{\begin{array}{ll} 1 & \text { for constant risk measure } \\ \overline{\hat{X}}_{j} & \text { for proportional risk measure } \end{array}\right\} \\ & +j^{2} \cdot\left(\begin{array}{c} \overline{\hat{V}}_{j} \\ \overline{\hat{X}}_{j} \end{array}\right)^{t} \cdot\left(\left(X_{j}^{*}\right)^{t} \cdot W^{-1} \cdot X_{j}^{*}\right)^{-1} \cdot\left(\begin{array}{c} \overline{\hat{V}}_{j} \\ \overline{\hat{X}}_{j} \end{array}\right) . \end{aligned}
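The following sketch evaluates the factor τ̂j for hypothetical data. The function name `tau_hat`, its arguments, and the value of `sigma2_hat` are assumptions of this example; the estimator of σj² itself is not reproduced here and is simply taken as given.

```python
import numpy as np

def tau_hat(design, w_diag, z_bar, j, process_scale):
    """Return tau_j = j * process_scale + j^2 * z_bar^t (D^t W^-1 D)^-1 z_bar.

    design        : observed design matrix (rows = developed accident years)
    w_diag        : diagonal of W (ones, or previous-year claims)
    z_bar         : averages over the j accident years still to be developed
    process_scale : 1 for the constant, X_bar for the proportional risk measure
    """
    w_inv = 1.0 / np.asarray(w_diag, dtype=float)
    A = np.linalg.inv(design.T @ (w_inv[:, None] * design))
    z_bar = np.asarray(z_bar, dtype=float)
    return j * process_scale + j**2 * float(z_bar @ A @ z_bar)

# Hypothetical data: four developed accident years, j = 2 years still open.
V = np.array([1000.0, 1100.0, 1200.0, 1300.0])
X_j = np.array([400.0, 450.0, 500.0, 520.0])
V_open = np.array([1400.0, 1500.0])
X_open = np.array([560.0, 600.0])   # diagonal value and one estimated value
j = len(X_open)
V_bar, X_bar = V_open.mean(), X_open.mean()
sigma2_hat = 2.5                    # assumed estimate of sigma_j^2

# Affine model with proportional risk measure (W_ii = X_{i,j}).
tau_gcl = tau_hat(np.column_stack([V, X_j]), X_j, [V_bar, X_bar], j, X_bar)
# Pure chain ladder: one-column design, same weights.
tau_cl = tau_hat(X_j[:, None], X_j, [X_bar], j, X_bar)

msep_gcl = sigma2_hat * tau_gcl
msep_cl = sigma2_hat * tau_cl
```

In the one-parameter chain-ladder case the matrix expression reduces to a scalar, so `tau_cl` equals j · X̄ plus j² · X̄² divided by the sum of the observed previous-year claims, with X̄ the average of the open years' claims.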
Note that the last factor
is generally not defined in the models with two parameters. In the numerical examples below, we therefore set
A table with the specific formulae for the prediction error within the different models, as well as the derivation of these formulae, can be found in Appendix B. The well-known calculation method for the estimator σ̂j² of the process randomness is described in Appendix D.
6. A simple calculation method provided by standard spreadsheet applications
Estimates based on linear regression are much more common and accessible in computational tools such as spreadsheets. Hence practical calculations may be facilitated if it is possible to link the computations in the chain-ladder models to those in the linear regression models. In fact, this can be achieved by a simple transformation of the data, which we here regard as coordinates. Thus the previous-year claims amounts and the volume function represent the independent variables and are interpreted as x-coordinates, whereas the next-year claims amounts represent the dependent variables viewed as y-coordinates. More specifically, the data in the generalized chain-ladder model,
X_{j}^{*}=\left(\begin{array}{cc} V_{1} & X_{1, j} \\ \vdots & \vdots \\ V_{n-j} & X_{n-j, j} \end{array}\right) \text { and } X_{j+1}^{\prime}=\left(\begin{array}{c} X_{1, j+1} \\ \vdots \\ X_{n-j, j+1} \end{array}\right) \text {, }
are transformed, by dividing each row i by the square root of Xi,j, to the independent variables
\tilde{X}_{j}^{*}=\left(\begin{array}{cc} \tilde{V}_{1} & \tilde{X}_{1, j} \\ \vdots & \vdots \\ \tilde{V}_{n-j} & \tilde{X}_{n-j, j} \end{array}\right)=\left(\begin{array}{cc} V_{1} / X_{1, j}^{1 / 2} & X_{1, j}^{1 / 2} \\ \vdots & \vdots \\ V_{n-j} / X_{n-j, j}^{1 / 2} & X_{n-j, j}^{1 / 2} \end{array}\right),
considered as x-coordinates in the linear regression model. By analogy, in the traditional chain-ladder model without an additive component, we get
\tilde{X}_{j}=\left(\begin{array}{c} \tilde{X}_{1, j} \\ \vdots \\ \tilde{X}_{n-j, j} \end{array}\right)=\left(\begin{array}{c} X_{1, j}^{1 / 2} \\ \vdots \\ X_{n-j, j}^{1 / 2} \end{array}\right) .
For both the generalized and the traditional chain-ladder models, the dependent variables then are
\tilde{X}_{j+1}^{\prime}=\left(\begin{array}{c} X_{1, j}^{1 / 2} \cdot X_{1, j+1} / X_{1, j} \\ \vdots \\ X_{n-j, j}^{1 / 2} \cdot X_{n-j, j+1} / X_{n-j, j} \end{array}\right)=\left(\begin{array}{c} X_{1, j+1} / X_{1, j}^{1 / 2} \\ \vdots \\ X_{n-j, j+1} / X_{n-j, j}^{1 / 2} \end{array}\right),
which are now considered as y-coordinates in the linear regression model.
The weighted squared differences WSQ in the generalized chain-ladder (CL) model equal the WSQ in the linear regression (LR) model based on the transformed coordinates. Therefore, the estimation problems in the chain-ladder models may be reduced to a linear regression problem and solved accordingly by the multiple tools available:
\begin{array}{l} \operatorname{WSQ}\left(X_{j}^{*}, X_{j+1}^{\prime}, W(C L)\right) \\ =\left(X_{j}^{*} \cdot \hat{f}_{j}^{*}-X_{j+1}^{\prime}\right)^{t} W(C L)^{-1}\left(X_{j}^{*} \cdot \hat{f}_{j}^{*}-X_{j+1}^{\prime}\right) \end{array}
\begin{array}{c} =\left(X_{j}^{*} \cdot \hat{f}_{j}^{*}-X_{j+1}^{\prime}\right)^{t}\left(\begin{array}{ccc} X_{1, j}^{-1} & \cdots & 0 \\ 0 & \ddots & 0 \\ 0 & \cdots & X_{n-j, j}^{-1} \end{array}\right) \\ \left(X_{j}^{*} \cdot \hat{f}_{j}^{*}-X_{j+1}^{\prime}\right) \end{array}
\begin{array}{c} =\left(\tilde{X}_{j}^{*} \cdot \hat{f}_{j}^{*}-\tilde{X}_{j+1}^{\prime}\right)^{t}\left(\begin{array}{ccc} 1 & \cdots & 0 \\ 0 & \ddots & 0 \\ 0 & \cdots & 1 \end{array}\right)\left(\tilde{X}_{j}^{*} \cdot \hat{f}_{j}^{*}-\tilde{X}_{j+1}^{\prime}\right) \\ =\operatorname{WSQ}\left(\tilde{X}_{j}^{*}, \tilde{X}_{j+1}^{\prime}, W(L R)=I_{n-j}\right) . \end{array}
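The equivalence of the two weighted sums of squares can be verified numerically. The sketch below, on hypothetical data, fits the generalized chain-ladder model once by weighted least squares with W = diag(Xi,j) and once by ordinary least squares on the transformed coordinates, using `numpy.linalg.lstsq` as a stand-in for a spreadsheet regression tool.

```python
import numpy as np

# Hypothetical data (illustrative numbers only).
V = np.array([1000.0, 1100.0, 1200.0, 1300.0])
X_j = np.array([400.0, 450.0, 500.0, 520.0])
X_next = np.array([545.0, 601.0, 673.0, 700.0])
X_star = np.column_stack([V, X_j])

# Weighted least squares with W = diag(X_{i,j}).
w_inv = 1.0 / X_j
f_wls = np.linalg.solve(X_star.T @ (w_inv[:, None] * X_star),
                        X_star.T @ (w_inv * X_next))

# Ordinary least squares after dividing each row by X_{i,j}^{1/2}.
root = np.sqrt(X_j)
X_tilde = X_star / root[:, None]
y_tilde = X_next / root
f_ols, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)

# Weighted sum of squares in both coordinate systems.
res = X_star @ f_wls - X_next
wsq_cl = res @ (w_inv * res)
wsq_lr = np.sum((X_tilde @ f_ols - y_tilde) ** 2)
```

Both the parameter estimates and the minimal sums of squares coincide up to rounding, which is what allows standard regression tools to be used on the transformed data.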
Panning (2005) proposed to introduce additional “dummy” variables: first, you augment the dependent variable with a zero such that you have the same number of rows as there are independent variables, including the last row with the x-coordinates you wish to estimate. Thereafter you introduce a “dummy” column with all entries zero except the last one, which will be set to −1. This leads to the following matrices and vectors, respectively, for our main models:
Panning (2005) introduced a “dummy” column for each accident year k = n − j + 1 up to n, thus estimating each accident year separately. Here, we are only interested in the prediction error of the sum of the claims amounts at age j + 1 to catch the entire prediction error M̂SEPj of the development from age j to age j + 1. The “LINEST” function in Microsoft Excel recommended by Panning (2005) calculates the linear regression coefficients as well as their standard error, and therefore this function particularly lends itself to computing the prediction error M̂SEPj. Please note that the “LINEST” function arranges the columns in reverse order (cf. the remarks in Panning (2005) on this subject). Placing the regression coefficients on the first row and their standard error on the second row leads to the matrix
\left(\begin{array}{ccc} \hat{c}_{j} & \hat{f}_{j} & \sum_{k=n-j+1}^{n} \hat{X}_{k, j+1} /\left(j \cdot \overline{\hat{X}}_{j}\right)^{1 / 2} \\ \hat{\operatorname{Err}}^{\operatorname{Par}}\left(\hat{c}_{j}\right)^{1 / 2} & \hat{\operatorname{Err}}^{\operatorname{Par}}\left(\hat{f}_{j}\right)^{1 / 2} & \hat{M} S E P_{j}^{1 / 2} /\left(j \cdot \overline{\hat{X}}_{j}\right)^{1 / 2} \end{array}\right) \tag{GCL}
in the generalized chain-ladder model, and for the chain-ladder models without an additive component we have
\left(\begin{array}{cc} \hat{f}_{j} & \sum_{k=n-j+1}^{n} \hat{X}_{k, j+1} /\left(j \cdot \overline{\hat{X}}_{j}\right)^{1 / 2} \\ \hat{E} r r^{P a r}\left(\hat{f}_{j}\right)^{1 / 2} & \hat{M} S E P_{j}^{1 / 2} /\left(j \cdot \overline{\hat{X}}_{j}\right)^{1 / 2} \end{array}\right) . \tag{CL}
This procedure therefore enables us to compute the prediction error for the chain-ladder models in a particularly simple way.
For the linear regression models, there is only one more small step to do: here the “LINEST” function applied to only one “dummy” variable does not take into account the entire process variance for the j accident years to be predicted. Thus this corresponds to a prediction error based on
\begin{array}{l} \frac{\hat{M} S E P_{j}}{\hat{\sigma}_{j}^{2}}=j(\text { instead of } 1 \text { provided by "LINEST") } \\ +\frac{j^{2}}{n-j} \frac{\left(\overline{\hat{V}_{j}}\right)^{2} \cdot \overline{X_{j}^{2}}-2 \overline{\hat{V}_{j}} \cdot \overline{\hat{X}_{j}} \cdot\left(\overline{V_{j} \cdot X_{j}}\right)+\left(\overline{\hat{X}_{j}}\right)^{2} \cdot \overline{V_{j}^{2}}}{\overline{X_{j}^{2}} \cdot \overline{V_{j}^{2}}-\left(\overline{V_{j} \cdot X_{j}}\right)^{2}} \end{array}
(see Appendix B). As the “LINEST” function in Microsoft Excel also lists the process error in the second column of the third row, M̂SEPj can easily be calculated with the two terms returned there, namely the square root of M̂SEPj − (j − 1) · σ̂j² and σ̂j. These terms are returned by the “LINEST” function, used, as in the chain-ladder cases, as an array function with the following output:
\left(\begin{array}{ccc} \hat{c}_j & \hat{f}_j & \sum_{k=n-j+1}^n \hat{X}_{k, j+1} \\ \hat{\operatorname{Err}}^{\operatorname{Par}}\left(\hat{c}_j\right)^{1 / 2} & \hat{\operatorname{Err}}^{\operatorname{Par}}\left(\hat{f}_j\right)^{1 / 2} & \left(\hat{M} S E P_j-(j-1) \hat{\sigma}_j^2\right)^{1 / 2} \\ \text{*} & \hat{\sigma}_j & \text{*} \end{array}\right) . \tag{GLR}
The reasons that there is no need for the correction term (j − 1) · σ̂j² in the chain-ladder models using the array function returned by the “LINEST” function are explained in Appendix C.
7. The completion of the claims triangle: Expected claims development
The usual task when projecting claims triangles is to complete these triangles to a square. Therefore we predict the development of the latest of the known claims amounts, Xk,n−k+1, for the development from year n − k + 1 to year n by the recursively defined estimators
\hat{X}_{k, j+1}=\hat{f}_{j} \cdot \hat{X}_{k, j}+\hat{c}_{j} \cdot V_{k}
for j = n − k + 1, . . . , n − 1, with the notation X̂k,n−k+1 = Xk,n−k+1 for the initial term. The recursively defined X̂k,j+1 provides an unbiased estimator for the projected claims development for the accident year k, 1 ≤ k ≤ n, based on the data given by the claims triangle D = {Xi,j, i + j ≤ n + 1}. Thus, we have
E\left[\hat{X}_{k, j+1} \mid D\right]=E\left[X_{k, j+1} \mid D\right],
which can be proved, as Mack (1993) did for the chain-ladder model, using the law of total expectations:
\begin{aligned} E\left[\hat{X}_{k, j+1} \mid D\right] & =E\left[\hat{X}_{k, j+1} \mid X_{k, n+1-k}\right] \\ & =E\left[E\left[\hat{X}_{k, j+1} \mid X_{k, j}\right] \mid X_{k, n+1-k}\right] \\ & =E\left[E\left[\hat{f}_{j} X_{k, j}+\hat{c}_{j} V_{k} \mid X_{k, j}\right] \mid X_{k, n+1-k}\right] \\ & =E\left[\hat{f}_{j} \mid X_{k, j}\right] \cdot E\left[X_{k, j} \mid X_{k, n+1-k}\right]+E\left[\hat{c}_{j} \mid X_{k, j}\right] \cdot V_{k} \\ & \stackrel{(*)}{=} f_{j} \cdot E\left[X_{k, j} \mid X_{k, n+1-k}\right]+c_{j} \cdot V_{k} \\ & =f_{j} \cdot f_{j-1} \cdot E\left[X_{k, j-1} \mid X_{k, n+1-k}\right]+\left(f_{j} \cdot c_{j-1}+c_{j}\right) \cdot V_{k} \\ & =\cdots \stackrel{(* *)}{=} f_{n-k: j+1} X_{k, n+1-k}+V_{k} \sum_{s=1}^{j+k-n} f_{n-k+s: j+1} \cdot c_{n-k+s} \\ & =E\left[X_{k, j+1} \mid D\right], \end{aligned}
where
f_{a: b}=\prod_{l=a+1}^{b-1} f_{l},
or the corresponding estimators f̂a:b, respectively, denote the projection factor from development year a + 1 to development year b. Equation (*) is valid by the theorem of Gauss-Markov, stating that the weighted least squares estimators f̂j and ĉj are unbiased estimators, i.e., E[f̂j | Xk,j] = fj and E[ĉj | Xk,j] = cj. Setting j + 1 = n in equation (**), we obtain an estimate for the ultimate claims development X̂k,n from the latest available claims data for accident year k, i.e., the diagonal element Xk,n+1−k:
\hat{X}_{k, n}=\hat{f}_{n-k: n} X_{k, n+1-k}+V_{k} \sum_{s=1}^{k-1} \hat{f}_{n-k+s: n} \cdot \hat{c}_{n-k+s} .
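A small sketch can confirm that iterating the affine age-to-age development reproduces the closed-form expression for the ultimate. The parameter values, the triangle size, and the function names are hypothetical; the projection factor is taken as the product of the factors for development years a + 1 to b − 1, with the empty product equal to 1.

```python
import math

n = 4  # number of accident and development years in this toy triangle

# Assumed parameter estimates for development years j = 1, ..., n - 1
# (hypothetical values; index 0 is a placeholder so that f_hat[j] matches j).
f_hat = [None, 1.5, 1.2, 1.05]
c_hat = [None, 0.10, 0.04, 0.01]

def proj(a, b):
    """f_{a:b}: product of f_hat[l] for l = a + 1, ..., b - 1 (1 if empty)."""
    return math.prod(f_hat[l] for l in range(a + 1, b))

def ultimate_recursive(k, x_diag, v_k):
    """Apply X_{k,j+1} = f_j * X_{k,j} + c_j * V_k from the diagonal up to age n."""
    x = x_diag
    for j in range(n - k + 1, n):
        x = f_hat[j] * x + c_hat[j] * v_k
    return x

def ultimate_closed_form(k, x_diag, v_k):
    """f_{n-k:n} * X_{k,n+1-k} + V_k * sum over s of f_{n-k+s:n} * c_{n-k+s}."""
    additive = sum(proj(n - k + s, n) * c_hat[n - k + s] for s in range(1, k))
    return proj(n - k, n) * x_diag + v_k * additive

# Latest accident year (k = n) with no reported claims and volume 1000.
ult_zero_start = ultimate_recursive(n, 0.0, 1000.0)
```

With these assumed parameters, an accident year without any reported claims still receives a positive ultimate estimate, because the additive terms enter at each remaining development step.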
This formula shows the importance of the additive part ĉj for the affine models. These additive terms, in particular those for the more recent accident years, i.e., those for higher values of k, may have an important impact on the estimate of the ultimate claims development. The reason for this is that the additive parts have to be considered for each of the k − 1 remaining development years, since the formula for X̂k,n involves the sum of k − 1 additive terms in the estimate of the ultimate claims amount. Our numerical examples at the end of this paper will underline the potential importance of these additive parts for claims triangles based on real data.
8. The completion of the claims triangle: The prediction error of the reserves
In this section we combine the prediction errors for the age-to-age development in order to obtain the prediction error for the entire projection. The reserve itself depends on all these subsequent estimations, whereas the development from year j to year j + 1 only depends on the sum of the estimated previous-year claims
j \cdot \overline{\hat{X}}_{j}=\sum_{k=n-j+1}^{n} \hat{X}_{k, j},
as defined in Section 5.
We have already computed the mean squared error of prediction MSEPj for the development from year j to year j + 1 of the sum of the claims given this estimation, i.e., the sum of the predicted previous-year claims. This amounts to
M S E P_{j}=\operatorname{Var}_{j}^{\text {Process }}+\operatorname{Err}_{j}^{\text {Param }}=\sigma_{j}^{2} \cdot \tau_{j},
with estimator
\hat{M} S E P_{j}=\hat{\sigma}_{j}^{2} \cdot \hat{\tau}_{j},
where τ̂j and σ̂j depend on the chosen model.
To obtain the error of the predicted reserve from the age-to-age development errors, we have to take into account the multiplicative development from year j + 1 to year n, because the estimation of the reserve is based on the estimated projection up to the ultimate development year n. Therefore the prediction error for the development year j has to be multiplied by the square of the previously defined projection factor
Due to our memorylessness assumption, neither the parameter error nor the variances of the process are correlated for the different development years. Therefore, the age-to-age development errors M̂SEPj, multiplied by the square of the estimated projection factor to scale them up to the level of the final development year, may simply be summed up to get the prediction error of the reserves:
\begin{aligned} M S E P & =E\left[\left(\sum_{k=2}^{n} \hat{X}_{k, n}-\sum_{k=2}^{n} X_{k, n}\right)^{2}\right] \\ & =\sum_{j=1}^{n-1} M S E P_{j} \cdot f_{j: n}^{2} . \end{aligned}
The derivation of MSEP as the sum of the scaled-up age-to-age prediction errors can be found in Appendix E.
Thus, we get a similarly constructed estimator,
\hat{M} S E P=\sum_{j=1}^{n-1} \hat{M} S E P_{j} \cdot \hat{f}_{j: n}^{2}=\sum_{j=1}^{n-1}\left(\hat{\sigma}_{j} \cdot \hat{f}_{j: n}\right)^{2} \cdot \hat{\tau}_{j} .
In case of the chain-ladder model, this results in
\hat{M} S E P(C L)=\sum_{j=1}^{n-1}\left(\overline{\hat{X}_{j}} \cdot \hat{\sigma}_{j} \cdot \hat{f}_{j: n}\right)^{2}\left(\frac{j}{\overline{\hat{X}_{j}}}+\frac{j^{2}}{(n-j) \cdot \overline{X_{j}}}\right)
and for the generalized chain-ladder model (GCL) with an additive component depending on a volume, we have
\begin{aligned} \hat{M} S E P(G C L)= & \sum_{j=1}^{n-1}\left(\hat{\sigma}_{j} \cdot \hat{f}_{j: n}\right)^{2} \\ & \cdot\left(j \cdot \overline{\hat{X}_{j}}+j^{2} \cdot \frac{\left(\overline{\hat{V}_{j}}\right)^{2} \cdot \overline{X_{j}}-2 \overline{\hat{V}_{j}} \cdot \overline{V_{j}} \cdot \overline{\hat{X}_{j}}+\left(\overline{\hat{X}_{j}}\right)^{2} \cdot \overline{V_{j}^{2} / X_{j}}}{(n-j) \cdot\left(\overline{V_{j}^{2} / X_{j}} \cdot \overline{X_{j}}-\left(\overline{V_{j}}\right)^{2}\right)}\right) . \end{aligned}
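The two formulae above can be evaluated mechanically once the averages are known. The following Python lines are a minimal sketch under stated assumptions, not the implementation used for the tables of this paper: the function names and the argument `v2x_bar` (standing for the average of V²/X over the observed accident years) are hypothetical, and all input numbers are invented for illustration. The sketch also checks numerically that the bracket term of the GCL formula does not change when the volume is rescaled by a constant, since numerator and denominator of the fraction are both quadratic in the volume.

```python
def tau_cl(j, n, xhat_bar, x_bar):
    """Chain-ladder bracket term, already multiplied by xhat_bar**2."""
    return xhat_bar ** 2 * (j / xhat_bar + j ** 2 / ((n - j) * x_bar))


def tau_gcl(j, n, xhat_bar, x_bar, vhat_bar, v_bar, v2x_bar):
    """Bracket term of the generalized chain-ladder formula.

    v2x_bar is a hypothetical name for the average of V^2 / X.
    """
    num = (vhat_bar ** 2 * x_bar
           - 2 * vhat_bar * v_bar * xhat_bar
           + xhat_bar ** 2 * v2x_bar)
    den = (n - j) * (v2x_bar * x_bar - v_bar ** 2)
    return j * xhat_bar + j ** 2 * num / den


def msep(sigma, f_proj, tau):
    """Sum of the age-to-age errors, each scaled up by its projection factor."""
    return sum((s * f) ** 2 * t for s, f, t in zip(sigma, f_proj, tau))


# Invented averages for a single development year j = 2 of a triangle with n = 5.
base = tau_gcl(2, 5, 120.0, 100.0, 1.1, 1.0, 0.012)
# Same data with the volume multiplied by 10,000 (V^2 / X scales with 10,000^2).
scaled = tau_gcl(2, 5, 120.0, 100.0, 1.1e4, 1.0e4, 0.012e8)
print(base, scaled)

# Aggregation over two invented development years.
print(msep([0.8, 0.5], [1.3, 1.1], [base, tau_cl(3, 5, 150.0, 140.0)]))
```

In this sketch the factor of the squared average claims amount that appears in front of the chain-ladder bracket is absorbed into `tau_cl`, so that both models can be passed to the same aggregation function.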
Mack (1993) derived the formula for the mean squared error of prediction in the chain-ladder case, M̂SEP(CL), and thus stimulated extended research into the reliability of reserve calculation based on claims triangles. His approach was different insofar as all accident years were considered separately and not together, as we do here. Therefore, Mack’s prediction error for the entire reserve is composed of the prediction errors for the different accident years. Because these prediction errors rely on the same parameter estimation, Mack has to take these dependencies into account. Thus, the structures of the equivalent formulae differ because the estimation error is handled differently. In the development by Mack, our term for the estimation error appears twice: once as the sum of the squares corresponding to the estimation error for the different accident years, and again as a mixed product accounting for the dependencies between the different accident years. Therefore, as noted below, the squared sum of our case splits into two terms in the Mack formulae, and the mixed term appears when calculating the prediction error for the entire reserve from those of the individual accident years:
\begin{aligned} \left(j \cdot \overline{\hat{X}_{j}}\right)^{2} & =\left(\sum_{k=n-j}^{n} \hat{X}_{k, j}\right)^{2} \\ & =\sum_{k=n-j}^{n} \hat{X}_{k, j}^{2}+2 \cdot \sum_{k=n-j}^{n-1} \hat{X}_{k, j} \cdot \sum_{l=k+1}^{n} \hat{X}_{l, j} \end{aligned}
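The splitting of the squared sum into a sum of squares and a mixed term is a purely algebraic identity and can be checked numerically. The following short Python sketch does so for an arbitrary list of amounts; the function name and the sample values are hypothetical and serve only as an illustration.

```python
def split_squared_sum(values):
    """Return (sum of squares, mixed term) of the squared sum of `values`."""
    squares = sum(v * v for v in values)
    # Mixed term: twice the sum of all products of distinct pairs.
    mixed = 2 * sum(values[i] * sum(values[i + 1:])
                    for i in range(len(values) - 1))
    return squares, mixed


claims = [13.0, 250.0, 410.0]  # invented predicted amounts of one development year
squares, mixed = split_squared_sum(claims)
print(squares, mixed, sum(claims) ** 2)
```

The first component corresponds to the per-accident-year terms and the second to the mixed term that arises when the accident years are combined.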
9. Numerical examples
In this section, we apply three of our methods—the two considered affine methods (that is, the generalized chain-ladder and the generalized linear regression), and the traditional chain-ladder method—to three given claims triangles. One of these triangles corresponds to the less regular example used by Mack (1993) to illustrate his newly discovered determination of the prediction error in the chain-ladder case. The second one was published by Schnieper (1991) in a case where he explicitly gathered the data of the newly reported claims in the corresponding accident year. Within our affine methods, these amounts are estimated by the additive component of the age-to-age development. Applied to real claims data, our affine model usually yields positive additive parameters, and only exceptionally does it produce negative parameters. This shows that the affine model fits rather well to the reality of the way in which claims are handled within the triangles. The third triangle was published by Brosius (1993) and reconsidered by Halliwell (2007).
9.1. The example of Mack (1993)
There is no volume function quoted in Mack’s paper. We therefore assumed the volume to be constant for all accident years. The projection does not depend on the constant itself. By the choice of “1” for the volume in Table 2, the estimated additive development parameter may be directly interpreted as the estimated amount of newly reported claims of the corresponding development year. This example also shows that assuming a constant volume function in those cases where no such function is available within the data may lead to a more stable and more reliable projection result. Table 3 shows the estimated additive and multiplicative parameters for the triangle of Mack (1993).
Looking in particular at the ninth accident year in Table 4, huge differences in the estimated IBNR are found between the traditional chain-ladder model and the two affine models. We suggest that the chain-ladder method is not appropriate for the given data, because this method neglects the IBNYR estimation and therefore does not seem suitable, especially for the most recent accident year with a particularly small claims amount of just 13 (cf. Table 2).
Table 5 shows the estimated standard error of the development from age j to age j + 1, scaled up to the level of the final-year claims amounts, together with the entire standard error of the estimated reserve.
9.2. The example of Schnieper (1991)
Schnieper (1991) proposed to explicitly separate the incurred data into newly reported claims (true IBNR) and changes in the amounts of reported claims (IBNER). He assumed the expected values of the true IBNR claims to depend on a volume function and changes in IBNER claims on the incurred claims of the previous year, as in the chain-ladder case. The example in Schnieper (1991) shows that these additional data may have a considerable impact on the estimation of the reserve, although these data are not regularly collected. Assuming an affine age-to-age development of incurred claims, we propose an estimate of these two components based solely on the incurred claims available in the usual claims triangle. The additive part provides an estimate for the newly reported claims, the so-called true IBNR claims, and the multiplicative part gives an estimate for the changes in the reported claims (IBNER).
In his paper, Schnieper (1991) regards the premiums per accident year of the entire portfolio as a volume function. To facilitate interpretation, we normalize the volume by dividing the values in Table 6 by 15,000 to attain a size close to “1.”
One may interpret the product of the volume Vj and the additive development parameter ĉj as an estimation of the newly reported claims. In fact, with a volume function near to “1”, the parameters ĉj may already be regarded as a rough estimate of the newly reported claims.
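This interpretation can be illustrated by projecting a single accident year step by step. The Python lines below are a minimal sketch, assuming an affine step of the form "next amount = volume times additive parameter plus multiplicative parameter times previous amount"; the function name, the argument names `c` and `f`, and all numbers are hypothetical and not taken from the tables of this paper.

```python
def project_affine(last_reported, volume, c, f):
    """Project one accident year through the remaining development years.

    c[i], f[i]: hypothetical additive and multiplicative parameters
    of the i-th remaining age-to-age step.
    Returns the projected ultimate and the part of it that stems from
    the additive terms (newly reported claims and their later development).
    """
    total = last_reported
    additive_part = 0.0
    for c_i, f_i in zip(c, f):
        newly_reported = volume * c_i          # estimate of newly reported claims
        additive_part = additive_part * f_i + newly_reported
        total = total * f_i + newly_reported   # affine age-to-age step
    return total, additive_part


ultimate, additive_part = project_affine(100.0, 1.0, c=[20.0, 10.0], f=[1.1, 1.05])
print(ultimate, additive_part, ultimate - additive_part)
```

The difference between the two returned values is the purely multiplicative projection of the amount reported so far, so the split shows how much of the projected ultimate is attributable to the additive terms.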
Looking at Table 7, the negative values of the additive parameter for the development of year 3 to year 4 do not fit in with our interpretation that the said parameter models the newly reported claims. Nevertheless, the observation of such negative additive parameters should motivate us to look more closely and, perhaps, to collect more data, as was done in this case with the additional data of the newly reported claims shown in Table 8. Estimated reserves are shown in Table 9, and estimated standard errors in Table 10.
9.3. The example of Brosius (1993), reconsidered by Halliwell (2007)
The data of Brosius in Table 11 are based on a small book of business and consist of a claims trapezoid of seven accident and five development years. The data were completed to a triangle in order to apply the formulae mentioned above. Again, the volume is normalized to a size close to “1” by dividing by 10,000 in order to scale the parameters ĉj up to the order of magnitude of the newly reported claims. Table 12 shows the estimated parameters.
The risk metric in the chain-ladder model, as well as in the generalized chain-ladder model, assigns an infinite error to the age-to-age development from a zero claim cell to a positive claim cell. This results in an infinite estimated standard error in the chain-ladder case in Table 13 and indicates that the chain-ladder model is not suitable for such inhomogeneous claims triangles. At least the reserve itself is calculable in the chain-ladder model, whereas in the generalized chain-ladder model the reserve formulae are no longer applicable, since claim cells of value zero produce zeroes in the denominator. With only one such zero-cell, the reserve could be defined in the generalized chain-ladder model by computing the development parameters for a sequence converging to zero in this zero-cell. The sequence of the development parameters will then be convergent too, meaning that the additive term is fully defined by the development of this zero-cell. However, in the present case in Table 11, with two zero-cells for a given development year, the additive term depends on the two specific sequences chosen for these two particular cells, so that even taking a limit yields no well-defined amount for the reserve. Table 14 shows the estimated standard errors for the triangle of Brosius (1993).
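The divergence in the chain-ladder case can be made visible with a small numerical experiment. The following Python sketch assumes the weighted squared residual known from the variance estimator of Mack (1993), i.e., the previous-year amount multiplied by the squared deviation of the individual development ratio from the development factor; the function name and the numbers are hypothetical. The previous-year amount is sent towards zero while the next-year amount stays fixed and positive.

```python
def weighted_sq_residual(x_prev, x_next, f):
    """Chain-ladder type residual x_prev * (x_next / x_prev - f)^2.

    Not defined for x_prev == 0, hence the limiting sequence below.
    """
    return x_prev * (x_next / x_prev - f) ** 2


# Previous-year amount converging to zero, next-year amount fixed at 5.
for eps in (1.0, 1e-2, 1e-4, 1e-6):
    print(eps, weighted_sq_residual(eps, 5.0, 1.5))
```

The printed residuals grow without bound as the previous-year amount approaches zero, which is the sense in which the estimated error may be regarded as infinite.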
10. Conclusion
In this paper, we proposed some new affine models for claims reserving and compared them to the well-known multiplicative chain-ladder model, given three sets of real claims data taken from the literature. In our view, the results show that the proposed affine models better correspond to the given claims data than the traditional chain-ladder model.
Does that mean that we have found the ultimate way to compute the exact amount of the predicted reserves and the corresponding prediction error? Not quite. As the numerical examples show, the proposed new models do indeed help to better detect patterns of claims reporting and thus to analyze what is going on in one’s business as a whole or in a specific branch. In particular for long-tail branches such as general liability or professional liability insurance, where claims might be reported years after they were incurred, the new models are more appropriate because their additive part takes into account these late reported claims and estimates the average amount to be incorporated in the claims projection, based on the claims data recorded in the past. The prediction error then provides a measure for the stability of the model assumptions in the past years of experience. Also, a fairly high prediction error may be caused by an unrealistic model that is founded on inappropriate parameters or data, and could thus provide us with a stimulus to improve the model—e.g., by using the proposed affine model instead of regular chain-ladder—or the data basis in order to be more in touch with reality.
When applying rigorous and apparently objective mathematical methods in an economic setting, one should always keep in mind how much the corresponding results depend on the choice of models and of the parameters involved. And with the mathematical methods becoming more and more developed and sophisticated, the danger of possible misapplication and implicit trust in such methods rises. For instance, example 4.a mentioned by Mack (1993) in his famous paper establishing the stochastic view in non-life reserving does not seem to be suitable for the purely multiplicative chain-ladder model without additive parameter, as applied by Mack (1993). At least, that is what is suggested by the significant additive parts revealed (all with positive values!) when applying our affine models to the given data. Hence, in this case not only the prediction error is questionable, but also the much more important reserve estimation itself, in particular for the more recent accident years. Thus it comes as no surprise that practicing actuaries are usually rather cautious towards new methods, particularly if they are regarded as highly elaborate and academic. For this reason, we will follow with great interest whether and how our proposed methods will find their way into generally accepted actuarial practice.
Acknowledgments
The author wishes to thank Richard Gorvett for his keen interest in this subject and his valuable advice, Mario Wüthrich for his valuable suggestions, and Andreas Graf for his advice on the English language.