Generalized Mack Chain-Ladder Model of Reserving with Robust Estimation

Przemyslaw Sloma

1. Introduction and motivation

The provision for outstanding claims is one of the main components of the technical provisions within an insurance company’s liabilities. Measuring the deviation of the true amount of reserves from its estimate is one of the major actuarial challenges. Senior managers, shareholders, rating agencies, and insurance regulators all have an interest in knowing the magnitude of these potential variations (reserve uncertainty), since companies with large potential deviations need more capital or reinsurance.

One of the best-known reserving methods used in practice is the chain-ladder approach. This method belongs to the family of development factor models (DFMs). The first stochastic approach based on the chain-ladder technique was proposed by Mack (1993, 1994). In these studies, Mack proposed an estimator of the mean square error of prediction (MSEP) of claims reserves based on the all-year volume-weighted average of loss development factors (also called link ratios, age-to-age factors, or report-to-report factors). The variance was assumed to be proportional to the development period’s initial loss. This assumption is sufficient for the weighted average development factors to have optimal statistical properties (BLUE, or best linear unbiased estimate). Some authors (see Murphy 1996, 188; Mack 1999, 15) pointed out that the estimation of chain-ladder factors is connected with estimation in the linear model framework by the weighted least squares (WLS) regression approach. They also observed that, when the original variance assumption from Mack (1993) is modified, the corresponding estimators of the chain-ladder development factors keep their BLUE property (see Murphy 1996; Barnett and Zehnwirth 2000; Saito 2009). It is worth underlining that modifying the variance assumption leads to different point estimators for the development factors (arithmetic average, slope of regression, etc.; see Remark 3.1 for more details).

One of the major challenges in everyday actuarial practice is selecting the loss development factors (LDFs).

Adjustments to make the data more homogeneous are often justified for a number of reasons (unstable run-off triangles, outliers, inaccurate and incomplete data, etc.). Most actuaries use somewhat arbitrary rules of thumb in selecting the LDFs. Blumsohn and Laufer (2009) describe this topic in great detail. In their project, a group of actuaries were asked to select LDFs for an incurred run-off triangle. The participants provided a large number of different ways of selecting LDFs. The approaches proposed to estimate the expected value of reserves varied widely, and additional information about the prediction error of this estimate could be helpful in decision making. That is why it is extremely important from a practical point of view to have a method that provides an estimate of the conditional MSEP of ultimate claims (or reserves) in the context of LDF selection by actuaries. In the present study we provide such a tool, embedded in a theoretical framework, to quantify the standard error of prediction of the claims reserves in the case where some factors have been excluded from the estimation of the model parameters. However, we do not judge whether these ad hoc approaches to selecting factors are right or wrong. We rather assume that the expert judgment made by an actuary can always be justified by specific knowledge of the business considered.

Measuring the variability of the reserves in this context is poorly developed in the literature. That is why, in practice, actuaries and reserving software developers often use proxy methods based on the formula for MSEP derived in Mack (1993). These approximations mainly consist of replacing the main parameters by estimators computed with another approach, without changing the main formula. This procedure is incorrect because, in the chain-ladder framework, the formula for MSEP depends, among other things, on the standard error of the chain-ladder factors, and it is not accurate to simply plug the new estimators into the old formula. Another proxy method often used in practice consists of applying the coefficients of variation of the ultimate loss from Mack (1993) (the ratio of the square root of the MSEP of the ultimate loss to the ultimate loss) in order to derive the MSEP estimators for a new approach. It turns out that in general these approximations are highly inappropriate (see the example in Section 6.3).

We think that, in some simple cases (no curve fitting for LDFs, for example), the approximations mentioned above result from a misunderstanding of the main formula for the estimation of MSEP in Mack (1993). Moreover, the approximations used by actuaries and actuarial software developers could be avoided by using more appropriate existing models. One such model was proposed by Mack (1999). To our knowledge, this was the first study that showed how to measure the uncertainty of reserves when an actuary selects the LDFs. In our opinion, the important results obtained in that paper are not always used in practice because it gives a recursive equation, rather than an explicit formula, for the estimation of the MSEP of the ultimate loss.

Mack (1999) is an important paper that allows a full understanding of the MSEP formula and helps avoid inappropriate approximations when they are not necessary. We summarize the details of this method in Section 3. One of the major limitations of this method is the underestimation of the MSEP of the ultimate loss when the number of excluded observations is large. We discuss this topic in detail in Section 3.5. One possible solution to overcome this difficulty is to extend the existing approach proposed by Mack (1999).

Therefore, we propose a general approach for stochastic claims reserving in the framework of the chain-ladder model, extending the models proposed by Mack (1999) and Murphy, Bardis, and Majidi (2012). This extension is threefold. First, our general tool has an educational role and makes it possible to validate the results from other approaches. More precisely, our general formula for the estimation of the MSEP of outstanding loss liabilities can be used to fully understand the Mack (1993), Mack (1994), and Mack (1999) models. Furthermore, under the new requirements of Solvency II, insurance companies use bootstrap-type stochastic reserving methods to determine the economic capital corresponding to the reserve risk. The bootstrap method allows the estimation of a whole claims reserves distribution via resampling techniques and Monte Carlo simulations. It seems crucial for non-life insurance companies to be able to validate the results given by industry software whose code is not accessible and in which a number of shortcuts may be applied. Our approach can be used to validate the estimation of the first two moments of the loss distribution in the case where a selection of development factors was employed and different weights were used in the estimation of the chain-ladder factors and the volatility parameters (see Section 6.3 for more details).

Second, our general Mack chain-ladder (GMCL) model can be used to construct proxy solutions that overcome the limits of Mack’s (1999) approach, i.e., the use of the same weights for the estimation of all parameters (see the discussion in Section 3.6). This means that, for methods in which a considerable number of observations are eliminated, the data used to estimate the variability of the reserves are reduced as well. This mechanically affects the estimate of the MSEP of loss liabilities, which in such cases is generally underestimated. We then propose a possible solution to overcome this kind of difficulty (see Section 6.1 for more details).

Finally, the third application, which is particularly important from a practical point of view, consists of bridging the point estimation of chain-ladder parameters with the theory of robust statistics. As mentioned above, the point estimators of chain-ladder factors can be obtained in the linear regression framework by applying the weighted least squares procedure. It is well known that OLS estimators are sensitive to outliers. That is why we propose using robust estimation techniques such as M-estimators, $L^{p}$-estimators, etc. (see Section 6.2).

The remainder of this paper is organized as follows. In Section 2 we present our notation and definitions. In Section 3 we review the MCL model and its main limitations. In Section 4 we present the GMCL model, and the main results are derived in Section 5. Finally, Section 6 introduces the numerical applications of the GMCL model. All proofs are provided in the Appendix. Related topics such as tail factors, curve fitting, diagnostics, and validation of the main model hypotheses are out of the scope of this paper and will be treated elsewhere.

2. Notations and definitions

2.1. Run-off triangle

Let $C_{i, j}$ denote the random variables (cumulative payments, incurred losses, reported claims numbers, etc.) for accident year $i \in\{1, \ldots, I\}$ until development year $j \in\{1, \ldots, J\}$ , where the accident year is the year in which an event triggering insurance claims occurs. We assume that $C_{i, j}$ are random variables observable for calendar years $i+j \leq I+1$ and non-observable (to be predicted) for calendar years $i+j>I+1$ . The observable $C_{i, j}$ are represented by the so-called run-off trapezoids $(I>J)$ or run-off triangles $(I=J)$ . Table 1 gives an example of a typical run-off triangle. In order to simplify our notation, we assume that $I=J$ (run-off triangle). However, all the results we present here can be easily extended to the case where the last accident year for which data is available is greater than the last development year, i.e., $I>J$ (run-off trapezoid).

Table 1. Run-off triangle $(I=J)$

2.2. Outstanding reserves

Let $R_{i}$ and $R$ denote, respectively, the outstanding claims liabilities for accident year $i \in\{1, \ldots, I\}$ ,

$R_{i}=C_{i, I}-C_{i, I-i+1},\tag{2.1}$

and the total outstanding loss liabilities for all accident years,

$R=\sum_{i=1}^{I} R_{i}\tag{2.2}$

We use the term claims reserves to describe the prediction of the outstanding loss liabilities. Hence, let $\hat{R}_{i}$ and $\hat{R}$ denote the claims reserves for accident year $i$ , $\hat{R}_{i}=\hat{C}_{i, I}-C_{i, I-i+1}, i \in\{1, \ldots, I\}$ , and the total claims reserves for aggregated accident years, $\hat{R}=\sum_{i=1}^{I} \hat{R}_{i}$ , respectively, where $\hat{C}_{i, I}$ is a predictor for $C_{i, I}$ .

2.3. (Conditional) mean square error of prediction (MSEP)

As already stated above, finding a suitable prediction of the ultimate loss is only the beginning of the reserving process, and insurers need to assess the variability of these amounts. We are therefore interested in quantifying the prediction uncertainty of the ultimate loss, i.e., of $\hat{C}_{i, I}$ and $\sum_{i=1}^{I} \hat{C}_{i, I}$ (or, equivalently, of the claims reserves, i.e., $\hat{R}_{i}$ and $\hat{R}=\sum_{i=1}^{I} \hat{R}_{i}$ ). For that, we have to choose an appropriate risk measure, which determines how the “distance” between the prediction and the actual outcomes is measured. In this paper, following the actuarial literature, we quantify the prediction uncertainty using the most popular such measure, the so-called mean square error of prediction (MSEP):

$\operatorname{msep}_{\hat{C}_{i, I} \mid D_{I}}\left(C_{i, I}\right)=E\left[\left(\hat{C}_{i, I}-C_{i, I}\right)^{2} \mid D_{I}\right],\tag{2.3}$

$\operatorname{msep}_{\sum_{i=1}^{I} \hat{C}_{i, I} \mid D_{I}}\left(\sum_{i=1}^{I} C_{i, I}\right)=E\left[\left(\sum_{i=1}^{I} \hat{C}_{i, I}-\sum_{i=1}^{I} C_{i, I}\right)^{2} \mid D_{I}\right],\tag{2.4}$

where

$D_{I}=\left\{C_{i, j}: i+j \leq I+1\right\},\tag{2.5}$

denotes the claims data available at time $t=I$ .

3. Mack chain-ladder (MCL) model

A major everyday challenge of actuarial work is selecting loss development factors, for a number of reasons (outliers in the triangle, inaccurate data, incompleteness, etc.). Most actuaries use somewhat arbitrary rules of thumb in selecting the link ratios. In Blumsohn and Laufer (2009), a group of actuaries were asked to select age-to-age factors for a 12-year triangle of umbrella business. The participants provided a large number of different ways of selecting the link ratios. It is therefore important, from a practical point of view, to have a method that provides an estimate of the conditional MSEP of ultimate claims in the context of factor selection by actuaries.

To the best of our knowledge, the paper by Mack (1999) is one of the first studies dealing with factors selection and variability of reserves estimation in the framework of the chain-ladder method. This paper is an extension of Mack (1993).

In the remaining part of this section we recall the assumptions of the MCL model from Mack (1999). Afterwards, we present the numerical example illustrating the limits of this approach. Finally, we indicate the possible expansion of MCL method and its potential applications.

3.1. Model assumptions of MCL method

Let us define the individual development factors, for $1 \leq i \leq I-1$ and $1 \leq k \leq I-1$ ,

$F_{i, k}=C_{i, k+1} / C_{i, k} .\tag{3.1}$

Following Mack (1999), we make the following assumptions:

(MCL.1) There exist constants $f_{k}>0$ such that

$E\left(F_{i, k} \mid C_{i, 1}, \ldots, C_{i, k}\right)=f_{k}.$

The parameters $f_{k}$ are often called loss development factors (LDFs), link ratios, or age-to-age factors.

(MCL.2) There exist constants $\sigma_{k}^{2}>0$ such that for all $1 \leq i \leq I$ and $1 \leq k \leq I-1$ we have

$\begin{gathered} \operatorname{Var}\left(F_{i, k} \mid C_{i, 1}, \ldots, C_{i, k}\right)=\frac{\sigma_{k}^{2}}{w_{i, k} C_{i, k}^{\alpha}}, \\ \text { with } w_{i, k} \in[0,1]. \end{gathered}\tag{3.2}$

The parameters $\sigma_{k}$ are referred to here as variance parameters.

(MCL.3) The accident years $\left(C_{i, 1}, \ldots, C_{i, I}\right)_{1 \leq i \leq I}$ are independent.
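As a minimal sketch of definition (3.1), the individual development factors can be read off a cumulative triangle stored row by row. The triangle below is made up for illustration (it is not the RAA data), the indices are 0-based, and the name `link_ratios` is hypothetical.

```python
# Toy cumulative run-off triangle (made-up numbers, I = J = 4).
# Row i holds accident year i+1 and contains its I - i observed development years.
triangle = [
    [100.0, 150.0, 175.0, 180.0],
    [110.0, 168.0, 192.0],
    [120.0, 170.0],
    [130.0],
]


def link_ratios(tri):
    """Return F[i][k] = C[i][k+1] / C[i][k] for every observed pair, as in (3.1)."""
    return [[row[k + 1] / row[k] for k in range(len(row) - 1)] for row in tri]


F = link_ratios(triangle)
for year, ratios in enumerate(F, start=1):
    print(year, [round(x, 4) for x in ratios])
```

The most recent accident year has a single observation and therefore contributes no link ratio.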

3.2. Estimation of parameters in the MCL model

Given the information $D_{I}$ and for $1 \leq k \leq I-1$ , the factors $f_{k}$ are estimated by

$\hat{f}_{k}=\frac{\sum_{i=1}^{I-k} w_{i, k} C_{i, k}^{\alpha} F_{i, k}}{\sum_{i=1}^{I-k} w_{i, k} C_{i, k}^{\alpha}}, \quad \alpha \in\{0,1,2\}.\tag{3.3}$

Given the information $D_{I}$ and for $1 \leq k \leq I-2$ , the variance parameters $\sigma_{k}^{2}$ are estimated by

$\hat{\sigma}_{k}^{2}=\frac{1}{I_{k}-1} \sum_{i=1}^{I-k} w_{i, k} C_{i, k}^{\alpha}\left(F_{i, k}-\hat{f}_{k}\right)^{2}, \quad \alpha \in\{0,1,2\},\tag{3.4}$

where $I_{k}$ represents the number of weights $w_{i, k}$ different from 0, namely, $I_{k}:=\operatorname{card}\left\{i: w_{i, k} \neq 0\right\}$ .

Formula (3.4) does not yield an estimator for $\sigma_{I-1}^{2}$ because it is not possible to estimate this parameter from the single observation $C_{1, I} / C_{1, I-1}$ . Following Mack (1993, 1994, 1999), if $f_{I-1}=1$ and if the claims development is believed to be finished after $I-1$ years, we can put $\hat{\sigma}_{I-1}^{2}=0$ . If not, a simple extrapolation can be applied by requiring $\hat{\sigma}_{I-3} / \hat{\sigma}_{I-2}=\hat{\sigma}_{I-2} / \hat{\sigma}_{I-1}$ . This leads to the following definition:

$\hat{\sigma}_{I-1}^{2}:=\min \left(\hat{\sigma}_{I-2}^{4} / \hat{\sigma}_{I-3}^{2}, \min \left(\hat{\sigma}_{I-3}^{2}, \hat{\sigma}_{I-2}^{2}\right)\right).\tag{3.5}$
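The estimators (3.3)-(3.5) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the triangle is made up, indices are 0-based, the function name `mcl_estimates` is hypothetical, $I \geq 4$, all $C_{i,k}>0$, and every development period $k \leq I-2$ keeps at least two non-zero weights.

```python
def mcl_estimates(tri, alpha=1, w=None):
    """Estimate f_k via (3.3) and sigma_k^2 via (3.4)-(3.5).

    tri[i][k] is the cumulative amount of accident year i+1 at development year k+1;
    w[i][k] are optional weights in [0, 1] (default: all equal to 1).
    """
    I = len(tri)
    f_hat, sigma2_hat = [], []
    for k in range(I - 1):                      # 0-based development period
        rows = range(I - 1 - k)                 # years observed at k and k+1
        F = [tri[i][k + 1] / tri[i][k] for i in rows]
        vol = [(1.0 if w is None else w[i][k]) * tri[i][k] ** alpha for i in rows]
        f = sum(v * x for v, x in zip(vol, F)) / sum(vol)
        f_hat.append(f)
        if k < I - 2:                           # (3.4) needs at least two ratios
            n_used = sum(1 for v in vol if v != 0)          # I_k
            sigma2_hat.append(
                sum(v * (x - f) ** 2 for v, x in zip(vol, F)) / (n_used - 1)
            )
    # (3.5): extrapolate the last variance parameter from the two previous ones.
    s_prev, s_last = sigma2_hat[-2], sigma2_hat[-1]
    sigma2_hat.append(min(s_last ** 2 / s_prev, min(s_prev, s_last)))
    return f_hat, sigma2_hat


triangle = [                                    # made-up data, I = 4
    [100.0, 150.0, 175.0, 180.0],
    [110.0, 168.0, 192.0],
    [120.0, 170.0],
    [130.0],
]
f_hat, sigma2_hat = mcl_estimates(triangle, alpha=1)
print(f_hat, sigma2_hat)
```

Passing a 0/1 matrix as `w` reproduces factor selection, since excluded link ratios then drop out of both sums.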

Remark 3.1. The parameter $\alpha$ determines the different ways of estimation of $f_{k}$ . For the sake of simplicity, let us assume that $w_{i, j}=1$ for all $i, j$ . We present below the possible choices of $\alpha$ and their interpretation.

If $\alpha=1$ we get the classical chain-ladder estimate of $f_{k}$ ,

$\hat{f}_{k}=\frac{\sum_{i=1}^{I-k} C_{i, k} F_{i, k}}{\sum_{i=1}^{I-k} C_{i, k}}=\frac{\sum_{i=1}^{I-k} C_{i, k+1}}{\sum_{i=1}^{I-k} C_{i, k}}, \quad \text { for } 1 \leq k \leq I-1.$

If $\alpha=0$ we get the model for which the estimators of the age-to-age factors $f_{k}$ are the straightforward average of the observed individual development factors $F_{i, k}$ defined via (3.1), i.e.,

$\hat{f}_{k}=\frac{1}{I-k} \sum_{i=1}^{I-k} F_{i, k}, \quad \text { for } 1 \leq k \leq I-1.$

If $\alpha=2$ we get the model for which the estimators of the age-to-age factors $f_{k}$ are the results of an ordinary regression of $\left\{C_{i, k+1}\right\}_{i \in\{1, \ldots, I-k\}}$ against $\left\{C_{i, k}\right\}_{i \in\{1, \ldots, I-k\}}$ with intercept 0, i.e.,

$\hat{f}_{k}=\frac{\sum_{i=1}^{I-k} C_{i, k}^{2} F_{i, k}}{\sum_{i=1}^{I-k} C_{i, k}^{2}}=\frac{\sum_{i=1}^{I-k} C_{i, k} C_{i, k+1}}{\sum_{i=1}^{I-k} C_{i, k}^{2}}, \quad \text { for } 1 \leq k \leq I-1.$
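The three cases of Remark 3.1 can be checked numerically on a single development period. The two columns below are made-up values and the function name `f_hat` is hypothetical; the sketch only confirms that formula (3.3) with $w_{i,k}=1$ coincides with the simple average, the volume-weighted ratio, and the regression slope through the origin.

```python
c_k = [100.0, 110.0, 120.0]     # made-up C_{i,k}
c_k1 = [150.0, 168.0, 170.0]    # made-up C_{i,k+1}


def f_hat(alpha):
    """Formula (3.3) with all weights w_{i,k} equal to 1."""
    F = [b / a for a, b in zip(c_k, c_k1)]
    vol = [a ** alpha for a in c_k]
    return sum(v * x for v, x in zip(vol, F)) / sum(vol)


simple_average = sum(b / a for a, b in zip(c_k, c_k1)) / len(c_k)
volume_weighted = sum(c_k1) / sum(c_k)
slope_through_origin = sum(a * b for a, b in zip(c_k, c_k1)) / sum(a * a for a in c_k)

print(f_hat(0), f_hat(1), f_hat(2))
```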

3.3. Properties of estimators from MCL model

Proposition 3.1
i. The estimators $\hat{f}_{k}$ given in (3.3) are unbiased and uncorrelated.
ii. The estimators $\hat{f}_{k}$ of $f_{k}$ have the minimal variance among all unbiased estimators of $f_{k}$ which are the weighted average of the observed development factors $F_{i, k}$ .
iii. The estimator $\hat{\sigma}_{k}^{2}$ , given in (3.4) is the unbiased estimator of the parameter $\sigma_{k}^{2}$ .
iv. Under the model assumptions (MCL.1) and (MCL.3) we have

$E\left(C_{i, I} \mid D_{I}\right)=C_{i, I+1-i} f_{I+1-i} \cdot \ldots \cdot f_{I-1}.$

This implies, together with the fact that the $\hat{f}_{k}$ are uncorrelated, that $\hat{C}_{i, I}$ is an unbiased estimator of $E\left(C_{i, I} \mid D_{I}\right)$ .
v. The expected values of the estimator

$\hat{C}_{i, I}=C_{i, I+1-i} \cdot \prod_{k=I+1-i}^{I-1} \hat{f}_{k},$

for the ultimate claims amount and of the true ultimate claims amount $C_{i, I}$ are equal, i.e., $E\left(\hat{C}_{i, I}\right)=E\left(C_{i, I}\right), 2 \leq i \leq I$ .

The proof is provided in Appendix A.3.
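The predictor $\hat{C}_{i, I}$ of item v and the reserves of Section 2.2 can be sketched as follows. The triangle is made up, the factors are the volume-weighted ones ($\alpha=1$, all weights equal to 1), and the function names are hypothetical.

```python
triangle = [                                    # made-up data, I = 4
    [100.0, 150.0, 175.0, 180.0],
    [110.0, 168.0, 192.0],
    [120.0, 170.0],
    [130.0],
]


def volume_weighted_factors(tri):
    """Classical chain-ladder factors (alpha = 1, all weights equal to 1)."""
    I = len(tri)
    return [
        sum(tri[i][k + 1] for i in range(I - 1 - k))
        / sum(tri[i][k] for i in range(I - 1 - k))
        for k in range(I - 1)
    ]


def project_ultimates(tri, f_hat):
    """Multiply the latest diagonal value by the remaining development factors."""
    I = len(tri)
    ultimates = []
    for row in tri:
        c = row[-1]                              # latest observed value C_{i,I+1-i}
        for k in range(len(row) - 1, I - 1):     # remaining development periods
            c *= f_hat[k]
        ultimates.append(c)
    return ultimates


f_hat = volume_weighted_factors(triangle)
ultimates = project_ultimates(triangle, f_hat)
reserves = [u - row[-1] for u, row in zip(ultimates, triangle)]
total_reserve = sum(reserves)
print(ultimates, reserves, total_reserve)
```

The oldest accident year is fully developed, so its reserve is zero and the sum in (2.2) effectively starts at $i=2$.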

3.4. Estimators of conditional MSEP in MCL model

3.4.1. Single accident years

Under the assumptions of the MCL model we have the following estimator of the conditional MSEP for a single accident year $i \in\{2, \ldots, I\}$ :

$\widehat{\operatorname{msep}}_{\hat{C}_{i, I} \mid D_{I}}\left(C_{i, I}\right)=\left(\hat{C}_{i, I}\right)^{2} \cdot \sum_{k=I-i+1}^{I-1} \frac{\hat{\sigma}_{k}^{2}}{\hat{f}_{k}^{2}}\left(\frac{1}{\hat{w}_{i, k} \hat{C}_{i, k}^{\alpha}}+\frac{1}{\sum_{j=1}^{I-k} w_{j, k} C_{j, k}^{\alpha}}\right), \tag{3.6}$

where, for $i+k>I+1$ , we define $\hat{w}_{i, k}:=1$ and $\hat{f}_{j}$ and $\hat{\sigma}_{j}^{2}$ are given in (3.3) and (3.4)-(3.5) respectively.

3.4.2. Aggregated accident years

$\begin{aligned} & \widehat{\operatorname{msep}}_{\sum_{i=1}^{I} \hat{C}_{i, I} \mid D_{I}}\left(\sum_{i=1}^{I} C_{i, I}\right)=\sum_{i=2}^{I} \widehat{\operatorname{msep}}_{\hat{C}_{i, I} \mid D_{I}}\left(C_{i, I}\right) \\ & \quad+\sum_{i=2}^{I} \hat{C}_{i, I}\left(\sum_{j=i+1}^{I} \hat{C}_{j, I}\right) \sum_{k=I-i+1}^{I-1} 2 \frac{\hat{\sigma}_{k}^{2} /\left(\hat{f}_{k}\right)^{2}}{\sum_{l=1}^{I-k} w_{l, k} C_{l, k}^{\alpha}}, \end{aligned}\tag{3.7}$

where $\hat{f}_{j}$ and $\hat{\sigma}_{j}^{2}$ are given in (3.3) and (3.4)-(3.5) respectively.
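A compact sketch of formulas (3.6) and (3.7) is given below for the simplest setting, under several assumptions: all weights $w_{i,k}$ equal 1, $\hat{C}_{i, I+1-i}$ is taken to be the observed latest value, later $\hat{C}_{i,k}$ are chain-ladder projections, $I \geq 4$, and the triangle is made up. The function name `mack_msep` is hypothetical.

```python
def mack_msep(tri, alpha=1):
    """Return (ultimates, per-year MSEP as in (3.6), aggregated MSEP as in (3.7))."""
    I = len(tri)
    f, s2, vol_sum = [], [], []
    for k in range(I - 1):
        rows = range(I - 1 - k)
        vol = [tri[i][k] ** alpha for i in rows]
        F = [tri[i][k + 1] / tri[i][k] for i in rows]
        fk = sum(v * x for v, x in zip(vol, F)) / sum(vol)         # (3.3)
        f.append(fk)
        vol_sum.append(sum(vol))
        if k < I - 2:                                              # (3.4)
            s2.append(sum(v * (x - fk) ** 2 for v, x in zip(vol, F)) / (len(F) - 1))
    s2.append(min(s2[-1] ** 2 / s2[-2], s2[-2], s2[-1]))           # (3.5)

    ult, msep = [], []
    for row in tri:
        c, acc = row[-1], 0.0
        for k in range(len(row) - 1, I - 1):
            # process term 1/C_hat^alpha plus estimation term 1/sum_j C_{j,k}^alpha
            acc += s2[k] / f[k] ** 2 * (1.0 / c ** alpha + 1.0 / vol_sum[k])
            c *= f[k]                                              # project one period
        ult.append(c)
        msep.append(c * c * acc)

    total = sum(msep)
    for i, row in enumerate(tri):
        cross = sum(2.0 * s2[k] / f[k] ** 2 / vol_sum[k]
                    for k in range(len(row) - 1, I - 1))
        total += ult[i] * sum(ult[i + 1:]) * cross                 # covariance part of (3.7)
    return ult, msep, total


triangle = [                                    # made-up data, I = 4
    [100.0, 150.0, 175.0, 180.0],
    [110.0, 168.0, 192.0],
    [120.0, 170.0],
    [130.0],
]
ult, msep, total = mack_msep(triangle, alpha=1)
print([round(m ** 0.5, 2) for m in msep], round(total ** 0.5, 2))
```

The aggregated value exceeds the sum of the single-year values because all accident years share the same estimated factors $\hat{f}_{k}$ .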

3.5. Numerical application of the MCL method

As mentioned above, the factors selection methods are an integral part of everyday actuarial practice.

Here we choose from Blumsohn and Laufer (2009) several such methods, in which the estimates are computed as different averages using varying weights and varying numbers of accident years: the all/3/5-years weighted average and the all-excluding-highest-and-lowest (AEHL) factor average. We also consider another method popular in actuarial practice, based on the sample median.

More precisely, for the RAA run-off triangle (see Appendix B, Section B.8), we apply the MCL model from Mack (1999) with the following parameters. For all five methods described below we choose $\alpha=0$ (so that the $\hat{f}_{k}$ are arithmetic averages of the $F_{i, k}$ ) and we compute the estimators $\hat{f}_{k}$ and $\hat{\sigma}_{k}^{2}$ according to formulas (3.3) and (3.4)-(3.5), respectively. This allows us to compare the results with the sample median method, which is consistent with the straightforward average of development factors (see method (5) below).

(1) ALL AV: $\hat{f}_{k}$ are computed as the arithmetic average of all individual link ratios $F_{i, j}$ . More precisely, we define the weights in the following way: $w_{i, j}=1$ for all $i, j$ .
(2) AEHL: $\hat{f}_{k}$ are computed as the arithmetic average of all individual link ratios, excluding the highest and the lowest values of $F_{i, j}$ . More precisely, we define the weights in the following way: for fixed $j, w_{i, j}=0$ for $i$ such that $F_{i, j}=F_{(I-j), j}$ or $F_{i, j}=F_{(1), j}$ , where $F_{(k), j}$ for $k=1, \ldots, I-j$ denotes the order statistics of $F_{i, j}$ . For the remaining indices $i$ , for fixed $j$ , we take $w_{i, j}=1$ .
(3) 5 Years AV: $\hat{f}_{k}$ are computed as the arithmetic average of the individual link ratios $F_{i, j}$ from the five latest accident years. More precisely, we define the weights in the following way: $w_{i, j}=1$ for $i=I-j, \ldots, I-j-4$ . For the remaining indices $i$ , for fixed $j$ , we take $w_{i, j}=0$ .
(4) 3 Years AV: $\hat{f}_{k}$ are computed as the arithmetic average of the individual link ratios $F_{i, j}$ from the three latest accident years. More precisely, we define the weights in the following way: $w_{i, j}=1$ for $i=I-j, \ldots, I-j-2$ . For the remaining indices $i$ , for fixed $j$ , we take $w_{i, j}=0$ .
(5) Median: $\hat{f}_{k}$ are computed as the arithmetic average of those individual link ratios $F_{i, j}$ that yield the sample median. More precisely, we put $w_{i, j}=1$ or $w_{i, j}=0$ in such a way that the estimators of the age-to-age factors $f_{k}$ are given by

$\hat{f}_{k}=\operatorname{median}\left\{F_{i, k}: i \in\{1, \ldots, I-k\}\right\} .$

The median denotes the sample median that for the sample $X_{1}, \ldots, X_{n}$ is computed by

$\begin{gathered} \quad \operatorname{median}\left\{X_{i}: i \in\{1, \ldots, n\}\right\} \\ := \begin{cases}X_{\left(\frac{n+1}{2}\right)} & \text { if } n \text { is odd } \\ \frac{X_{\left(\frac{n}{2}\right)}+X_{\left(\frac{n}{2}+1\right)}}{2} & \text { otherwise }\end{cases} \end{gathered}.$

where $X_{(k)}$ denotes the $k$-th order statistic of the sample $X_{1}, \ldots, X_{n}$ .

Remark 3.2. In the case where there is only one observation in estimation of parameter $\sigma_{k}$ (odd number of data in sample median computation) we choose the additional $F_{i, k}$ factor in order to have two observations and be able to apply the formula (3.4).
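The 0/1 weights behind the five methods can be sketched for one development period as follows. The link ratios are made-up values listed with the oldest accident year first, the function names are hypothetical, and leaving columns with fewer than three ratios untouched in the AEHL rule is an assumption of this sketch. The extra factor added in Remark 3.2 is not modelled here.

```python
from statistics import median


def weights_all(F):
    return [1] * len(F)


def weights_aehl(F):
    """Drop the highest and the lowest link ratio."""
    w = [1] * len(F)
    if len(F) > 2:                       # assumption: short columns are left as they are
        w[F.index(max(F))] = 0
        w[F.index(min(F))] = 0
    return w


def weights_last(F, n):
    """Keep the n most recent accident years (the last entries of the column)."""
    return [1 if i >= len(F) - n else 0 for i in range(len(F))]


def weights_median(F):
    """Keep the middle ratio (odd sample) or the two middle ratios (even sample)."""
    order = sorted(range(len(F)), key=lambda i: F[i])
    n = len(F)
    keep = {order[n // 2]} if n % 2 else {order[n // 2 - 1], order[n // 2]}
    return [1 if i in keep else 0 for i in range(n)]


def weighted_mean(F, w):
    """Formula (3.3) with alpha = 0."""
    return sum(wi * x for wi, x in zip(w, F)) / sum(w)


F_col = [1.50, 1.62, 1.41, 1.55, 1.48]   # made-up link ratios
for name, w in [("ALL AV", weights_all(F_col)), ("AEHL", weights_aehl(F_col)),
                ("3 Years AV", weights_last(F_col, 3)), ("Median", weights_median(F_col))]:
    print(name, w, round(weighted_mean(F_col, w), 4))
```

Note that the AEHL and median weights can only be set after looking at the link ratios themselves, whereas the "last n years" weights are known in advance.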

In Table 2, we present the estimate of the total amount of claims reserves $\hat{R}$ as well as the value of the estimator of the aggregated $\operatorname{MSEP}(\hat{R})$ . Recall that $\hat{R}:=\sum_{i=1}^{I} \hat{R}_{i}$ , where $\hat{R}_{i}:=\hat{C}_{i, I}-C_{i, I-i+1}$ . We observe that to obtain $\hat{R}$ it is enough to have the estimators $\hat{C}_{i, I}$ of the ultimate claims $C_{i, I}$ for all accident years $i$ . In consequence, $\operatorname{MSEP}(\hat{R})=\operatorname{MSEP}\left(\sum_{i=1}^{I} \hat{C}_{i, I}\right)$ , and we use formula (3.7) to estimate this quantity. We also compute the coefficient of variation of $\hat{R}$ , given by $C V(\hat{R})=\operatorname{MSEP}(\hat{R})^{1 / 2} / \hat{R}$ . The last two lines of Table 2 give $\hat{R}$ and $\operatorname{MSEP}(\hat{R})^{1 / 2}$ for each of the five methods considered as a proportion of the corresponding values for the ALL AV method, which is the reference method in our example.

Table 2. Estimates of the total amount of outstanding loss liabilities $(\hat{R})$ , value of the estimator of aggregated $MSEP(\hat{R})^{1 / 2}$ , and coefficient of variation $\operatorname{CV}(\hat{R})$ , for five methods ( $\alpha=0$ )

Item/method	ALL AV (1)	AEHL (2)	5 Years AV (3)	3 Years AV (4)	Median (5)
$(\hat{R})$	93 643	65 868	75 886	68 645	54 059
$MSEP(\hat{R})^{1 / 2}$	92 549	21 015	27 486	29 493	14 786
$C V(\hat{R})$	99%	32%	36%	43%	27%
	(1)/(1)	(2)/(1)	(3)/(1)	(4)/(1)	(5)/(1)
$\pmb{\hat{R}(\%)}$	100%	70%	81%	73%	58%
$\pmb{MSEP(\hat{R})^{1 / 2}(\%)}$	100%	23%	30%	32%	16%
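The derived rows of Table 2 can be reproduced from its first two rows with simple arithmetic, taking the coefficient of variation as the root MSEP divided by the reserve. The figures below are copied from Table 2; the variable names are illustrative only.

```python
methods = ["ALL AV", "AEHL", "5 Years AV", "3 Years AV", "Median"]
reserve = [93643, 65868, 75886, 68645, 54059]        # row R-hat of Table 2
root_msep = [92549, 21015, 27486, 29493, 14786]      # row MSEP^(1/2) of Table 2

cv = [s / r for s, r in zip(root_msep, reserve)]
rel_reserve = [r / reserve[0] for r in reserve]      # relative to ALL AV
rel_root_msep = [s / root_msep[0] for s in root_msep]

for name, c, a, b in zip(methods, cv, rel_reserve, rel_root_msep):
    print(f"{name:>10}: CV={c:.0%}  reserve ratio={a:.0%}  root-MSEP ratio={b:.0%}")
```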

3.6. Limits of MCL method

As can be seen in Table 2, the last four methods (columns (2)-(5)) significantly reduce the estimate of $\operatorname{MSEP}(\hat{R})^{1 / 2}$ compared with the first method, ALL AV. For methods (3), (4), and (5), this is mainly due to the elimination of a relatively large number of development factors from the estimation, especially for the first development years (the development years correspond to the columns of the run-off triangle). This phenomenon is especially visible for the sample median method, in which, for each development factor, we keep at most two of the link ratios $F_{i, j}$ in the estimation of $f_{k}$ . From a statistical point of view, this is clearly not enough to perform a robust estimation. As a consequence, methods of this kind unnaturally reduce the variability of reserves. This could be dangerous, for example, when evaluating the economic capital for reserve risk required by the new Solvency II regime.

Beyond the limits stated above, there are some inconsistencies in the application of the weights $w_{i, k}$ for the AEHL and sample median methods. Indeed, the weights $w_{i, k}$ should be $C_{i, k}$ -measurable random variables in order to derive the main results of the MCL approach (see, for example, Proposition A.2). Although for the 5 Years AV and 3 Years AV methods we can fix the weights without knowing the information $D_{I}$ (that is, without knowing all observations in the run-off triangle; see (2.5)), this is not the case for the AEHL and sample median methods. The reason is that we need to know the observations $F_{i, k}$ in order to specify the corresponding weights for those two methods. That is why the weights $w_{i, k}$ are not $C_{i, k}$ -measurable but rather $D_{I}$ -measurable. This means that the formulas for the expectation of $\hat{f}_{k}$ and for $\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)$ are not correct. Regarding the sample median method, the derivation of $\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)$ requires the computation of the moments of order statistics (see Jeng 2010), and those are strongly related to the distribution of $F_{i, j}$ . To overcome these difficulties we propose two solutions: the simple Proxy method (see Section 6.1) and a more complex one based on robust estimation (see Section 6.2). The first approach is designed to avoid artificial volatility increase and is based on using all link ratios in the estimation of the volatility parameters $\sigma_{k}$ (scale parameters in linear regression). The second method consists of developing an approach that allows the use of any robust estimators of the $f_{k}$ (location) and $\sigma_{k}$ (scale) parameters.

4. General Mack chain-ladder model

4.1. Model assumptions

Before stating the main assumptions of our general approach, let us assume that functions $g_{\delta, j}:[0, \infty) \rightarrow$ $[0, \infty)$ are Borel measurable. Let $\delta_{i, j}$ be the nonnegative random variables defined by, $\delta_{i, j}:=g_{\delta, j}\left(C_{i, j}\right)$ .

Our model is formalized by the following assumptions:
(GMCL.1) There exist constants $f_{k}>0$ such that

$E\left(F_{i, k} \mid C_{i, 1}, \ldots, C_{i, k}\right)=f_{k}.$

(GMCL.2) There exist constants $\sigma_{k}^{2}>0$ such that for all $1 \leq i \leq I$ and $1 \leq k \leq I-1$ we have

$\operatorname{Var}\left(F_{i, k} \mid C_{i, 1}, \ldots, C_{i, k}\right)=\left\{\begin{array}{llll} \frac{\sigma_{k}^{2}}{\delta_{i, k}} & \text { if } & \delta_{i, k} \neq 0 & \text { a.s., } \\ \infty & \text { if } & \delta_{i, k}=0 & \text { a.s., } \end{array}\right.$

where a.s. means almost surely.

(GMCL.3) The accident years $\left(C_{i, 1}, \ldots, C_{i, J}\right)_{1 \leq i \leq I}$ are independent.

We observe that from the above assumptions the main difference between MCL and GMCL lies in the variance assumption. This modification allows us to introduce different weights in estimation of the parameters $f_{k}$ and $\sigma_{k}$ .

4.2. Model estimators

Suppose that functions $g_{\gamma, j}:[0, \infty) \rightarrow[0, \infty)$ are Borel measurable. Let $\gamma_{i, j}$ be the non-negative random variables defined by, $\gamma_{i, j}:=g_{\gamma, j}\left(C_{i, j}\right)$ .

Given the information $D_{I}$ , the factors $f_{k}$ are estimated by

$\hat{f}_{k}=\frac{\sum_{i=1}^{I-k} \gamma_{i, k} F_{i, k}}{\sum_{i=1}^{I-k} \gamma_{i, k}}, \quad \text { for } 1 \leq k \leq I-1 \tag{4.2}$

It becomes obvious from assumption (GMCL.2) that in order to compute correctly the variance of $\hat{f}_{k}$ (see Proposition A.2 in the Appendix) we have to assume that

$\left\{\text { if } \delta_{i, j}=0 \quad \text { then } \gamma_{i, j}=0\right\}.\tag{4.3}$

Given the information $D_{I}$ , the variance parameters $\sigma_{k}^{2}$ are estimated by

$\hat{\sigma}_{k}^{2}=\frac{1}{I_{k}-1} \sum_{i=1}^{I-k} \delta_{i, k}\left(F_{i, k}-\hat{f}_{k}\right)^{2}, \quad \text { for } 1 \leq k \leq I-2, \tag{4.4}$

where $I_{k}$ represents the number of weights $\delta_{i, k}$ different from 0 , namely, $I_{k}:=\operatorname{card}\left\{i: \delta_{i, k} \neq 0\right\}$ .

In a way analogous to (3.5), we define

$\hat{\sigma}_{I-1}^{2}=\min \left(\hat{\sigma}_{I-2}^{4} / \hat{\sigma}_{I-3}^{2}, \min \left(\hat{\sigma}_{I-3}^{2}, \hat{\sigma}_{I-2}^{2}\right)\right). \tag{4.5}$
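For a single development period, the estimators (4.2) and (4.4) with separate weights can be sketched as below. The data are made up, the function name `gmcl_column` is hypothetical, and the weights are plain 0/1 selections (that is, $\alpha=0$): the factor is estimated from the three latest accident years, while the variance parameter uses either the same three link ratios or all of them.

```python
def gmcl_column(c_k, c_k1, gamma, delta):
    """Return (f_hat, sigma2_hat) from (4.2) and (4.4) for one development period."""
    # Condition (4.3): a link ratio with delta = 0 must also have gamma = 0.
    assert all(g == 0 for g, d in zip(gamma, delta) if d == 0)
    F = [b / a for a, b in zip(c_k, c_k1)]
    f_hat = sum(g * x for g, x in zip(gamma, F)) / sum(gamma)
    n_used = sum(1 for d in delta if d != 0)                 # I_k
    sigma2_hat = sum(d * (x - f_hat) ** 2 for d, x in zip(delta, F)) / (n_used - 1)
    return f_hat, sigma2_hat


c_k = [100.0, 110.0, 120.0, 105.0, 115.0]     # made-up C_{i,k}
c_k1 = [150.0, 168.0, 170.0, 160.0, 171.0]    # made-up C_{i,k+1}
last_three = [0, 0, 1, 1, 1]
all_years = [1, 1, 1, 1, 1]

f_same, s2_same = gmcl_column(c_k, c_k1, last_three, last_three)   # gamma = delta
f_mixed, s2_mixed = gmcl_column(c_k, c_k1, last_three, all_years)  # gamma != delta
print(f_same, s2_same, s2_mixed)
```

The point estimate of $f_{k}$ is the same in both calls; only the variance estimate changes, because it is computed from a different set of link ratios.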

Proposition 4.1.

(i) The estimators $\hat{f}_{k}$ given in (4.2) are unbiased and uncorrelated.
(ii) For $k=1, \ldots, I-1$ , if $\delta_{i, k}=\gamma_{i, k}$ for all $i$ , then the estimators $\hat{f}_{k}$ of $f_{k}$ have the minimal variance among all unbiased estimators of $f_{k}$ which are the weighted average of the observed development factors $F_{i, k}$ .

For $k=1, \ldots, I-1$ , if $\delta_{i, k} \neq \gamma_{i, k}$ for some $i$ , then the relative efficiency of s.e. $\left(\hat{f}_{k}^{\gamma \neq \delta} \mid B_{k}\right)$ with respect to s.e. $\left(\hat{f}_{k}^{\gamma=\delta} \mid B_{k}\right)$ is given by the ratio

$\frac{\text { s.e. }\left(\hat{f}_{k}^{\gamma \neq \delta} \mid B_{k}\right)}{\text { s.e. }\left(\hat{f}_{k}^{\gamma=\delta} \mid B_{k}\right)}:=\frac{\operatorname{Var}\left(\hat{f}_{k}^{\gamma \neq \delta} \mid B_{k}\right)^{1 / 2}}{\operatorname{Var}\left(\hat{f}_{k}^{\gamma=\delta} \mid B_{k}\right)^{1 / 2}}=\frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\sum_{j=1}^{I-k} \gamma_{j, k} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}.$

(iii) For $k=1, \ldots, I-1$ , if $\delta_{i, k}=\gamma_{i, k}$ for all $i$ , then the estimator $\hat{\sigma}_{k}^{2}$ given in (4.4) is an unbiased estimator of the parameter $\sigma_{k}^{2}$ .

For $k=1, \ldots, I-1$ , if $\delta_{i, k} \neq \gamma_{i, k}$ for some $i$ , then the bias of the estimator $\hat{\sigma}_{k}^{2}$ is given by the following formula:

$E\left[\hat{\sigma}_{k}^{2}-\sigma_{k}^{2}\right]=\frac{\sigma_{k}^{2}}{I_{k}-1} E\left[\frac{\sum_{i=1}^{I-k} \delta_{i, k}\left(\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}\right)}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}-1\right].$

(iv) Under the model assumptions (GMCL.1) and (GMCL.3) we have

$E\left(C_{i, I} \mid D_{I}\right)=C_{i, I+1-i} f_{I+1-i} \cdot \ldots \cdot f_{I-1}.$

This implies, together with the fact that the $\hat{f}_{k}$ are uncorrelated, that $\hat{C}_{i, I}$ is an unbiased estimator of $E\left(C_{i, I} \mid D_{I}\right)$ .

(v) The expected values of the estimator

$\hat{C}_{i, I}=C_{i, I+1-i} \cdot \prod_{k=I+1-i}^{I-1} \hat{f}_{k},$

for the ultimate claims amount and of the true ultimate claims amount $C_{i, I}$ are equal, i.e., $E\left(\hat{C}_{i, I}\right)=E\left(C_{i, I}\right), 2 \leq i \leq I$ .

The proof of this Proposition is provided in Appendix A.4.
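When the weights are deterministic (for instance 0/1 selections with $\alpha=0$ ), the expectation in the bias formula of item (iii) disappears and the relative bias $E\left[\hat{\sigma}_{k}^{2}-\sigma_{k}^{2}\right] / \sigma_{k}^{2}$ can be evaluated directly. The sketch below does this for one development period; the function name and the weight vectors are hypothetical.

```python
def sigma2_relative_bias(gamma, delta):
    """Relative bias of sigma2_hat from item (iii), for deterministic weights."""
    active = [(g, d) for g, d in zip(gamma, delta) if d != 0]
    n_used = len(active)                                        # I_k
    ratio = sum(delta) * sum(g * g / d for g, d in active) / sum(gamma) ** 2
    return (ratio - 1.0) / (n_used - 1)


last_three = [0, 0, 1, 1, 1]      # gamma: three latest years used for f_hat
all_years = [1, 1, 1, 1, 1]       # delta: all years used for sigma2_hat

print(sigma2_relative_bias(all_years, all_years))    # gamma = delta: no bias
print(sigma2_relative_bias(last_three, all_years))   # gamma != delta
```

In this example the ratio inside the expectation is $5 \cdot 3 / 3^{2}=5/3$ , so the relative bias equals $(5/3-1)/4=1/6$ .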

Remark 4.1.

If we set $\gamma_{i, j}=\delta_{i, j}=w_{i, j} C_{i, j}^{\alpha}$ , for $\alpha \in\{0,1,2\}$ , in (4.2) and (4.4) we get the assumptions of MCL model from Mack (1999) (see also Mack (1993), Mack (1994) and Saito 2009).
If we put $\gamma_{i, j}=\delta_{i, j}=w_{i, j} C_{i, j}^{\alpha_{j}}$ , for $\alpha_{j} \in \mathbb{R}$ , in (4.2) and (4.4) we get the stochastic chain-ladder model from Murphy, Bardis, and Majidi (2012).
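As an illustration of the special cases in Remark 4.1, the sketch below computes $\hat{f}_{k}=\sum_{i} \gamma_{i, k} F_{i, k} / \sum_{i} \gamma_{i, k}$ with $\gamma_{i, k}=C_{i, k}^{\alpha}$ (all $w_{i, k}=1$) for $\alpha \in\{0,1,2\}$. The function name and the toy claims figures are hypothetical.

```python
def dev_factor(c_k, c_next, alpha):
    """Weighted average of link ratios with weights C_{i,k}**alpha."""
    ratios = [n / c for c, n in zip(c_k, c_next)]
    gamma = [c ** alpha for c in c_k]
    return sum(g * f for g, f in zip(gamma, ratios)) / sum(gamma)


# Hypothetical cumulative claims in development years k and k+1.
c_k = [100.0, 200.0, 400.0]
c_next = [150.0, 260.0, 600.0]

f_arith = dev_factor(c_k, c_next, 0)   # arithmetic average of link ratios
f_volume = dev_factor(c_k, c_next, 1)  # volume-weighted average
f_slope = dev_factor(c_k, c_next, 2)   # slope of regression through the origin
print(f_arith, f_volume, f_slope)
```

The three values differ, which shows how the variance assumption changes the point estimator of the development factor.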

5. Main results

5.1. Single accident years

Result 5.1 (Conditional MSEP estimator for a single accident year).

$\widehat{\operatorname{msep}}_{\hat{C}_{i, I} \mid D_{I}}\left(C_{i, I}\right)=\left(\hat{C}_{i, I}\right)^{2} \cdot\left(\hat{\Gamma}_{i, I}+\hat{\Delta}_{i, I}\right), \tag{5.1}$

where

$\hat{\Gamma}_{i, I}=\sum_{k=I-i+1}^{I-1} \frac{\hat{\sigma}_{k}^{2} /\left(\hat{f}_{k}\right)^{2}}{\hat{\delta}_{i, k}},\tag{5.2}$

$\hat{\Delta}_{i, I}=\sum_{k=I-i+1}^{I-1} \frac{\hat{\sigma}_{k}^{2} /\left(\hat{f}_{k}\right)^{2}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}} \cdot \sum_{j=1}^{I-k} \frac{\left(\gamma_{j, k}\right)^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}},\tag{5.3}$

and $\hat{f}_{j}$ and $\hat{\sigma}_{j}^{2}$ are given in (4.2) and (4.4)-(4.5), respectively.
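A minimal sketch of how (5.1)-(5.3) can be evaluated for one accident year is given below. It assumes that the per-period quantities have already been computed and are passed as lists over the future development periods $k=I-i+1, \ldots, I-1$; the function name, the argument layout and the numbers are assumptions made for illustration.

```python
def msep_single(c_ult, sigma2, f, delta_hat, var_factor):
    """Conditional MSEP estimate for one accident year.

    c_ult      -- estimated ultimate C_hat_{i,I}
    sigma2     -- sigma_hat_k^2 for each future period k
    f          -- f_hat_k for each future period k
    delta_hat  -- delta_hat_{i,k} for each future period k
    var_factor -- sum_j(gamma_{j,k}^2 / delta_{j,k}) / (sum_j gamma_{j,k})^2
    """
    rel = [s / (fk * fk) for s, fk in zip(sigma2, f)]
    process = sum(r / d for r, d in zip(rel, delta_hat))      # Gamma_hat (5.2)
    estimation = sum(r * v for r, v in zip(rel, var_factor))  # Delta_hat (5.3)
    return c_ult ** 2 * (process + estimation)


# Hypothetical inputs with a single future development period.
value = msep_single(1000.0, [4.0], [2.0], [500.0], [1.0 / 1000.0])
print(value)
```

With $\gamma=\delta=C$ the variance factor is $1 / \sum_{j} C_{j, k}$ and the expression has the form of the classical formula of Mack (1993).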

5.2. Aggregation over prior accident year

Result 5.2 (Conditional MSEP estimator for aggregated years).

$\begin{aligned} & \widehat{\operatorname{msep}}_{\sum_{i=1}^{I} \hat{C}_{i, I} \mid D_{I}}\left(\sum_{i=1}^{I} C_{i, I}\right)=\sum_{i=2}^{I} \widehat{\operatorname{msep}}_{\hat{C}_{i, I} \mid D_{I}}\left(C_{i, I}\right) \\ & \quad+\sum_{i=2}^{I} \hat{C}_{i, I}\left(\sum_{j=i+1}^{I} \hat{C}_{j, I}\right) \sum_{k=I-i+1}^{I-1} 2\, \frac{\hat{\sigma}_{k}^{2} /\left(\hat{f}_{k}\right)^{2}}{\left(\sum_{l=1}^{I-k} \gamma_{l, k}\right)^{2}} \\ & \quad \cdot \sum_{l=1}^{I-k} \frac{\left(\gamma_{l, k}\right)^{2}}{\delta_{l, k}} \cdot \mathbf{1}_{\left\{\delta_{l, k} \neq 0\right\}}, \end{aligned} \tag{5.4}$

where $\hat{f_{j}}$ and $\hat{\sigma}_{j}^{2}$ are defined in (4.2) and (4.4)-(4.5), respectively.
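The aggregation can be sketched as follows, assuming the single-year MSEP estimates and the per-year sums over $k$ have already been computed. The factor 2 on the cross term follows the derivation in Appendix A.2 and the classical formula of Mack (1993). All names and numbers are hypothetical.

```python
def msep_aggregate(msep_years, c_ult, cov_terms):
    """Aggregate MSEP over accident years i = 2, ..., I (given in that order).

    msep_years -- single-year MSEP estimates
    c_ult      -- estimated ultimates C_hat_{i,I}
    cov_terms  -- for each year i, the sum over k = I-i+1, ..., I-1 of
                  (sigma_hat_k^2 / f_hat_k^2) * variance factor of f_hat_k
    """
    total = sum(msep_years)
    for pos, (c_i, cov_i) in enumerate(zip(c_ult, cov_terms)):
        later = sum(c_ult[pos + 1:])  # ultimates of more recent accident years
        total += 2.0 * c_i * later * cov_i
    return total


# Hypothetical inputs for two accident years.
total = msep_aggregate([10.0, 20.0], [100.0, 200.0], [0.001, 0.002])
print(total)
```

The cross term reflects the fact that all accident years share the same estimated factors $\hat{f}_{k}$, so their estimation errors are positively correlated.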

6. Applications of GMCL model

In the numerical example of Section 3.5 we saw that using the same weights to estimate the parameters $\sigma_{k}$ and $f_{k}$ leads, for some methods, to an artificial reduction of the variability of the reserve amounts (see Table 2). To overcome this difficulty, we introduced different weights $\gamma_{i, j}$ and $\delta_{i, j}$ in the computation of $\hat{f}_{k}$ and $\hat{\sigma}_{k}$, respectively. In the following applications we indicate how one can estimate the weights $\gamma_{i, j}$ and $\delta_{i, j}$, and we point out some other interesting applications.

6.1. Method proxy for factors selection

In this section, we examine our general framework with $g_{\gamma, j}\left(C_{i, j}\right):=w_{i, j}^{\gamma} C_{i, j}^{\alpha}$ and $g_{\delta, j}\left(C_{i, j}\right):=w_{i, j}^{\delta} C_{i, j}^{\beta}$, which means that $\gamma_{i, j}:=w_{i, j}^{\gamma} C_{i, j}^{\alpha}$ and $\delta_{i, j}:=w_{i, j}^{\delta} C_{i, j}^{\beta}$. In this so-called proxy method we impose the use of all link ratios $F_{i, j}$ in the estimation of the parameters $\sigma_{k}$ (that is, $w_{i, j}^{\delta}=1$ for all $i, j$). For all five methods presented, we take $\alpha=\beta=0$. We return to our numerical example from Section 3.5 and evaluate the same estimators for the five methods already presented, the only difference being the weights used in the estimation of $\sigma_{k}$. More precisely:

ALL AV: The $\hat{\sigma}_{k}^{2}$ are estimated with $w_{i, j}^{\delta}=1$ for all $i, j$. The parameters $\hat{f}_{k}$ are computed as an arithmetic average of all individual link ratios $F_{i, j}$. More precisely, we define the weights as $w_{i, j}^{\gamma}=1$ for all $i, j$.
AEHL: The $\hat{\sigma}_{k}^{2}$ are estimated with $w_{i, j}^{\delta}=1$ for all $i, j$. The parameters $\hat{f}_{k}$ are computed as an arithmetic average of all individual link ratios excluding the highest and the lowest values of $F_{i, j}$. More precisely, for fixed $j$ we set $w_{i, j}^{\gamma}=0$ for $i$ such that $F_{i, j}=F_{(I-j), j}$ or $F_{i, j}=F_{(1), j}$, where $F_{(k), j}$, for $k=1, \ldots, I-j$, denote the order statistics of the $F_{i, j}$. For the remaining indices $i$, for fixed $j$, we take $w_{i, j}^{\gamma}=1$.
5 Years AV: The $\hat{\sigma}_{k}^{2}$ are estimated with $w_{i, j}^{\delta}=1$ for all $i, j$. The parameters $\hat{f}_{k}$ are computed as an arithmetic average of the individual link ratios $F_{i, j}$ from the five latest accident years. More precisely, we set $w_{i, j}^{\gamma}=1$ for $i=I-j-4, \ldots, I-j$. For the remaining indices $i$, for fixed $j$, we take $w_{i, j}^{\gamma}=0$.
3 Years AV: The $\hat{\sigma}_{k}^{2}$ are estimated with $w_{i, j}^{\delta}=1$ for all $i, j$. The parameters $\hat{f}_{k}$ are computed as an arithmetic average of the individual link ratios $F_{i, j}$ from the three latest accident years. More precisely, we set $w_{i, j}^{\gamma}=1$ for $i=I-j-2, \ldots, I-j$. For the remaining indices $i$, for fixed $j$, we take $w_{i, j}^{\gamma}=0$.
Median: The $\hat{\sigma}_{k}^{2}$ are estimated with $w_{i, j}^{\delta}=1$ for all $i, j$. The parameters $\hat{f}_{k}$ are computed as an arithmetic average of the individual link ratios $F_{i, j}$ with weights chosen so as to obtain the sample median. More precisely, we set $w_{i, j}^{\gamma}=1$ or $w_{i, j}^{\gamma}=0$ in such a way that the estimators of the age-to-age factors $f_{k}$ are given by

$\hat{f}_{k}=\operatorname{median}\left\{F_{i, k}: i \in\{1, \ldots, I-k\}\right\},$

where median denotes the sample median which, for the sample $X_{1}, \ldots, X_{n}$ , is computed by

$\begin{aligned} & \text { median }\left\{X_{i}: i \in\{1, \ldots, n\}\right\} \\ & := \begin{cases}X_{\left(\frac{n+1}{2}\right)} & \text { if } n \text { is odd } \\ \frac{X_{\left(\frac{n}{2}\right)}+X_{\left(\frac{n}{2}+1\right)}}{2} & \text { otherwise }\end{cases} \end{aligned}$

and $X_{(k)}$ denotes the $k$th order statistic of the sample $X_{1}, \ldots, X_{n}$.
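The selection rules above can be sketched as 0/1 weight vectors $w_{i, j}^{\gamma}$ for a single development period. The sketch assumes that the link ratios of one column are ordered by accident year with the most recent year last; the function names and the sample ratios are hypothetical.

```python
def weights_aehl(ratios):
    """Drop one highest and one lowest link ratio (first occurrence on ties)."""
    w = [1] * len(ratios)
    w[ratios.index(max(ratios))] = 0
    w[ratios.index(min(ratios))] = 0
    return w


def weights_last_n(ratios, n):
    """Keep the n most recent accident years (the last n entries)."""
    m = len(ratios)
    return [1 if i >= m - n else 0 for i in range(m)]


def weights_median(ratios):
    """Keep the middle order statistic (or the two middle ones)."""
    m = len(ratios)
    order = sorted(range(m), key=lambda i: ratios[i])
    keep = {order[m // 2]} if m % 2 else {order[m // 2 - 1], order[m // 2]}
    return [1 if i in keep else 0 for i in range(m)]


def weighted_factor(ratios, w):
    """f_hat = sum(w_i * F_i) / sum(w_i), i.e. (4.2) with alpha = 0."""
    return sum(wi * fi for wi, fi in zip(w, ratios)) / sum(w)


ratios = [1.50, 1.20, 1.90, 1.40, 1.30]  # hypothetical link ratios
print(weights_aehl(ratios), weighted_factor(ratios, weights_aehl(ratios)))
print(weights_last_n(ratios, 3))
print(weights_median(ratios), weighted_factor(ratios, weights_median(ratios)))
```

Writing each method as a weight vector is what allows the same formulas for $\hat{f}_{k}$ and for the MSEP to be reused across all five methods.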

In Table 3 we present the estimates of $\hat{R}$ and $\operatorname{MSEP}(\hat{R})$ obtained with the five methods described above. In terms of MSEP, the values are in general greater than those of our reference method ALL AV in column (1), which is unchanged compared to Table 2. This is not surprising: by selecting the development factors we decreased the estimated values of $f_{k}$, and by using all observations $F_{i, j}$ in the computation of $\hat{\sigma}_{k}$ we mechanically increased the dispersion around the values of $\hat{f}_{k}$. In view of the results in Tables 2 and 3, the proxy method in general overestimates the real MSEP and can therefore be treated as an upper bound. However, it can be useful as a tool for sensitivity analysis, to test the impact on reserve volatility of excluding a specific set of link ratios. Finally, it can be seen as a measure of the relative prudence of other approaches to measuring the variability of reserves by means of MSEP estimators.

Table 3. Estimators of $\hat{R}$, $\sqrt{MSEP(\hat{R})}$ and $CV(\hat{R})$ ($\alpha=0$)

| Item/method | ALL AV (1) | AEHL (2) | 5 Years AV (3) | 3 Years AV (4) | Median (5) |
| --- | --- | --- | --- | --- | --- |
| $\hat{R}$ | 93 643 | 65 868 | 75 886 | 68 645 | 54 059 |
| $MSEP(\hat{R})^{1 / 2}$ | 92 549 | 88 105 | 101 643 | 113 904 | 105 786 |
| $CV(\hat{R})$ | 99% | 134% | 134% | 166% | 196% |
| | (1)/(1) | (2)/(1) | (3)/(1) | (4)/(1) | (5)/(1) |
| $\hat{R}$ (%) | 100% | 70% | 81% | 73% | 58% |
| $MSEP(\hat{R})^{1 / 2}$ (%) | 100% | 95% | 110% | 123% | 114% |

6.2. Robust estimation in GMCL model

From the previous two numerical examples (MCL vs. GMCL results), we observe that, in general, the first approach underestimates and the second overestimates the MSEP of claims reserves (see Tables 2 and 3). In this section we present an intermediate solution that allows us to estimate the MSEP of reserves when development factors are selected. This intermediate solution is based on robust statistics for the estimation of the model parameters $f_{k}$ and $\sigma_{k}$. The term robust statistics is meant in the sense of Huber and Ronchetti (2009).

As already mentioned, assumption (GMCL.2) on the conditional variance of $F_{i, j}$ allows us to estimate the factors $f_{k}$ in the framework of linear regression by means of the weighted least squares procedure (see Murphy, Bardis, and Majidi 2012 and the references therein). Although these estimators are easy to compute and have excellent theoretical properties (see Proposition 4.1), they rely on quite strict assumptions, and a violation of these assumptions may lead to useless results.

One possible solution to overcome this difficulty is to use robust estimation techniques. The idea of robust statistics is to account for certain deviations from idealized model assumptions. Typically, robust methods reduce the influence of outlying observations on the estimator.

We make the following assumption in our general framework of the GMCL model: $g_{\gamma, j}(t):=t^{\alpha_{j}}$ and $g_{\delta, j}(t):=t^{\beta_{j}}$, with $\alpha_{j}, \beta_{j} \in \mathbb{R}$ to be estimated. This means that $\gamma_{i, j}:=C_{i, j}^{\alpha_{j}}$ and $\delta_{i, j}:=C_{i, j}^{\beta_{j}}$.

The following algorithm shows how one can estimate the parameters $\alpha_{j}$ for $j=1, \ldots, I-1$ and $\beta_{k}$ for $k=1, \ldots, I-2$ . As can be seen, the presented method is based on a similar principle to the well known moment estimation method from point estimation theory.

6.2.1. Algorithm for fitting $\alpha_{k}$ and $\beta_{k}$ parameters

Step 1. We select the robust estimators for $f_{k}$ and its variance. We denote these estimators by $\tilde{f}_{k}$ and $\tilde{\operatorname{Var}}\left(\tilde{f}_{k}\right)$ respectively. These two quantities can be derived by numerous techniques described in the literature, such as: M-estimation, $L^{p}$ estimation, etc. (Huber and Ronchetti 2009) or trimmed mean (Jeng 2010).

Step 2. For every $k=1, \ldots, I-1$ , we find $\alpha_{k}$ by solving the following equation $\hat{f}_{k}=\tilde{f}_{k}$ , where $\hat{f}_{k}$ is given in equation (4.2), namely

$\frac{\sum_{i=1}^{I-k} C_{i, k}^{\alpha_{k}} F_{i, k}}{\sum_{i=1}^{I-k} C_{i, k}^{\alpha_{k}}}=\tilde{f}_{k} .\tag{6.1}$

The procedure to select the consistent $\alpha_{k}$ together with the problem of existence of solution of equation (6.1) is treated in Murphy, Bardis, and Majidi (2012) (see Lemma 1 and the comments that follow it).

Step 3. For every $k=1, \ldots, I-2$, we find $\beta_{k}$ by solving the following equation: $\hat{\operatorname{Var}}\left(\hat{f}_{k}\right)=\tilde{\operatorname{Var}}\left(\tilde{f}_{k}\right)$, where $\hat{\operatorname{Var}}\left(\hat{f}_{k}\right)$ is obtained from (A.8) with $\sigma_{k}^{2}$ replaced by $\hat{\sigma}_{k}^{2}$, namely,

$\frac{1}{I-k-1} \sum_{i=1}^{I-k} C_{i, k}^{\beta_{k}}\left(F_{i, k}-\hat{f}_{k}\right)^{2} \cdot \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{C_{j, k}^{\beta_{k}}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}=\tilde{\operatorname{Var}}\left(\tilde{f}_{k}\right).\tag{6.2}$

The parameters $\hat{f}_{k}$ are given in equation (4.2) with $\alpha_{k}$ estimated in Step 2. For $k=I-1$, since only one observation is available in our data, $\beta_{I-1}$ needs to be estimated by another approach. The range of values of $\tilde{\operatorname{Var}}\left(\tilde{f}_{k}\right)$ for which a solution of equation (6.2) exists is presented in Appendix E.
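Steps 2 and 3 can be sketched numerically as follows. This is only one possible illustration: it scans a grid for a sign change and then bisects, and it returns `None` when no sign change is found on the grid, in line with the remark that a solution may not exist. The search interval, the function names and the toy data are assumptions, and this is not the selection procedure of Murphy, Bardis, and Majidi (2012).

```python
def weighted_mean_factor(c, ratios, alpha):
    """Left-hand side of (6.1): sum(C^alpha * F) / sum(C^alpha)."""
    w = [ci ** alpha for ci in c]
    return sum(wi * fi for wi, fi in zip(w, ratios)) / sum(w)


def var_hat(c, ratios, alpha, beta):
    """Left-hand side of (6.2) with gamma = C^alpha and delta = C^beta."""
    n = len(c)
    f_hat = weighted_mean_factor(c, ratios, alpha)
    gamma = [ci ** alpha for ci in c]
    delta = [ci ** beta for ci in c]
    sigma2 = sum(d * (f - f_hat) ** 2 for d, f in zip(delta, ratios)) / (n - 1)
    factor = sum(g * g / d for g, d in zip(gamma, delta)) / sum(gamma) ** 2
    return sigma2 * factor


def solve(func, target, lo=-5.0, hi=5.0, steps=200, tol=1e-10):
    """Find x in [lo, hi] with func(x) = target by grid scan plus bisection."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [func(x) - target for x in grid]
    for a, b, va, vb in zip(grid, grid[1:], vals, vals[1:]):
        if va * vb <= 0.0:  # sign change (or exact root) inside [a, b]
            while b - a > tol:
                mid = 0.5 * (a + b)
                vm = func(mid) - target
                if va * vm <= 0.0:
                    b = mid
                else:
                    a, va = mid, vm
            return 0.5 * (a + b)
    return None  # no solution detected on the grid


# Hypothetical column of claims and link ratios.
c = [1000.0, 2000.0, 4000.0, 8000.0]
ratios = [1.8, 1.6, 1.5, 1.3]

f_tilde = weighted_mean_factor(c, ratios, 1.0)  # stand-in robust estimate
alpha_k = solve(lambda a: weighted_mean_factor(c, ratios, a), f_tilde)

var_tilde = var_hat(c, ratios, alpha_k, 0.5)    # stand-in robust variance
beta_k = solve(lambda b: var_hat(c, ratios, alpha_k, b), var_tilde)
print(alpha_k, beta_k)
```

Since the left-hand side of (6.1) is a weighted average of the link ratios, no solution exists when $\tilde{f}_{k}$ lies outside the range of the observed $F_{i, k}$; the sketch then returns `None`.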

6.2.2. Numerical example

In the present example we consider only the median method already presented in previous numerical applications. We concentrate on that particular method because our goal is not to present the extensive case study but rather to illustrate the general principle and the main steps of this application. Observe that the method AEHL can be treated by the theory of robust estimation by means of trimmed mean estimators (Jeng 2010).

To apply the above fitting algorithm to the sample median method, we use the LAD (least absolute deviation) estimation procedure. The theoretical framework of LAD is presented in Appendix D. The values of $\tilde{f}_{k}$ are given by computing the sample median of the $F_{i, k}$ as described in Section 3.5. The standard errors of $\tilde{f}_{k}$ are obtained via bootstrap techniques. In Table 4 we present the numerical values of $\tilde{f}_{k}$, s.e.$\left(\tilde{f}_{k}\right):=\operatorname{Var}\left(\tilde{f}_{k}\right)^{1 / 2}$ and $C V\left(\tilde{f}_{k}\right):=\operatorname{s.e.}\left(\tilde{f}_{k}\right) / \tilde{f}_{k}$. Note that the last value of s.e.$\left(\tilde{f}_{k}\right)$ cannot be estimated from the data for the reasons discussed in the case of the estimation of $\sigma_{I-1}$ in Section 3.2 (only one observation is available). This is why we do not fit the parameter $\beta_{I-1}$ via equation (6.2), but set $\beta_{I-1}:=\alpha_{I-1}$.

Table 4. Estimation of $\tilde{f}_{k}$ and s.e.$\left(\tilde{f}_{k}\right)$ using the LAD technique corresponding to the sample median method

| $k$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\tilde{f}_k$ | 4,2597 | 1,5992 | 1,1635 | 1,1657 | 1,1318 | 1,0335 | 1,0333 | 1,0180 | 1,0092 |
| $\operatorname{s.e.}\left(\tilde{f}_k\right)$ | 1,7974 | 0,1686 | 0,1411 | 0,0270 | 0,0472 | 0,0251 | 0,0065 | 0,0129 | — |
| $\operatorname{CV}\left(\tilde{f}_k\right)$ | 42,2% | 10,5% | 12,1% | 2,3% | 4,2% | 2,4% | 0,6% | 1,3% | — |

The standard errors of $\tilde{f}_{k}$ in Table 4 were obtained using the rq function available in the free R software. Note that the values of $\tilde{f}_{k}$ given by this function are slightly different from those presented in Section 3.2. This is probably due to the optimization algorithm used in R. Given that these differences are insignificant, we decided to present in Table 4 the same numerical values of the estimators $\tilde{f}_{k}$ as given in Section 3.2. The corresponding R code is available on request from the author.
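For readers without access to the R code, the idea of a bootstrap standard error for the sample median can be sketched as below. This is a plain nonparametric bootstrap written for illustration; it is not the procedure implemented by the rq function and is not expected to reproduce Table 4. The function name, the number of replications, the seed and the sample ratios are assumptions.

```python
import random
import statistics


def bootstrap_se_median(ratios, n_boot=2000, seed=0):
    """Standard deviation of the sample median over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(ratios)
    medians = [
        statistics.median(rng.choices(ratios, k=n))  # resample with replacement
        for _ in range(n_boot)
    ]
    return statistics.stdev(medians)


sample = [4.1, 3.2, 5.6, 4.4, 2.9, 6.3]  # hypothetical link ratios
se = bootstrap_se_median(sample)
print(statistics.median(sample), se)
```

With very few observations in a development period the resampled medians take only a handful of distinct values, so the resulting standard error should be read with caution.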

As mentioned in the algorithm and in Appendix E, solutions of (6.1) and (6.2) do not always exist. For instance, in our example there is no solution of equation (6.1) for $k=3,6$ and no solution of equation (6.2) for $k=8$. This means that for $k=3$, $k=6$, and $k=8$ the parameters $\alpha_{k}$ and $\beta_{k}$ need to be specified in a different way. This can be done using any other approach judged appropriate by the actuary performing the estimation. In our case, we set $\alpha_{3}=\beta_{3}=\alpha_{6}=\beta_{6}=\alpha_{8}=\beta_{8}=0$, on the one hand to retain the optimal properties (see Proposition 4.1) and on the other hand to be consistent with our choice of $\alpha=0$ in our two previous numerical applications (see Sections 3.5 and 6.1).

The estimates of the parameters $\alpha_{j}$ and $\beta_{j}$ are given in Table 5. The values of $\hat{\alpha}_{j}$ and $\hat{\beta}_{j}$ that we arbitrarily set to 0 are indicated in bold.

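Table 5. Estimation of parameters $\hat{\alpha}_{j}$ and $\hat{\beta}_{j}$

| $j$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\hat{\alpha}_j$ | 0,5204 | 1,4073 | **0** | 1,5852 | −0,3835 | **0** | 1,0022 | **0** | 0,0000 |
| $\hat{\beta}_j$ | 0,6605 | 0,3120 | **0** | 0,7207 | 1,9501 | **0** | −2,7733 | **0** | — |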

The MSEP and claims reserves amount estimators are stated in Table 6. Observe that the robust estimation is a good compromise between the method with the same weights (see Section 3.5) and the method where we use all link ratios in $\sigma_{k}$ estimation (see proxy method in Section 6.1).

Table 6. Median method with robust estimation

| Item/method (Median) | MCL | Robust | Proxy |
| --- | --- | --- | --- |
| $\hat{R}$ | 54 059 | 63 165 | 54 059 |
| $MSEP(\hat{R})^{1 / 2}$ | 14 786 | 40 312 | 105 786 |
| $CV(\hat{R})$ | 27% | 64% | 196% |

6.3. Validation of results from reserving software

The next interesting and important application of our GMCL model is the validation of results from industry reserving software. Stochastic chain-ladder type methods are used to evaluate the economic risk capital required by Solvency II for the so-called reserve risk. In fact, this capital requirement for reserve risk is computed as the 99.5th percentile (value at risk) of the run-off result distribution (profit/loss on reserves over one year). This means that Solvency II defines the reserve risk over a one-year time horizon, which differs from the standard approach that considers the distribution of the ultimate cost of claims.

However, one of the methods to derive the one-year reserve risk is based on simple scaling of ultimate view. This technique is based on using the results of Merz and Wüthrich (2008), which is currently a popular methodology throughout the market and taken from the latest technical literature on this topic.

The empirical loss distribution in the ultimate view is often derived using bootstrap techniques and Monte Carlo simulations. The first technique is used to evaluate the estimation error and the second to approximate the process variance. This kind of bootstrap approach is also available in the ResQ software, which is used worldwide in the property and casualty insurance market. The question is how to validate the results of the bootstrap method provided by reserving tools such as ResQ. One possible solution is to compare the estimates of the first two moments of the loss distribution from bootstrapping (based on simulations) with the estimators of reserves and of the MSEP of reserves obtained from explicit formulas. For the sake of simplicity, we assume that there is no factor selection (all weights $w_{i, j}$ are set to 1). We use the RAA run-off triangle and present the numerical results in Table 7. For all bootstrapping results we used 100,000 simulations.

We begin our analysis with the classic chain-ladder method, in which the estimators of $f_{k}$ are the all-year volume-weighted averages and are consistent with Mack (1993). More precisely, under the hypothesis of the MCL method with $\alpha=1$, we compare the estimates of MSEP obtained by two techniques: the bootstrap from ResQ and the explicit formula given by the MCL approach. The corresponding numerical values are 27150 (see ResQ(Boot) (3) in Table 7) and 26909 (see MCL (4) in Table 7), respectively. We observe good convergence of the bootstrap (the relative error is less than $1 \%$).

We now consider a different estimator of $f_{k}$, computed as a simple arithmetic average of the individual link ratios $F_{i, j}$. This is equivalent to taking $\alpha=0$ in the MCL framework. In that case, the estimates of MSEP from the two methods diverge: 75656 (see ResQ(Mack) (2) in Table 7) and 58475 (see ResQ(Boot) (1) in Table 7). This is due to the fact that the ResQ(Mack) method is obtained by an approximation based on the MCL formula with $\alpha=1$. In fact, according to the technical documentation, the ResQ estimates of the parameters $f_{k}$ and $\sigma_{k}$ in the bootstrap approach are of the form (up to a multiplicative constant for bias reduction) $\hat{f}_{k}=\frac{1}{I-k} \sum_{i=1}^{I-k} F_{i, k}$ and $\hat{\sigma}_{k}^{2}=\frac{1}{I-k-1} \sum_{i=1}^{I-k} C_{i, k}\left(F_{i, k}-\hat{f}_{k}\right)^{2}$. It is easily seen that these estimators are consistent with our general approach with $\alpha=0$ and $\beta=1$ (see (4.2) and (4.4) in Section 4). The corresponding MSEP estimator is equal to 59065 (see GMCL (5) in Table 7). This shows that the GMCL method allows one to validate the results and detect inconsistencies. Indeed, the choice of estimators in ResQ for the case $\alpha=0$ is not optimal in the sense of Proposition 4.1. It remains unknown whether this is deliberate or whether it is just a proxy approach that was judged acceptable.

Table 7. Comparison of ResQ estimators of $\hat{R}$ and $\operatorname{MSEP}(\hat{R})$ with MCL and GMCL models

| Item/method | ResQ(Boot) (1) | ResQ(Mack) (2) | ResQ(Boot) (3) | MCL (4) | GMCL (5) |
| --- | --- | --- | --- | --- | --- |
| Parameters | $\alpha=0$ | $\alpha=0$ | $\alpha=1$ | $\alpha=1$ | $\alpha=0$, $\beta=1$ |
| $\hat{R}$ | 93 630 | 93 643 | 52 204 | 52 135 | 93 643 |
| $MSEP(\hat{R})^{1 / 2}$ | 58 475 | 75 656 | 27 150 | 26 909 | 59 065 |

Regarding the approximation ResQ(Mack) (2), this shows that in the construction of proxy methods we cannot simply take the MSEP formula for $\alpha=1$ as a starting point. Indeed, the MSEP formula changes if we modify the estimators of $f_{k}$, because the variance of $\hat{f}_{k}$ is not the same; it is therefore not enough to plug the new estimators of $C_{i, I}$, $f_{k}$ and $\sigma_{k}$ into the MSEP formula (5.4) with $\alpha=\beta=1$. A lack of understanding of this principle could be a reason for adopting the non-optimal hypothesis in the bootstrap ResQ(Boot) (1) method.

Finally, we observe that the results of the ResQ(Boot) (1) method validate our explicit formula for the estimation of the MSEP of claims reserves in the framework of our GMCL model.
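The comparisons discussed in this section can be checked with a few lines of arithmetic on the values of Table 7. The helper name `rel_diff` is an illustrative assumption; the numbers are those reported in the table.

```python
def rel_diff(a, b):
    """Absolute difference of a and b relative to b."""
    return abs(a - b) / b


boot_vs_mcl = rel_diff(27150, 26909)   # alpha = 1: ResQ(Boot) (3) vs MCL (4)
mack_vs_boot = rel_diff(75656, 58475)  # alpha = 0: ResQ(Mack) (2) vs ResQ(Boot) (1)
boot_vs_gmcl = rel_diff(58475, 59065)  # ResQ(Boot) (1) vs GMCL (5)

for name, value in [("boot vs MCL", boot_vs_mcl),
                    ("Mack proxy vs boot", mack_vs_boot),
                    ("boot vs GMCL", boot_vs_gmcl)]:
    print(f"{name}: {value:.2%}")
```

The first and third differences are below 1%, whereas the ResQ(Mack) approximation differs from the bootstrap by roughly 29%.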

7. Conclusion

In this paper we presented a general flexible tool for stochastic loss reserving and its variability. We developed our GMCL model to quantify the variability of reserves in the context of selecting development factors in the framework of the stochastic chain-ladder method.

We provided a flexible theoretical background which covers some practices of actuaries and of industrial providers of reserving software.

Finally, we showed a way of bridging the chain-ladder model and robust estimation techniques. Our results can be applied to other approaches based on the chain-ladder framework, such as the multivariate chain-ladder and the univariate and multivariate Bayesian chain-ladder. Similar results can be derived in the context of one-year reserve risk for Solvency II purposes. This topic will be treated in our forthcoming paper. Some partial results can be found in Sloma (2014) and Sloma (2011).

Acknowledgments

The author thanks the reviewers for their helpful comments and suggestions.

Appendix: Mathematical Proofs

We present here the proofs of our main results. Most of them are obtained by a straightforward adaptation of the techniques applied in Mack (1993) and Mack (1994).

A.1. Proof of Result 5.1

Due to the general rule $E(X-c)^{2}=\operatorname{Var}(X)+(E X-c)^{2}$ for any scalar c we have

$\begin{aligned} \operatorname{msep}_{\hat{C}_{i, I} \mid D_{I}}\left(C_{i, I}\right) & =E\left[\left(\hat{C}_{i, I}-C_{i, I}\right)^{2} \mid D_{I}\right] \\ & =\operatorname{Var}\left(C_{i, I} \mid D_{I}\right)+\left(E\left(C_{i, I} \mid D_{I}\right)-\hat{C}_{i, I}\right)^{2} . \end{aligned}\tag{A.1}$
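For completeness, the general rule used here follows by expanding the square around $E X$ and noting that the cross term vanishes because $E(X-E X)=0$:

$\begin{aligned} E(X-c)^{2} & =E\left[(X-E X)+(E X-c)\right]^{2} \\ & =E(X-E X)^{2}+2(E X-c) E(X-E X)+(E X-c)^{2} \\ & =\operatorname{Var}(X)+(E X-c)^{2} . \end{aligned}$

It is applied conditionally on $D_{I}$ with $X=C_{i, I}$ and $c=\hat{C}_{i, I}$, which is legitimate because $\hat{C}_{i, I}$ is a function of the observed data $D_{I}$ and therefore acts as a constant under the conditional expectation.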

To estimate $\operatorname{Var}\left(C_{i, I} \mid D_{I}\right)$ we use the following
Lemma A.1. For $i=2, \ldots, I$, we have

$\operatorname{Var}\left(C_{i, I} \mid D_{I}\right)=\sum_{l=I+1-i}^{I-1} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right] \sigma_{l}^{2} \prod_{k=l+1}^{I-1} f_{k}^{2} .\tag{A.2}$

The proof of Lemma A.1 is provided in Appendix A.5.

Note that the estimation of $E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right]$ from equation (A.2) is a crucial part of this proof. We choose to estimate this term by $\frac{\hat{C}_{i, l}^{2}}{\hat{\delta}_{i, l}}$ . This is due to the obvious observation that $\frac{C_{i, l}^{2}}{\delta_{i, l}}$ is an unbiased estimate of $E\left[\frac{C_{i, l}^{2}}{\delta_{i, l}}\right]$ and from the basic property of conditional expectation, namely: $E\left[E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right]\right]=E\left[\frac{C_{i, l}^{2}}{\delta_{i, l}}\right]$ .

It is worth noting here that, in the case where $\delta_{i, l}:=\mathrm{C}_{i, l}^{\alpha}$ with $\alpha \in \mathbb{R}$ , in Saito (2009), the author used the same technique of estimation without giving any justification or reason for that (see proof of Lemma 4 and Estimate 8). Similarly, in the case where $\delta_{i, l}:=w_{i, l} \cdot C_{i, l}^{\alpha}$ with $\alpha \in\{0,1,2\}$ and $w_{i, l} \in[0,1]$ , we find the same estimator in Mack (1999). More precisely, the author claims (without proving) that
$\sum_{l=I+1-i}^{I-1} \operatorname{Var}\left(C_{i, l+1} \mid D_{I}\right) \prod_{k=l+1}^{I-1} f_{k}^{2}$ can be estimated via the quantity $\hat{C}_{i, I}^{2} \sum_{l=I+1-i}^{I-1}\left(\text { s.e. }\left(F_{i, l}\right)\right)^{2} / \hat{f}_{l}^{2}$, where $\left(\text { s.e. }\left(F_{i, l}\right)\right)^{2}$ is an estimate of $\operatorname{Var}\left(F_{i, l} \mid C_{i, 1}, \ldots, C_{i, l}\right)$. Indeed, this is achieved if we estimate $E\left[\left.\frac{C_{i, l}^{2}}{w_{i, l} \cdot C_{i, l}^{\alpha}} \right\rvert\, D_{I}\right]$ by $\frac{\hat{C}_{i, l}^{2}}{w_{i, l} \cdot \hat{C}_{i, l}^{\alpha}}$.

However, in Murphy, Bardis, and Majidi (2012), the authors used a different approach based on a normal approximation.

Note that in Section 6.3 we found that the above estimator of $\operatorname{Var}\left(C_{i, l+1} \mid D_{I}\right)$ is consistent with that provided by the bootstrap technique of the ResQ software. This was shown for a particular run-off triangle and under the assumption that the $C_{i, j}$ are gamma-distributed random variables. It would be interesting to perform an extensive simulation study in order to examine the accuracy of this estimate with other data and probability distributions.

We now apply Lemma A.1 with $\hat{E}\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right]=\frac{\hat{C}_{i, l}^{2}}{\hat{\delta}_{i, l}}$ and replace the unknown parameters $f_{k}$ and $\sigma_{k}^{2}$ with their estimators $\hat{f}_{k}$ and $\hat{\sigma}_{k}^{2}$. Together with the equality $\hat{C}_{i, l}=C_{i, I+1-i} \prod_{k=I+1-i}^{l-1} \hat{f}_{k}$ (see Proposition 4.1 (v) for the case $l=I$) we conclude

$\begin{aligned} \operatorname{Var}\left(C_{i, I} \mid D_{I}\right) & =\sum_{l=I+1-i}^{I-1} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right] \sigma_{l}^{2} \prod_{k=l+1}^{I-1} f_{k}^{2} \\ & \approx \sum_{l=I+1-i}^{I-1} \frac{\hat{C}_{i, l}^{2}}{\hat{\delta}_{i, l}} \hat{\sigma}_{l}^{2} \prod_{k=l+1}^{I-1} \hat{f}_{k}^{2} \\ & =\sum_{l=I+1-i}^{I-1} \frac{\hat{\sigma}_{l}^{2}}{\hat{\delta}_{i, l}} C_{i, I+1-i}^{2} \prod_{k=I+1-i}^{l-1} \hat{f}_{k}^{2} \cdot \prod_{k=l+1}^{I-1} \hat{f}_{k}^{2} \\ & =C_{i, I+1-i}^{2} \sum_{l=I+1-i}^{I-1} \frac{\hat{\sigma}_{l}^{2} / \hat{f}_{l}^{2}}{\hat{\delta}_{i, l}} \prod_{k=I+1-i}^{I-1} \hat{f}_{k}^{2} \\ & =C_{i, I+1-i}^{2} \cdot \prod_{k=I+1-i}^{I-1} \hat{f}_{k}^{2} \sum_{l=I+1-i}^{I-1} \frac{\hat{\sigma}_{l}^{2} / \hat{f}_{l}^{2}}{\hat{\delta}_{i, l}} \\ & =\hat{C}_{i, I}^{2} \sum_{l=I+1-i}^{I-1} \frac{\hat{\sigma}_{l}^{2} / \hat{f}_{l}^{2}}{\hat{\delta}_{i, l}} . \end{aligned}\tag{A.3}$

We now turn to the second summand of the expression (A.1). Because of Proposition 4.1 (iv) and (v) we have,

$\begin{aligned} & \left(E\left(C_{i, I} \mid D_{I}\right)-\hat{C}_{i, I}\right)^{2} \\ & \quad=C_{i, I+1-i}^{2}\left(f_{I+1-i} \cdot \ldots \cdot f_{I-1}-\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1}\right)^{2} . \end{aligned}\tag{A.4}$

As can easily be seen, this expression cannot be estimated simply by replacing $f_{k}$ with $\hat{f}_{k}$, since the estimate would then be identically zero. To estimate the right-hand side of (A.4) we use the same approach as in Mack (1993) and Mack (1994). Saito (2009) followed the same estimation technique. However, in Murphy, Bardis, and Majidi (2012) we find a different approach, which was also presented in Buchwalder et al. (2006a). It is worth noting that Mack, Quarg, and Braun (2006) criticised the approach of Buchwalder et al. (2006a) and showed that the estimate of the estimation error from Mack (1993) is hard to improve (see also Buchwalder et al. 2006b). In response to this criticism, bounds for the estimation error were provided, and it was claimed that the Mack estimator is, in some particular cases, close to these bounds (see Wüthrich, Merz, and Bühlmann 2008). This should be confirmed by an extensive simulation study comparing the different approaches to estimating the estimation error in the stochastic chain-ladder framework.

We define,

$\begin{aligned} F & =f_{I+1-i} \cdot \ldots \cdot f_{I-1}-\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1} \\ & =S_{I+1-i}+\ldots+S_{I-1}, \end{aligned}\tag{A.5}$

with

$\begin{aligned} S_{k}= & \hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{k-1} f_{k} f_{k+1} \cdot \ldots \cdot f_{I-1} \\ & -\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{k-1} \hat{f}_{k} f_{k+1} \cdot \ldots \cdot f_{I-1} \\ = & \hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{k-1}\left(f_{k}-\hat{f}_{k}\right) f_{k+1} \cdot \ldots \cdot f_{I-1} . \end{aligned}\tag{A.6}$

This yields

$\begin{aligned} F^{2} & =\left(S_{I+1-i}+\ldots+S_{I-1}\right)^{2} \\ & =\sum_{k=I+1-i}^{I-1} S_{k}^{2}+2 \sum_{k=I+1-i}^{I-1} \sum_{j=I+1-i}^{k-1} S_{j} S_{k} . \end{aligned}\tag{A.7}$
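The telescoping decomposition (A.5)-(A.7) can be verified numerically with the short sketch below. The factor values are arbitrary illustrative numbers, with list position 0 standing for development period $I+1-i$.

```python
from math import prod


def telescoping_terms(f, f_hat):
    """S_k = f_hat[:k] product * (f[k] - f_hat[k]) * f[k+1:] product."""
    return [
        prod(f_hat[:k]) * (f[k] - f_hat[k]) * prod(f[k + 1:])
        for k in range(len(f))
    ]


f = [1.50, 1.20, 1.05]      # hypothetical true factors
f_hat = [1.45, 1.25, 1.04]  # hypothetical estimated factors

s = telescoping_terms(f, f_hat)
big_f = prod(f) - prod(f_hat)  # F as defined in (A.5)

squares = sum(x * x for x in s)
cross = sum(s[j] * s[k] for k in range(len(s)) for j in range(k))
print(sum(s), big_f)
print(squares + 2.0 * cross, big_f ** 2)
```

Each $S_{k}$ replaces one true factor by its estimate, so consecutive terms cancel and only the two full products remain.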

We estimate $F^{2}$ using the following

Proposition A.1 (Estimate of $F^{2}$). For $1 \leq k \leq I$, define the set of observed $C_{i, j}$ up to development year $k$, namely

$B_{k}=\left\{C_{i, j}: i+j \leq I+1, j \leq k\right\} \subset D_{I} .$

Then, we can estimate $F^{2}$ by

$\widehat{F^{2}}=\prod_{l=I+1-i}^{I-1} \hat{f}_{l}^{2} \sum_{k=I+1-i}^{I-1} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}}.$

The proof of Proposition A.1 is provided in Appendix A.5.

It remains to determine the estimate of $\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)$. We use the following

Proposition A.2. Assume (4.3). Then

$\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)=\sigma_{k}^{2} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}} .\tag{A.8}$

The proof of Proposition A.2 is provided in Appendix A.5.

Finally, using (A.4) together with Propositions A.1 and A.2, we estimate $\left(E\left(C_{i, I} \mid D_{I}\right)-\hat{C}_{i, I}\right)^{2}$ by

$\begin{aligned} & C_{i, I+1-i}^{2} \hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{I-1}^{2} \sum_{k=I+1-i}^{I-1} \frac{\hat{\boldsymbol{\sigma}}_{k}^{2}}{\hat{f}_{k}^{2}} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}} \\ & \quad=\hat{C}_{i, I}^{2} \sum_{k=I+1-i}^{I-1} \frac{\hat{\sigma}_{k}^{2}}{\hat{f}_{k}^{2}} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}. \end{aligned}$

This completes the proof of Result 5.1.

A.2. Proof of Result 5.2 (overall standard error)

Following the definition in (2.4), we have

$\begin{aligned} & \operatorname{msep}_{\sum_{i=1}^{I} \hat{C}_{i, I} \mid D_{I}}\left(\sum_{i=1}^{I} C_{i, I}\right) \\ & =E\left[\left(\sum_{i=1}^{I} \hat{C}_{i I}-\sum_{i=1}^{I} C_{i I}\right)^{2} \mid D_{I}\right] \\ & \quad=\operatorname{Var}\left(\sum_{i=1}^{I} C_{i, I} \mid D_{I}\right)+\left(E\left(\sum_{i=1}^{I} C_{i, I} \mid D_{I}\right)-\sum_{i=1}^{I} \hat{C}_{i, I}\right)^{2} \end{aligned}$

The independence of accident years yields

$\operatorname{Var}\left(\sum_{i=1}^{I} C_{i, I} \mid D_{I}\right)=\sum_{i=1}^{I} \operatorname{Var}\left(C_{i, I} \mid D_{I}\right) .$

where each term of the sum has already been calculated in the proof of the Result 5.1.

Furthermore

$\begin{aligned} & \left(E\left(\sum_{i=1}^{I} C_{i, I} \mid D_{I}\right)-\sum_{i=1}^{I} \hat{C}_{i, I}\right)^{2} \\ & \quad=\left(\sum_{i=1}^{I}\left(E\left(C_{i, I} \mid D_{I}\right)-\hat{C}_{i, I}\right)\right)^{2} \\ & \quad=\sum_{i, j}^{I}\left(E\left(C_{i, I} \mid D_{I}\right)-\hat{C}_{i, I}\right)\left(E\left(C_{j, I} \mid D_{I}\right)-\hat{C}_{j, I}\right) \end{aligned}$

Taking together

$\begin{aligned} \widehat{\operatorname{msep}}_{\sum_{i=1}^{I} \hat{C}_{i, I} \mid D_{I}}\left(\sum_{i=1}^{I} C_{i, I}\right)= & \sum_{i=2}^{I} \widehat{\operatorname{msep}}_{\hat{C}_{i, I} \mid D_{I}}\left(C_{i, I}\right) \\ & +\sum_{2 \leq i<j \leq I} 2 \cdot C_{i, I+1-i} C_{j, I+1-j} F_{i} F_{j}, \end{aligned}$

with

$F_{i}=f_{I+1-i} \cdot \ldots \cdot f_{I-1}-\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1}=\sum_{k=I+1-i}^{I-1} S_{k}^{i},$

where

$S_{k}^{i}=\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{k-1}\left(f_{k}-\hat{f}_{k}\right) f_{k+1} \cdot \ldots \cdot f_{I-1}.$

We can determine the estimator of $F_{i} F_{j}$ in the analogous way as for $F^{2}$ .

Proposition A.3. We have

$\begin{aligned} \widehat{F_{i} F_{j}}= & \sum_{k=I+1-i}^{I-1} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}}\left(\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1}\right) \\ & \cdot\left(\hat{f}_{I+1-j} \cdot \ldots \cdot \hat{f}_{I-1}\right) . \end{aligned}$

We finally conclude, from Proposition A.3, that

$\begin{aligned} & \sum_{2 \leq i<j \leq I} 2 \cdot C_{i, I+1-i} C_{j, I+1-j} \widehat{F_{i} F_{j}} \\ & =\sum_{i=2}^{I} \hat{C}_{i, I} \sum_{j=i+1}^{I} \hat{C}_{j, I} \sum_{k=I-i+1}^{I-1} 2 \frac{\hat{\sigma}_{k}^{2} / \hat{f}_{k}^{2}}{\left(\sum_{l=1}^{I-k} \gamma_{l, k}\right)^{2}} \cdot \sum_{l=1}^{I-k} \frac{\gamma_{l, k}^{2} \cdot \mathbf{1}_{\left\{\delta_{l, k} \neq 0\right\}}}{\delta_{l, k}} . \end{aligned}$

A.3. Proof of Proposition 3.1

i. See Theorem 2 p. 215 in Mack (1993).
ii. See discussion on p. 112, Corollary on p. 141 and Appendix B on p. 140 in Mack (1994).
iii. See Appendix E on p. 151 in Mack (1994).
iv. See Theorem 1 p. 215 and discussion after the proof of Theorem 2 on page 216 in Mack (1993).
v. See Appendix C, p. 142 in Mack (1994).

A.4. Proof of Proposition 4.1

(i), (iv) and (v), see proofs of (i), (iv) and (v) respectively in Proposition 3.1.
(ii). The first part of the statement, regarding the minimal variance of parameters $f_{k}$, can be easily derived from the proof of (ii) in Proposition 3.1. The rest of the proof follows from Proposition A.2.
(iii). Without loss of generality and to avoid the complexity of notation we present the proof for $I_{k}=I-k$ (for each $k$ , all weights $\delta_{i, k}$ are different from 0 ).

We have, for $1 \leq k \leq I-2$ ,

$\begin{aligned} & (I-k-1) \cdot \hat{\sigma}_{k}^{2}=\sum_{i=1}^{I-k} \delta_{i, k}\left(F_{i, k}-\hat{f}_{k}\right)^{2} \\ & \quad=\sum_{i=1}^{I-k} \delta_{i, k} F_{i, k}^{2}-2 \sum_{i=1}^{I-k} \delta_{i, k} F_{i, k} \cdot \hat{f}_{k}+\sum_{i=1}^{I-k} \delta_{i, k} \hat{f}_{k}^{2} . \end{aligned}$

Since $\delta_{i, k}$ are $\sigma\left(C_{i, k}\right)$ measurable, we have

$\begin{aligned} E\left((I-k-1) \cdot \hat{\sigma}_{k}^{2} \mid B_{k}\right)= & \sum_{i=1}^{I-k} \delta_{i, k} E\left(F_{i, k}^{2} \mid B_{k}\right) \\ & -2 \sum_{i=1}^{I-k} \delta_{i, k} E\left(F_{i, k} \cdot \hat{f}_{k} \mid B_{k}\right) \\ & +\sum_{i=1}^{I-k} \delta_{i, k} E\left(\hat{f}_{k}^{2} \mid B_{k}\right). \end{aligned}$

In the following derivation we use the $\sigma\left(C_{i, k}\right)$-measurability of $\gamma_{i, k}$ and the definition of $\hat{f}_{k}$ from (4.2). Furthermore, assumption GMCL.3 implies that $F_{i, k}$ and $F_{j, k}$ are independent for $i \neq j$. From assumptions GMCL.1 and GMCL.2 we easily see that $E\left(F_{i, k}^{2} \mid B_{k}\right)=\frac{\sigma_{k}^{2}}{\delta_{i, k}}+f_{k}^{2}$. Taken together,

$\begin{aligned} E\left(F_{i, k} \cdot \hat{f}_{k} \mid B_{k}\right) & =\frac{1}{\sum_{l=1}^{I-k} \gamma_{l, k}}\left(\sum_{j=1}^{I-k} \gamma_{j, k} \cdot E\left(F_{i, k} \cdot F_{j, k} \mid B_{k}\right)\right) \\ & =\frac{1}{\sum_{l=1}^{I-k} \gamma_{l, k}}\left(\gamma_{i, k} \cdot E\left(F_{i, k}^{2} \mid B_{k}\right)+\sum_{j \neq i} \gamma_{j, k} E\left(F_{i, k} \mid B_{k}\right) E\left(F_{j, k} \mid B_{k}\right)\right) \\ & =\frac{1}{\sum_{l=1}^{I-k} \gamma_{l, k}}\left(\gamma_{i, k} \cdot\left(\frac{\sigma_{k}^{2}}{\delta_{i, k}}+f_{k}^{2}\right)+\sum_{j \neq i} \gamma_{j, k} f_{k}^{2}\right) \\ & =\frac{1}{\sum_{l=1}^{I-k} \gamma_{l, k}}\left(\gamma_{i, k} \cdot \frac{\sigma_{k}^{2}}{\delta_{i, k}}+f_{k}^{2} \sum_{j=1}^{I-k} \gamma_{j, k}\right) \\ & =\sigma_{k}^{2} \frac{\frac{\gamma_{i, k}}{\delta_{i, k}}}{\sum_{l=1}^{I-k} \gamma_{l, k}}+f_{k}^{2} . \end{aligned}\tag{A.9}$

From Proposition A.2,

$\begin{aligned} E\left(\hat{f}_{k}^{2} \mid B_{k}\right) & =\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)+\left(E\left(\hat{f}_{k} \mid B_{k}\right)\right)^{2} \\ & =\sigma_{k}^{2} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}+f_{k}^{2} . \end{aligned}$

Taken together, we have

$\begin{aligned} & E\left((I-k-1) \cdot \hat{\sigma}_{k}^{2} \mid B_{k}\right) \\ & =\sum_{i=1}^{I-k} \delta_{i, k} E\left(F_{i, k}^{2} \mid B_{k}\right)-2 \sum_{i=1}^{I-k} \delta_{i, k} E\left(F_{i, k} \cdot \hat{f}_{k} \mid B_{k}\right) \\ &\quad +\sum_{i=1}^{I-k} \delta_{i, k} E\left(\hat{f}_{k}^{2} \mid B_{k}\right) \\ & =\sum_{i=1}^{I-k} \delta_{i, k}\left(\frac{\sigma_{k}^{2}}{\delta_{i, k}}+f_{k}^{2}\right)-2 \sum_{i=1}^{I-k} \delta_{i, k}\left(\sigma_{k}^{2} \frac{\frac{\gamma_{i, k}}{\delta_{i, k}}}{\sum_{l=1}^{I-k} \gamma_{l, k}}+f_{k}^{2}\right) \\ &\quad +\sum_{i=1}^{I-k} \delta_{i, k}\left(\sigma_{k}^{2} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}+f_{k}^{2}\right) \\ & =(I-k) \sigma_{k}^{2}+f_{k}^{2} \sum_{i=1}^{I-k} \delta_{i, k}-2 \sigma_{k}^{2}-2 f_{k}^{2} \sum_{i=1}^{I-k} \delta_{i, k} \\ &\quad +\sigma_{k}^{2} \frac{\sum_{i=1}^{I-k} \delta_{i, k} \sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}+f_{k}^{2} \sum_{i=1}^{I-k} \delta_{i, k} \\ & =(I-k-1) \sigma_{k}^{2}+\sigma_{k}^{2}\left[\frac{\sum_{i=1}^{I-k} \delta_{i, k} \sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}-1\right]. \end{aligned}\tag{A.10}$

Finally

$\begin{aligned} & E\left(\hat{\sigma}_{k}^{2}-\sigma_{k}^{2}\right)=E\left[E\left[\left(\hat{\sigma}_{k}^{2}-\sigma_{k}^{2}\right) \mid B_{k}\right]\right] \\ & \quad=\frac{\sigma_{k}^{2}}{I-k-1} E\left[\frac{\sum_{i=1}^{I-k} \delta_{i, k} \sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}-1\right] . \end{aligned}$
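As a quick numerical check of this bias term, the bracketed factor can be evaluated for fixed weights. The sketch below is only illustrative: the function name `bias_factor` and the weight values are assumptions made for the example, and all $\delta_{i, k}$ are taken to be non-zero, as in the proof. By the Cauchy-Schwarz inequality the factor is non-negative, and it vanishes when $\gamma_{i, k}$ is proportional to $\delta_{i, k}$.

```python
def bias_factor(gamma, delta):
    """Bracketed term: sum(delta) * sum(gamma^2 / delta) / sum(gamma)^2 - 1."""
    if any(d == 0 for d in delta):
        raise ValueError("all delta weights must be non-zero, as in the proof")
    numerator = sum(delta) * sum(g * g / d for g, d in zip(gamma, delta))
    return numerator / sum(gamma) ** 2 - 1.0


# Hypothetical weights for one development period with three accident years.
delta = [1.0, 2.0, 3.0]
equal_gamma = [1.0, 1.0, 1.0]

matched = bias_factor(delta, delta)            # gamma = delta: no bias
mismatched = bias_factor(equal_gamma, delta)   # gamma != delta: positive bias

print(matched, mismatched)
```

With these toy weights the second value is $2/9$, so the relative bias of $\hat{\sigma}_{k}^{2}$ would be $2/9$ divided by $I-k-1$.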

A.5. Proofs of auxiliary results

A.5.1. Proof of Lemma 9.1

Let us define, for $1 \leq i \leq I$ and $1 \leq j \leq I$, the set of observed data $C_{i, j}$ for accident year $i$ up to development year $j$, namely

$A_{i, j}=\left\{C_{i, k}: 1 \leq k \leq j\right\} .$

For $l=I+1-i, \ldots, I-1$ ,

$\begin{aligned} \operatorname{Var}\left(C_{i, l+1} \mid D_{I}\right)= & \operatorname{Var}\left(C_{i, l+1} \mid A_{i, I+1-i}\right) \\ = & E\left[\operatorname{Var}\left(C_{i, l+1} \mid A_{i, l}\right) \mid A_{i, I+1-i}\right] \\ & +\operatorname{Var}\left[E\left(C_{i, l+1} \mid A_{i, l}\right) \mid A_{i, I+1-i}\right] \\ = & E\left[\left.\sigma_{l}^{2} \frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, A_{i, I+1-i}\right]+\operatorname{Var}\left[f_{l} C_{i, l} \mid A_{i, I+1-i}\right] \\ = & \sigma_{l}^{2} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, A_{i, I+1-i}\right]+f_{l}^{2} \operatorname{Var}\left[C_{i, l} \mid A_{i, I+1-i}\right] \\ = & \sigma_{l}^{2} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right]+f_{l}^{2} \operatorname{Var}\left[C_{i, l} \mid D_{I}\right]. \end{aligned}\tag{A.11}$

We multiply both sides by $\prod_{k=l+1}^{I-1} f_{k}^{2}$, with the convention that an empty product equals 1. Taking the sum over $l=I+1-i, \ldots, I-1$, we obtain

$\begin{aligned} & \sum_{l=I+1-i}^{I-1} \operatorname{Var}\left(C_{i, l+1} \mid D_{I}\right) \prod_{k=l+1}^{I-1} f_{k}^{2} \\ & =\sum_{l=I+1-i}^{I-1} \sigma_{l}^{2} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right] \prod_{k=l+1}^{I-1} f_{k}^{2}+\sum_{l=I+1-i}^{I-1} \operatorname{Var}\left[C_{i, l} \mid D_{I}\right] \prod_{k=l}^{I-1} f_{k}^{2} . \end{aligned}$

Separating the term $l=I-1$ on the left-hand side and the term $l=I+1-i$ in the last sum, this reads

$\begin{aligned} & \operatorname{Var}\left(C_{i, I} \mid D_{I}\right)+\sum_{l=I+1-i}^{I-2} \operatorname{Var}\left(C_{i, l+1} \mid D_{I}\right) \prod_{k=l+1}^{I-1} f_{k}^{2} \\ & =\sum_{l=I+1-i}^{I-1} \sigma_{l}^{2} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right] \prod_{k=l+1}^{I-1} f_{k}^{2} \\ & \quad+\operatorname{Var}\left[C_{i, I+1-i} \mid D_{I}\right] \prod_{k=I+1-i}^{I-1} f_{k}^{2} \\ & \quad+\sum_{l=I+2-i}^{I-1} \operatorname{Var}\left[C_{i, l} \mid D_{I}\right] \prod_{k=l}^{I-1} f_{k}^{2} . \end{aligned}\tag{A.12}$

Since $\operatorname{Var}\left[C_{i, I+1-i} \mid D_{I}\right]=0$ and from the fact that

$\sum_{l=I+1-i}^{I-2} \operatorname{Var}\left(C_{i, l+1} \mid D_{I}\right) \prod_{k=l+1}^{I-1} f_{k}^{2}=\sum_{l=I+2-i}^{I-1} \operatorname{Var}\left[C_{i, l} \mid D_{I}\right] \prod_{k=l}^{I-1} f_{k}^{2},$

we finally get the proof of Lemma A.1.

A.5.2. Proof of Proposition A.1

Following Mack (1993, 1994), we replace $S_{k}^{2}$ with $E\left(S_{k}^{2} \mid B_{k}\right)$ and $S_{j} S_{k}$, $j<k$, with $E\left(S_{j} S_{k} \mid B_{k}\right)$. This means that we approximate $S_{k}^{2}$ and $S_{j} S_{k}$ by varying and averaging as little data as possible, so that as many observed values $C_{i, k}$ as possible are kept fixed. Due to Proposition 4.1 (i) we have $E\left(\hat{f}_{k}-f_{k} \mid B_{k}\right)=0$ and therefore $E\left(S_{j} S_{k} \mid B_{k}\right)=0$ for $j<k$, because all $\hat{f}_{r}$, $r<k$, are scalars under $B_{k}$. Since $E\left(\left(f_{k}-\hat{f}_{k}\right)^{2} \mid B_{k}\right)=\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)$, we obtain from (A.6)

$E\left(S_{k}^{2} \mid B_{k}\right)=\hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-1}^{2} \operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right) f_{k+1}^{2} \cdot \ldots \cdot f_{I-1}^{2} .$

Taken together, we have replaced $F^{2}=\sum_{k=I+1-i}^{I-1} S_{k}^{2}$ with $\sum_{k=I+1-i}^{I-1} E\left(S_{k}^{2} \mid B_{k}\right)$ and the unknown parameters are replaced by their estimators. Altogether, we estimate $F^{2}$ by

$\sum_{k=I+1-i}^{I-1} \hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-1}^{2} \hat{f}_{k}^{2} \hat{f}_{k+1}^{2} \cdot \ldots \cdot \hat{f}_{I-1}^{2} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}}.$

A.5.3. Proof of Proposition A.2

From the definition of $\hat{f}_{k}$ in (4.2), we have

$\begin{aligned} \operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right) & =\operatorname{Var}\left(\left.\frac{\sum_{j=1}^{I-k} \gamma_{j, k} F_{j, k}}{\sum_{j=1}^{I-k} \gamma_{j, k}} \right\rvert\, B_{k}\right) \\ & =\frac{\sum_{j=1}^{I-k} \gamma_{j, k}^{2} \operatorname{Var}\left(F_{j, k} \mid B_{k}\right) \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}, \end{aligned}$

where the second equality is due to the $B_{k}$-measurability of $\gamma_{j, k}$, the assumption (4.3) and the convention that the product of 0 and $\infty$ equals 0.
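To make the role of the indicator concrete, the variance formula can be evaluated for given weights. The following is a minimal sketch, assuming $\operatorname{Var}\left(F_{j, k} \mid B_{k}\right)=\sigma_{k}^{2} / \delta_{j, k}$ as used in the proof of Proposition 4.1; the function name `var_fhat` and the numbers are hypothetical and chosen only for illustration.

```python
def var_fhat(sigma2, gamma, delta):
    """sigma2 * sum_j(gamma_j^2 / delta_j, delta_j != 0) / (sum_j gamma_j)^2."""
    # Terms with delta_j == 0 are skipped, which mirrors the indicator
    # 1{delta_j != 0} and the convention 0 * infinity = 0.
    numerator = sum(g * g / d for g, d in zip(gamma, delta) if d != 0)
    return sigma2 * numerator / sum(gamma) ** 2


# Hypothetical weights; the second accident year is excluded (weight 0).
weights = [2.0, 0.0, 3.0, 5.0]
sigma2 = 4.0

v_matched = var_fhat(sigma2, weights, weights)          # gamma = delta
v_equal = var_fhat(sigma2, [1.0, 0.0, 1.0, 1.0], weights)

print(v_matched, sigma2 / sum(weights), v_equal)
```

When $\gamma_{j, k}=\delta_{j, k}$ the expression collapses algebraically to $\sigma_{k}^{2} / \sum_{j} \delta_{j, k}$, which is what the first two printed values show; any other choice of $\gamma$ gives a value at least as large.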

A.5.4. Proof of Proposition A.3

We find the estimator $\widehat{F_{i} F_{j}}$ in a similar way to the estimator $\widehat{F^{2}}$ (see the proof of Proposition A.1).

$E\left[\left(S_{k}^{i}\right)^{2} \mid B_{k}\right]=\hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-1}^{2} \operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right) f_{k+1}^{2} \cdot \ldots \cdot f_{I-1}^{2}.$

For $i<j$, we have

$\begin{aligned} \widehat{F_{i} F_{j}}= & \sum_{k=I+1-i}^{I-1} \hat{f}_{I+1-j} \cdot \ldots \cdot \hat{f}_{I-i} \\ & \cdot \hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-1}^{2} \operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right) \hat{f}_{k+1}^{2} \cdot \ldots \cdot \hat{f}_{I-1}^{2} \\ = & \sum_{k=I+1-i}^{I-1} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}} \hat{f}_{I+1-j} \cdot \ldots \cdot \hat{f}_{I-i} \\ & \cdot \hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-1}^{2} \cdot \hat{f}_{k}^{2} \cdot \hat{f}_{k+1}^{2} \cdot \ldots \cdot \hat{f}_{I-1}^{2} \\ = & \sum_{k=I+1-i}^{I-1} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}}\left(\hat{f}_{I+1-j} \cdot \ldots \cdot \hat{f}_{I-1}\right) \\ & \cdot\left(\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1}\right) . \end{aligned}\tag{A.13}$

B. Data

We present in Table B.8 the triangle of RAA data analysed in Mack (1994) and Murphy, Bardis, and Majidi (2012).

Table B.8. RAA run-off triangle (cumulative payments)

Accident Year i	Development Year j
Accident Year i	1	2	3	4	5	6	7	8	9	10
1	5 012	8 269	10 907	11 805	13 539	16 181	18 009	18 608	18 662	18 834
2	106	4 285	5 396	10 666	13 782	15 599	15 496	16 169	16 704
3	3 410	8 992	13 873	16 141	18 735	22 214	22 863	23 466
4	5 655	11 555	15 766	21 266	23 425	26 083	27 067
5	1 092	9 565	15 836	22 169	25 955	26 180
6	1 513	6 445	11 702	12 935	15 852
7	557	4 020	10 946	12 314
8	1 351	6 947	13 112
9	3 133	5 395
10	2 063

C. Individual link ratios of RAA run-off triangle

Table C.9. Individual link ratios $F_{i, j}$ (age-to-age factors) of the RAA run-off triangle

AY	1–2	2–3	3–4	4–5	5–6	6–7	7–8	8–9	9–10
1	1.650	1.319	1.082	1.147	1.195	1.113	1.033	1.003	1.009
2	40.425	1.259	1.977	1.292	1.132	0.993	1.043	1.033
3	2.637	1.543	1.163	1.161	1.186	1.029	1.026
4	2.043	1.364	1.349	1.102	1.113	1.038
5	8.759	1.656	1.400	1.171	1.009
6	4.260	1.816	1.105	1.226
7	7.217	2.723	1.125
8	5.142	1.887
9	1.722
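The link ratios in Table C.9 follow directly from Table B.8 as $F_{i, j}=C_{i, j+1} / C_{i, j}$. A short Python sketch of that computation is given below; the variable and function names are illustrative choices, not part of the paper.

```python
# Cumulative payments from Table B.8 (RAA run-off triangle),
# one row per accident year, one entry per observed development year.
triangle = [
    [5012, 8269, 10907, 11805, 13539, 16181, 18009, 18608, 18662, 18834],
    [106, 4285, 5396, 10666, 13782, 15599, 15496, 16169, 16704],
    [3410, 8992, 13873, 16141, 18735, 22214, 22863, 23466],
    [5655, 11555, 15766, 21266, 23425, 26083, 27067],
    [1092, 9565, 15836, 22169, 25955, 26180],
    [1513, 6445, 11702, 12935, 15852],
    [557, 4020, 10946, 12314],
    [1351, 6947, 13112],
    [3133, 5395],
    [2063],
]


def link_ratios(tri):
    """Return F[i][j] = C[i][j+1] / C[i][j] for every observed pair of cells."""
    return [[row[j + 1] / row[j] for j in range(len(row) - 1)] for row in tri]


ratios = link_ratios(triangle)

# The last accident year has a single observation and hence no link ratio.
for year, row in enumerate(ratios[:-1], start=1):
    print(year, " ".join(f"{f:.3f}" for f in row))
```

The very large first ratio of accident year 2 (about 40.4) comes from the unusually small first payment of 106, which is the kind of observation that motivates robust estimation.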

D. LAD estimator

The least absolute deviation (LAD) method or $L_{1}$ (also known as Least Absolute Value (LAV)) method is a widely known alternative to the classical least squares (LS) or $L_{2}$ method for statistical analysis of linear regression models. Instead of minimizing the sum of squared errors, it minimizes the sum of absolute values of errors. More precisely, in the context of linear regression model, estimates are found by solving the following optimisation problem

$\min _{\beta}\left\{\sum_{i=1}^{n}\left|e_{i}\right|\right\}=\min _{\beta}\left\{\sum_{i=1}^{n}\left|y_{i}-\sum_{j=1}^{m} x_{i j} \beta_{j}\right|\right\},$

where $e_{i}:=y_{i}-\sum_{j=1}^{m} x_{i j} \beta_{j}$, $i=1,2, \ldots, n$ and $j=1,2, \ldots, m$. Unlike the LS method, the LAD method is far less sensitive to outliers and produces robust estimates: LAD (LAV) regression is very resistant to observations with unusual values. The LAD problem can be reduced to a linear programming problem, so the computational difficulty is now entirely overcome by the availability of computing power and the effectiveness of linear programming.

In the numerical example presented in Section 6.2.2, we used the one-dimensional ($m=1$) LAD procedure where we took, for each column $k$ of the run-off triangle, $y_{i}:=F_{i, k}$ and $x_{i 1}:=1$ for all $i$.

One more thing merits mentioning here. In the simple one-dimensional case ($m=1$) the LAD estimator reduces to the sample median (see Abur and Expósito 2004, 141).
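The reduction to the median can be checked numerically on the first column of Table C.9. The sketch below is illustrative only; the helper names are assumptions, and the minimiser is found by direct search over the observations rather than by linear programming.

```python
from statistics import mean, median

# Link ratios F_{i,1} for development period 1-2 (first column of Table C.9).
ratios = [1.650, 40.425, 2.637, 2.043, 8.759, 4.260, 7.217, 5.142, 1.722]


def sum_abs_dev(beta, ys):
    """LAD objective for the intercept-only model y_i = beta + e_i."""
    return sum(abs(y - beta) for y in ys)


def lad_intercept(ys):
    """Minimise the LAD objective by searching over the observations.

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimiser is always attained at one of them.
    """
    return min(ys, key=lambda b: sum_abs_dev(b, ys))


lad = lad_intercept(ratios)   # LAD estimate
ls = mean(ratios)             # least squares estimate (arithmetic mean)

print(lad, median(ratios), round(ls, 3))
```

Here the LAD estimate coincides with the median of the nine ratios, while the arithmetic mean is pulled upward by the single ratio of 40.425.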

E. Robust estimation-Limits in estimation of $\beta$ parameters

We want to examine the existence of a solution of equation (6.2). For this purpose, we study the properties of the following type of functions,

$h_{k}\left(\beta_{k}\right):=\left(\sum_{i=1}^{N_{k}} a_{i, k}^{-\beta} b_{i, k}\right)\left(\sum_{i=1}^{N_{k}} a_{i, k}^{\beta} c_{i, k}\right),$

with $a_{i, k}:=C_{i, k}$, $b_{i, k}:=\gamma_{i, k}^{2} /\left(\sum_{l=1}^{I-k} \gamma_{l, k}\right)^{2}$, $c_{i, k}:=\left(\hat{f}_{k}-F_{i, k}\right)^{2} /(I-k-1)$ and $N_{k}:=I-k$. In the sequel, without loss of generality, we omit the index $k$ corresponding to the column of the run-off triangle. Thus, we consider the function

$h(\beta):=\left(\sum_{i=1}^{N} a_{i}^{-\beta} b_{i}\right)\left(\sum_{i=1}^{N} a_{i}^{\beta} c_{i}\right),$

with $a_{i} \geq 0,0 \leq b_{i} \leq 1$ and $c_{i} \geq 0$ . We rewrite the function $h$ as follows:

$h(\beta):=\sum_{i=1}^{N} \sum_{j=1}^{N}\left(a_{i}^{-\beta} b_{i}\right)\left(a_{j}^{\beta} c_{j}\right) .$

We easily observe that the function $h$ tends to $\infty$ as $\beta$ tends to $\infty$ or $-\infty$. Let us define $d_{i, j}:=a_{j} / a_{i}$, where the indices are ordered such that $a_{j} / a_{i}>1$ for $i<j$. Then, by a simple decomposition of the double sum, we get:

$h(\beta):=\sum_{i=1}^{N} b_{i} c_{i}+\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{i} c_{j} \cdot d_{i, j}^{\beta}+\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{j} c_{i} \cdot d_{j, i}^{\beta},$

where, by our notation, $d_{i, j}>1$ and $d_{j, i}<1$. By straightforward computation,

$\begin{aligned} h^{\prime}(\beta):= & \sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{i} c_{j} \cdot \ln \left(d_{i, j}\right) \cdot d_{i, j}^{\beta} \\ & +\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{j} c_{i} \cdot \ln \left(d_{j, i}\right) \cdot d_{j, i}^{\beta} . \end{aligned}$

Since $\ln \left(d_{i, j}\right)>0$ and $\ln \left(d_{j, i}\right)<0$, the first derivative $h^{\prime}$ tends to $-\infty$ and $\infty$ as $\beta$ tends to $-\infty$ and $\infty$, respectively. In addition, the second derivative $h^{\prime \prime}$ is given by

$\begin{aligned} h^{\prime \prime}(\beta): & =\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{i} c_{j} \cdot\left(\ln \left(d_{i, j}\right)\right)^{2} \cdot d_{i, j}^{\beta} \\ & +\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{j} c_{i} \cdot\left(\ln \left(d_{j, i}\right)\right)^{2} \cdot d_{j, i}^{\beta} . \end{aligned}$

Given that $d_{i, j}^{\beta}$ and $d_{j, i}^{\beta}$ are strictly positive functions and all coefficients are positive, the second derivative of $h$ is strictly positive. This means that the first derivative of $h$ is an increasing function. Together with the previous facts, this implies that $h$ has an absolute minimum. Consequently, equation (6.2) has zero, one or two solutions.

In the case where two solutions of opposite sign exist, the actuary should decide which one corresponds better to the line of business considered. In fact, as mentioned in Murphy, Bardis, and Majidi (2012), the choice of the negative solution does not seem unreasonable in some situations. This issue is beyond the scope of this paper. In our numerical example the solution is determined with the Excel Solver tool.
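The same kind of search can be sketched outside Excel. The Python code below is an illustration under stated assumptions, not the procedure used in the paper: it assumes the equation can be written as $h(\beta)=t$ for a given target value $t$, and the function names, the search interval $[-50, 50]$ and the toy inputs are all hypothetical. It exploits the convexity of $h$ by first locating the minimum and then bisecting on each side of it.

```python
def h(beta, a, b, c):
    """h(beta) = (sum_i a_i^(-beta) b_i) * (sum_i a_i^beta c_i), with a_i > 0."""
    left = sum(bi * ai ** (-beta) for ai, bi in zip(a, b))
    right = sum(ci * ai ** beta for ai, ci in zip(a, c))
    return left * right


def argmin_convex(f, lo, hi, iters=200):
    """Ternary search for the minimiser of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve(target, a, b, c, lo=-50.0, hi=50.0):
    """Return the solutions of h(beta) = target found inside [lo, hi]."""
    g = lambda beta: h(beta, a, b, c) - target
    beta_min = argmin_convex(lambda beta: h(beta, a, b, c), lo, hi)
    if g(beta_min) > 0:
        return []  # target lies below the minimum of h: no solution
    roots = []
    if g(lo) > 0:
        roots.append(bisect(g, lo, beta_min))   # decreasing branch
    if g(hi) > 0:
        roots.append(bisect(g, beta_min, hi))   # increasing branch
    # In the tangent case g(beta_min) == 0 both branches return the minimiser.
    return roots


# Toy inputs (hypothetical): here h(beta) = 1 + cosh(beta * ln 2).
a, b, c = [1.0, 2.0], [0.5, 0.5], [1.0, 1.0]
two_roots = solve(2.25, a, b, c)
no_root = solve(1.5, a, b, c)
print(two_roots, no_root)
```

For these toy inputs the minimum of $h$ is 2, so the target 2.25 yields two solutions of opposite sign, close to $-1$ and $1$, while the target 1.5 yields none.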
