1. Introduction and motivation
The provision for outstanding claims is one of the main components of technical provisions of insurance company’s liabilities. Measuring the deviation of the true amount of reserves from its estimation is one of the major actuarial challenges. Senior managers, shareholders, rating agencies, and insurance regulators all have an interest in knowing the magnitude of these potential variations (reserve uncertainty) since companies with large potential deviations need more capital or reinsurance.
One of the most known methods of reserving used in practice is the approach called chain-ladder. This method belongs to the family of development factor models (DFMs). The first stochastic approach based on the chain-ladder technique was proposed by Mack (1993, 1994). In these studies, Mack proposed the estimation of the mean square error of prediction (MSEP) of claims reserves based upon all-year volume-weighted average of loss development factors (also called: link ratios, age-to-age factors, report-toreport factors). The variance structure was supposed to be proportional to the development period’s initial loss. This assumption is sufficient for the weighted average development factors to have optimal statistical properties (BLUE, or best linear unbiased estimate). Some authors (see Murphy 1996, 188; Mack 1999, 15) pointed out that the estimation of chain-ladder factors is connected with the estimation in the framework of linear model by weighted least squares (WLS) regression approach. They also observed that by modifying the original variance assumption from Mack (1993) the corresponding estimators of chainladder development factors keep their BLUE property (see Murphy 1996; Barnett and Zehnwirth 2000; Saito 2009). It is worth underlying that the modification of variance assumption leads to different point estimators for development factors (arithmetic average, slope of regression etc.; see Remark 3.1 for more details).
One of the major challenges in everyday actuarial practice is selecting the loss development factors (LDFs).
The adjustments to make data more homogeneous are often justified for a number of reasons: unstable run-off triangles, outliers, inaccurate and incomplete data, etc.). Most actuaries use somewhat arbitrary rules of thumb in selecting the LDFs. Blumsohn and Laufer (2009) describe this topic in great detail. In this project, a group of actuaries were asked to select LDFs for an incurred run-off triangle. The important number of ways of LDFs selection was provided by the participants. The approaches proposed to evaluate the estimation of expected value of reserves varied widely and the additional information about the error of prediction of this estimation could be helpful in decision making. That is why it is extremely important from a practical point of view to have a method that provides the estimation of conditional MSEP of ultimate claims (or reserves) in the context of LDFs selection by actuaries. In the present study we provide such a tool embedded in the theoretical framework to quantify the standard error of prediction of the claims reserves in the case where some factors have been excluded from the estimation of model parameters. However, we do not judge whether these ad hoc approaches of selecting factors are correct or wrong. We rather assume that the expert judgment taken by an actuary could always be justified by his specific knowledge of considered business.
Measuring the variability of the reserves in this context is poorly developed in the literature. That is why in practice actuaries and reserving software developers often use the proxy methods based on formula for MSEP derived in Mack (1993). The approximations mainly consist of replacing the main parameters by their estimators computed by the other approach without changing the main formula. This procedure is incorrect because in the chain-ladder framework the formula for MSEP depends among others on the standard error of chain-ladder factors and it is not accurate to simply plug in the new estimators in the old formula. The other proxy method often used in practice consists of applying the coefficients of variation of ultimate loss from Mack (1993) (ratio of square root of MSEP of ultimate loss over ultimate loss) in order to derive the MSEP estimators of a new approach. It turns out that in general these approximations are highly inappropriate (see example in Section 6.3).
We think that, in some simple cases (no curve fitting for LDFs, for example) the approximations mentioned above are the consequences of bad understanding of the main formula for estimation of MSEP in Mack (1993). Moreover, the approximations used by actuaries and actuarial software developers could be avoided by using the more appropriate existing models. One such model was proposed by Mack (1999). To our knowledge, this was the first study that showed how to measure the uncertainty of reserves in the situation when an actuary selects the LDFs. In our opinion, the important results obtained in this paper are not always used in practice because, instead of explicit formula, the recursive equation is given there for estimation of MSEP of ultimate loss.
Mack (1999) is an important paper that allows the fully understanding of the MSEP formula and avoid the inappropriate approximation when it is not necessary. We summarize the details of this method in Section 3. One of the major limitations of this method is the underestimation the MSEP of ultimate loss in the case where the number of excluding data is important. We discuss this topic in detail in Section 3.5. One possible solution to overcome this difficulty is to extend the existing approach proposed by Mack (1999).
Therefore, we propose a general approach for stochastic claims reserving in the framework of chainladder model, extending the model proposed by Mack (1999) and Murphy, Bardis, and Majidi (2012). This extension is three fold. First, our general tool has a educational role and makes it possible to validate the results from other approaches. More precisely, our general formula for estimation of MSEP of outstanding loss liabilities can be used to fully understand the Mack (1993), Mack (1994), and Mack (1999) model. Furthermore, under new solvency requirements of Solvency II, insurance companies use the bootstrap-type stochastic reserving methods to determine the economic capital corresponding to the reserve risk. The bootstrap method allows estimation of a whole claims reserves distribution via resampling techniques and Monte Carlo simulations. It seems to be crucial for non-life insurance companies to be able to validate the results given by the industrial software where we do not have access to the code and when the number of shortcuts may be applied. Our approach can be used to validate the estimation of the first two moments of the loss distribution in the case where selection of development factors was employed and the different weights in estimation of chain-ladder factors and volatility parameters were used (see Section 6.3 for more details).
Second, our general Mack chain-ladder (GMCL) model can be used to construct the proxy solutions to overcome the limits of Mack’s (1999) approach, i.e., the use of the same weights for parameters estimation (see discussion in Section 3.6). This means that, for the methods where we eliminate the considerable number of observations, we reduce also the data for variability of the reserves. This mechanically impacts the estimation of MSEP of loss liabilities, which in such cases is generally underestimated. We propose then the possible solution to overcome this kind of difficulty (see Section 6.1 for more details).
Finally, the third and really important application from a practical point of view consists of bridging the point estimation of chain-ladder parameters with the theory of robust statistics. As mentioned above, the point estimators of chain-ladder factors can be obtained in the linear regression framework by applying the weighted least squares procedure. It is well known that the OLS estimators are fragile to the outliers. That is why we propose using the robust techniques of estimation such as: M-estimators, -estimators, etc (see Section 6.2).
The reminder of this paper is organized as follows. In Section 2 we present our notations and definitions. We review in Section 3 the MCL and its main limitations. In Section 4, we present the GMCL and the main results are derived in Section 5. Finally, Section 6 introduces the numerical applications of GMCL. All proofs are provided in the Appendix. The related topics such as tail factor, curve fitting, diagnostics and validation of the main model hypothesis, are out of scope of this paper and will be treated elsewhere.
2. Notations and definitions
2.1. Run-off triangle
Let denote the random variables (cumulative payments, inccured, reported claims numbers, etc.) for accident year until development year where the accident year is referred to as the year in which an event triggering insurance claims occurs. We assume that are random variables observable for calendar years and non-observable (to be predicted) for calendar years The observable are represented by the so-called run-off trapezoids or run-off triangles Table 1 gives an example of a typical run-off triangle. In order to simplify our notation, we assume that (run-off triangle). However, all the results we present here can be easily extended to the case when the last accident year for which data is available is greater than the last development year, i.e., (run-off trapezoid).
2.2. Outstanding reserves
Let et denote the outstanding claims liabilities for accident year
Ri=Ci,I−Ci,I−i+1,
and the total outstanding loss liabilities for all accident years,
R=I∑i=1Ri
We use the term claims reserves to describe the prediction of the outstanding loss liabilities. Hence, let and denote the claims reserves for accident year and the total claims reserves for aggregated accident years, respectively, where is a predictor for
2.3. (Conditional) mean square error of prediction (MSEP)
As already stated above, finding suitable prediction of ultimate loss is rather the beginning of the process of reserving, and insurers need to assess the variability of these amounts. We are interested then in the quantification of the prediction uncertainty of the ultimate loss, i.e., and (or equivalently of claims reserves, i.e., and ). For that, we have to choose an appropriate risk measure which determines a conception of measuring the “distance” between the prediction and the actual outcomes. In this paper, following the actuarial literature, we quantify the prediction uncertainty using the most popular such measure, the so-called mean-square error of prediction (MSEP).
msepˆCil∣DI(CiI)=E[(ˆCiI−CiI)2∣DI],
msep∑Ii=1ˆcil∣Dl(I∑i=1CiI)=E[(I∑i=1ˆCiI−I∑i=1CiI)2∣DI]
where
DI={Ci,j:i+j≤I+1},
denote the claims data available at time
3. Mack chain-ladder (MCL) model
A major everyday challenge of actuarial work is selecting loss development factors for number of reasons (outliers in triangle, inaccurate data, incompleteness, etc). Most actuaries use somewhat arbitrary rules of thumb in selecting the loss ratios. In Blumsohn and Laufer (2009), a group of actuaries were asked to select age-to-age factors for a 12-years triangle of umbrella business. The important number of ways of selecting loss ratios was provided by the participants. It is important, then, from practical point of view to have a method that provides the estimation of conditional MSEP of ultimate claims in the context of factor selection of actuaries.
To the best of our knowledge, the paper by Mack (1999) is one of the first studies dealing with factors selection and variability of reserves estimation in the framework of the chain-ladder method. This paper is an extension of Mack (1993).
In the remaining part of this section we recall the assumptions of the MCL model from Mack (1999). Afterwards, we present the numerical example illustrating the limits of this approach. Finally, we indicate the possible expansion of MCL method and its potential applications.
3.1. Model assumptions of MCL method
Let define the individual development factors, for and
Fi,k=Ci,k+1/Ci,k.
Following Mack (1999), we assume [(MCL.1)]
- There exist constants such that
E(Fi,k∣Ci,1,…,Ci,k)=fk.
The parameters  are often called loss development factors (LDF), link ratios or age-to-age factors.
2. There exist constants  such that for all  and  we have
Var(Fi,k∣Ci,1,…,Ci,k)=σ2kwi,kCαi,k, with wi,k∈[0,1].
The parameters  are referred here as variance parameters (LDF).
3. The accident years  are independent.
3.2. Estimation of parameters in the MCL model
- Given the information and for the factors are estimated by
ˆfk=∑I−ki=1wi,kCαi,kFi,k∑I−ki=1γi,k,α∈{0,1,2}.
- Given the information and for the variance parameters are estimated by
ˆσ2k=1Ik−1I−k∑i=1wi,kCαi,k(Fi,k−ˆfk)2,α∈{0,1,2},
where represents the number of weights different from 0, namely,
Formula (3.4) does not yield an estimator for because it is not possible to estimate this parameter from the single observation Following Mack (1993, 1994, 1999), if and if the claims development is believed to be finished after years we can put If not, the simple formula of extrapolation can be applied by requiring This leads to the following definition
ˆσ2I−1:=min(ˆσ4I−2/ˆσ2I−3,min(ˆσ2I−3,ˆσ2I−2)).
Remark 3.1. The parameter determines the different ways of estimation of For the sake of simplicity, let us assume that for all We present below the possible choices of and their interpretation.
- If  we get the classical chain ladder estimate of 
 ˆfk=∑I−ki=1Ci,kFi,k∑I−ki=1Ci,k=∑I−ki=1Ci,k+1∑I−ki=1Ci,k,for1≤k≤I−1.
- If we get the model for which the estimators of the age-to-age factors are the straightforward average of the observed individual development factors defined via (3.1), i.e.,
ˆfk=1I−kI−k∑i=1Fi,k, for 1≤k≤I−1.
- If we get the model for which the estimators of the age-to-age factors are the results of an ordinary regression of against with intercept 0 , i.e.,
ˆfk=∑I−ki=1C2i,kFi,k∑I−ki=1C2i,k=∑I−ki=1Ci,kCi,k+1∑I−k−1i=0C2i,k, for 0≤k≤I−1.
3.3. Properties of estimators from MCL model
Proposition 3.1
i. The estimators  given in (3.3) are unbiased and uncorrelated.
ii. The estimators  of  have the minimal variance among all unbiased estimators of  which are the weighted average of the observed development factors 
iii. The estimator  given in (3.4) is the unbiased estimator of the parameter 
iv. Under the model assumptions (MCL.1) and (MCL.3) we have
E(Ci,I∣DI)=Ci,I+1−ifi,I+1−i⋅…⋅fI−1.
This implies, together with the fact that  are uncorrelated, that  is unbiased estimator of 
v. The expected values of the estimator
ˆCi,I=Ci,I+1−i⋅I−1∏k=I+1−iˆfk,
for the ultimate claims amount and of the true ultimate claims amount are equal, i.e.,
The proof is provided in Appendix A.3.
3.4. Estimators of conditional MSEP in MCL model
3.4.1. Single accident years
Under assumptions of the MCL model we have the following estimator for the conditional estimation error of a single accident year :
^msepˆcii,l∣Dl(Ci,I)=(ˆCi,I)2⋅I−1∑k=I−i+1ˆσ2kˆf2k(1ˆwi,kˆCαi,k+1∑I−kj=1wj,kCαj,k),
where, for we define and and are given in (3.3) and (3.4)-(3.5) respectively.
3.4.2. Aggregated accident years
^msep∑Ii=1ˆCi,l∣Dl(I∑i=1Ci,I)=I∑i=2^msepˆCi,l∣Dl(Ci,I)+I∑i=2ˆCi,I(I∑j=i+1ˆCj,I)I−1∑k=I−i+12ˆσ2k/(ˆfk)2∑I−kl=1wl,kCαl,k,
where and are given in (3.3) and (3.4)-(3.5) respectively.
3.5. Numerical application of the MCL method
As mentioned above, the factors selection methods are an integral part of everyday actuarial practice.
Here we choose from Blumsohn and Laufer (2009) several such methods where estimates are computed as different averages using varying weights and varying number of accident years: all/3/5-years weighted average and all excluding higher and lower (AEHL) factor average. We consider as well other popular methods in actuarial practice based on sample median.
More precisely, for RAA run-off triangle (see Appendix B, Section B.8), we apply the MCL model from Mack (1999) with the following parameters. For all five methods described below we choose arithmetic averages of and we compute the estimators and according to formula (3.3) and (3.4)-(3.5), respectively. This allows us to compare the results with sample median method which is rather consistent with straightforward average of development factors (see method number (5) below)
- 
ALL AV: are computed as arithmetic average of all individual link ratios More precisely, we define the weights in the following way: for all 
- 
AEHL: are computed as arithmetic average of all individual link ratios, excluding the highest and the lowest values of More precisely, we define the weights in the following way: for fixed for such that and where for denotes the order statistics of For remaining indices for fixed we take 
- 
5 Years AV: are computed as an arithmetic average of individual link ratios from five latest accidents years. More precisely, we define the weights in the following way: for For remaining indices for fixed we take 
- 
3 Years AV: are computed as an arithmetic average of individual link ratios from three latest accidents years. More precisely, we define the weights in the following way: for For remaining indices for fixed we take 
- 
Median: are computed as an arithmetic average of individual link ratios in the way to obtain the sample median. More precisely, we put or in the way that the estimators of the age-to-age factors are given by 
ˆfk=median{Fi,k:i∈{1,…,I−k}}.
The median denotes the sample median that for the sample is computed by
median{Xi:i∈{1,…,n}}:={X(n+12) if n is odd X(n2)+X(n2+1)2 otherwise .
where denotes the order statistics of the sample
Remark 3.2. In the case where there is only one observation in estimation of parameter (odd number of data in sample median computation) we choose the additional factor in order to have two observations and be able to apply the formula (3.4).
In Table 2, we present the estimation of total amount of claims reserves as well as the value of estimators of aggregated Recall that where We observe that to obtain it is enough to have the estimators of ultimate claims for all accident year In consequence and we use the formula (3.7) to estimate this quantity. We compute as well the coefficient of variation of given by The last two lines of Table 2 indicate the relative proportion of and for each of five methods considered, in comparison to the ALL AV method which is the reference method in our example.
3.6. Limits of MCL method
As can be seen in Table 2, the four last methods (columns (2)-(5)) reduce significantly the estimation of comparing to the first method ALL AV. For the methods (3), (4) and (5), this is mainly due to the elimination of relatively significant number of development factors from estimation especially for the first development years which correspond to the columns of the run-off triangle. This phenomena is especially seen in the case of sample median method in which, for each development factor we keep at most two of link ratios in estimation of From statistical point of view, this is clearly not enough to perform the robust estimation. As a consequence, this kind of methods reduce unnaturally the variability of reserves. This could be dangerous for example in terms of evaluation of the economical capital for reserve risk required by the new Solvency II regime.
Beyond the limits stated above, there are some incoherences with application of weights for the and sample median methods. Indeed, the weights should be measurable random variables in order to be able to derive the main results of MCL approach (see, for example, Proposition A.2). Although for the method 5 Year AV and 3 Years AV we can fix the weights without knowing the information (knowing all observation in the run-off triangle, see (5)), this is not a case for the AEHL and sample median methods. The reason is that we need to know the observation in order to specify the corresponding weights for those two methods. That is why the weights are not measurable but rather -measurable. This means that the formula for the expectation of and are not correct. Regarding the sample median method, the derivation of requires the computation of the moments of order statistics (see Jeng 2010) and those are strongly related to the distribution of To overcome these difficulties we propose two solutions: the simple Proxy method (see Section 6.1) and the more complex one based on a robust estimation (see Section 6.2). The first approach is programmed to avoid artificial volatility increase and it is based on all link ratios in estimation of volatility parameters (scale parameters in linear regression). The second method consists of developing an approach that allows using any robust estimators of (location) and (scale) parameters.
4. General Mack chain-ladder model
4.1. Model assumptions
Before stating the main assumptions of our general approach, let us assume that functions are Borel measurable. Let be the nonnegative random variables defined by,
Our model is formalized by the following assumptions:
(GMCL.1) There exist constants  such that
E(Fi,k∣Ci,1,…,Ci,k)=fk.
(GMCL.2) There exist constants such that for all and we have
Var(Fi,k∣Ci,1,…,Ci,k)={σ2kδi,k if δi,k≠0 a.s., ∞ if δi,k=0 a.s.,
where a.s. means almost surely.
(GMCL.3) The accident years are independent.
We observe that from the above assumptions the main difference between MCL and GMCL lies in the variance assumption. This modification allows us to introduce different weights in estimation of the parameters and
4.2. Model estimators
Suppose that functions are Borel measurable. Let be the non-negative random variables defined by,
- Given the information the factors are estimated by
ˆfk=∑I−ki=1γi,kFi,k∑I−ki=1γi,k, for 1≤k≤I−1
It becomes obvious from assumption (GMCL.2) that in order to compute correctly the variance of (see Proposition A. 2 in Appendix) we have to assume that
{ if δi,j=0 then γi,j=0}.
- Given the information the variance parameters are estimated by
ˆσ2k=1Ik−1I−k∑i=1δi,k(Fi,k−ˆfk)2, for 1≤k≤I−2,
where represents the number of weights different from 0 , namely,
In the analogue way to (3.4) we define
ˆσ2I−1=min(ˆσ4I−2/ˆσ2I−3,min(ˆσ2I−3,ˆσ2I−2)).
Proposition 4.1.
(i) The estimators  given in (4.2) are unbiased and uncorrelated.
(ii) For  if  for all  then the estimators  of  have the minimal variance among all unbiased estimators of  which are the weighted average of the observed development factors 
For if for some then the relative efficiency of s.e. with respect to s.e. i.e., the ratio
s.e. (ˆfγ=δk∣Bk) s.e. (ˆfγ≠δk∣Bk):=Var(ˆfγ≠δk∣Bk)1/2Var(ˆfγ=δk∣Bk)1/2=∑I−kj=1γ2j,kδj,k⋅1{δj,k≠0}∑I−kj=1γj,k⋅1{δj,k≠0}
(iii) For if for all then the estimator given in (4.4) is the unbiased estimator of the parameter
For if for some then the bias of the estimator is given by the following formula
E\left[\hat{\boldsymbol{\sigma}}_{k}^{2}-\sigma_{k}^{2}\right]=\frac{\sigma_{k}^{2}}{I_{k}-1} E\left[\frac{\sum_{i=1}^{I-k} \delta_{i, k}\left(\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k \neq 0}\right\}}\right)}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}-1\right].
(iv) Under the model assumptions (GMCL.1) and (GMCL.3) we have
E\left(C_{i, l} \mid D_{I}\right)=C_{i, I+1-i} f_{i, I+1-i} \cdot \ldots \cdot f_{I-1}.
This implies, together with the fact that are uncorrelated, that is unbiased estimator of
(v) The expected values of the estimator
\hat{C}_{i, I}=C_{i, I+1-i} \cdot \prod_{k=I+1-i}^{I-1} \hat{f}_{k},
for the ultimate claims amount and of the true ultimate claims amount are equal, i.e.,
The proof of this Proposition is provided in Appendix A. 4.
Remark 4.1.
- If we set for in (4.2) and (4.4) we get the assumptions of MCL model from Mack (1999) (see also Mack (1993), Mack (1994) and Saito 2009).
- If we put for in (4.2) and (4.4) we get the stochastic chain-ladder model from Murphy, Bardis, and Majidi (2012).
5. Main results
5.1. Single accident years
Result 5.1 (Conditional MSEP estimator for a single accident year).
\widehat{\operatorname{msep}}_{\hat{C}_{i, l} \mid D_{l}}\left(C_{i, I}\right)=\left(\hat{C}_{i, I}\right)^{2} \cdot\left(\hat{\Gamma}_{i, I}+\hat{\Delta}_{i, I}\right), \tag{5.1}
where
\hat{\Gamma}_{i, I}=\sum_{k=I-i+1}^{J-1} \frac{\hat{\sigma}_{k}^{2} /\left(\hat{f}_{k}\right)^{2}}{\hat{\delta}_{i, k}},\tag{5.2}
\hat{\Delta}_{i, I}=\sum_{k=I-i+1}^{J-1} \frac{\hat{\sigma}_{k}^{2} /\left(\hat{f}_{k}\right)^{2}}{\left.\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}} \cdot \sum_{j=1}^{I-k} \frac{\left(\gamma_{j, k}\right)^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}},\tag{5.3}
and and are given in (4.2) and (4.4)-(4.5), respectively.
5.2. Aggregation over prior accident year
Result 5.2 (Conditional MSEP estimator for aggregated years).
\begin{aligned} & \widehat{m s e p}_{\sum_{i=1} \hat{C}_{i l} \mid D_{l}}\left(\sum_{i=1}^{I} C_{i, I}\right)=\sum_{i=2}^{I} \widehat{m s e p}_{\hat{C}_{i l} \mid D_{t}}\left(C_{i l}\right) \\ & \quad+\sum_{i=2}^{I} \hat{C}_{i, l}\left(\sum_{j=i+1}^{I} \hat{C}_{j, I}\right) \sum_{k=I-i+1}^{J-1} \frac{\hat{\sigma}_{k}^{2} /\left(\hat{f}_{k}\right)^{2}}{\left(\sum_{l=1}^{I-k} \gamma_{l, k}\right)^{2}} \\ & \quad \cdot \sum_{l=1}^{I-k} \frac{\left(\gamma_{l, k}\right)^{2}}{\delta_{l, k}} \cdot \mathbf{1}_{\left\{\delta_{\left.\delta_{l, k} \neq 0\right\}}\right.}, \end{aligned} \tag{5.4}
where and are defined in (4.2) and (4.4)-(4.5), respectively.
6. Applications of GMCL model
In our numerical example in Section 3.5 we have seen that the assumption about the same weights in estimation of parameters and yields for some methods to an artificial reduction of variability of reserves amounts (refer to Table 2). To overcome this difficulty, we introduced the different weights and in computation of and respectively. In the following application we indicate how one can possibly estimate the weights and and we point out some other interesting applications.
6.1. Method proxy for factors selection
In this section, we examine our general framework and which means that and In this so called proxy method we impose using all link ratios in estimation of parameters for all For all five methods presented, we take We turn back to our numerical example from Section 3.5 and we evaluate the same estimators for the alreadypresented five methods with the only difference in weights of estimation. More precisely:
- 
ALL AV : The are estimated with for all Parameters are computed as a arithmetic average of all individual link ratios More precisely, we define the weights in the following way: for all 
- 
AEHL: The are estimated with for all Parameters are computed as an arithmetic average of all individual link ratios excluding the highest and the lowest values of More precisely, we define the weights in the following way: for fixed for such that and where for denotes the order statistics of For remaining indices for fixed we take 
- 
5 Years AV: The are estimated with for all Parameters are computed as an arithmetic average of individual link ratios from five latest accidents years. More precisely, we define the weights in the following way: for For remaining indices for fixed we take 
- 
3 Years AV: The are estimated with for all Parameters are computed as an arithmetic average of individual link ratios from three latest accidents years. More precisely, we define the weights in the following way: for For remaining indices for fixed we take 
- 
Median: The are estimated with for all Parameters are computed as an arithmetic average of individual link ratios in the way to obtain the sample median. More precisely, we put or in the way that the estimators of the age-to-age factors are given by 
\hat{f}_{k}=\operatorname{median}\left\{F_{i, k}: i \in\{1, \ldots, I-k\}\right\},
where median denotes the sample median which, for the sample is computed by
\begin{aligned} & \text { median }\left\{X_{i}: i \in\{1, \ldots, n\}\right\} \\ & := \begin{cases}X_{\left(\frac{n+1}{2}\right)} & \text { if } n \text { is odd } \\ \frac{X_{\left(\frac{n}{2}\right)}+X_{\left(\frac{n}{2}+1\right)}}{2} & \text { otherwise }\end{cases} \end{aligned}
and denotes the order statistics of the sample
In Table 3 we present the estimation of and using the five methods described above. In terms of MSEP we see that, in general, we have the values greater than our reference method ALL AV from column (1), which stays unchanged compared to Table 2. This is not surprising because by selecting of the development factors we decreased the estimated values of and by using all observations in computation we mechanically increased the dispersion around the values of In view of our results from Tables 2 and 3, the proxy method overestimates in general the real MSEP, and can then be treated as its upper bound. However, it can be useful as a tool to perform the sensitivity analysis for testing the impact on the reserve volatility of excluding the specific set of link ratios. Finally, it can be seen as a measure of relative prudence of other approach of measuring the variability of reserves by means of MSEP estimators.
6.2. Robust estimation in GMCL model
From the previous two numerical examples (MCL vs. GMCL results), we observe that, in general, the first approach underestimates and second overestimates the MSEP of claims reserves (see Tables 2 and 3). In this section we present an intermediate solution for our general problem that allows us to evaluate the estimation of MSEP of reserves in case of development factor selection. This go-between solution is based on the robust statistics in estimation of model parameters and The term robust statistics is meant in the sense of Huber and Ronchetti (2009).
As already mentioned, the assumption GMCL. 2 about the conditional variance of allows us to estimate the factors in the framework of linear regression obtained by the means of weighted least squares procedure (see Murphy, Bardis, and Majidi 2012 and the references therein). Although these estimators are easy to compute and have excellent theoretical properties (see Proposition 4.1), they rely on quite strict assumptions, and their violation may lead to useless results.
One possible solution to overcome this difficulty is to use robust estimation techniques. The idea of robust statistics is to account for certain deviations from idealized model assumptions. Typically, robust methods reduce the influence of outlying observation on the estimator.
We take the following assumption in our general framework of GMCL model: and with to be estimated. This means that and
The following algorithm shows how one can estimate the parameters for and for As can be seen, the presented method is based on a similar principle to the well known moment estimation method from point estimation theory.
6.2.1. Algorithm for fitting and parameters
Step 1. We select the robust estimators for and its variance. We denote these estimators by and respectively. These two quantities can be derived by numerous techniques described in the literature, such as: M-estimation, estimation, etc. (Huber and Ronchetti 2009) or trimmed mean (Jeng 2010).
Step 2. For every we find by solving the following equation where is given in equation (4.2), namely
\frac{\sum_{i=1}^{I-k} C_{i, k}^{\alpha_{k}} F_{i, k}}{\sum_{i=1}^{I-k} C_{i, k}^{\alpha_{i, k}}}=\tilde{f}_{k} .\tag{6.1}
The procedure to select the consistent together with the problem of existence of solution of equation (6.1) is treated in Murphy, Bardis, and Majidi (2012) (see Lemma 1 and the comments that follow it).
Step 3. For every we find by solving following equation: where is given in (A.8), namely,
\frac{1}{I-k-1} \sum_{i=1}^{I-k} C_{i, k}^{\beta_{k}}\left(F_{i, k}-\hat{f}_{k}\right)^{2} \cdot \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{C_{i, k}^{\beta_{k}}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}=\tilde{\operatorname{Var}}\left(\tilde{f}_{k}\right)\tag{6.2}
The parameters are given in equation (4.2) with estimated in Step 2. For since only one observation is available in our data, need to be estimated by other approaches. The limits for the values of for which the solution of equation (6.2) exists are presented in Appendix E.
6.2.2. Numerical example
In the present example we consider only the median method already presented in previous numerical applications. We concentrate on that particular method because our goal is not to present the extensive case study but rather to illustrate the general principle and the main steps of this application. Observe that the method AEHL can be treated by the theory of robust estimation by means of trimmed mean estimators (Jeng 2010).
To apply the above fitting algorithm for sample median method, we use the LAD (least absolute deviation) estimation procedure. The theoretical framework of LAD is presented in Appendix D. The values of are given by computing the sample median from as described in Section 3.5. The standard errors of are obtained via bootstrap techniques. In Table 4 we present the numerical values of s.e and Note that the last value of cannot be estimated from the data for the reasons discussed in the case of estimation in Section 3.2 (only one observation available). This why we do not fit the parameter via equation (6.2), but we put
The standard deviations of from Table 4 were obtained by using the rq function integrated in free R software. Note that the values of given by this function are slightly different from those presented in Section 3.2. This is probably due to the optimization algorithm that is used in R. Given that these differences are insignificant, we decided to present in Table 4 the same numerical values of the estimators as given in Section 3.2. The corresponding R code is available by request from the author.
As mentioned in the algorithm and in Appendix E, the solutions of (6.1) and (6.2) are not always available. For instance, in our example, there is no solution of equation (6.1) for and no solution of equation (6.2) for This means that for and the parameters and need to be specified in a different way. This could be done using any other approach that is being judged appropriate by the actuary performing estimation. In our case, we put to have from one hand the optimal properties (see Proposition 4.1) but also to be consistent with our choice of in our two previous numerical applications (see Sections 3.5 and 6.1).
The estimation of parameters and are stated in Table 5. The values of and for which we arbitrarily put 0 are indicated with bold font characters.
The MSEP and claims reserves amount estimators are stated in Table 6. Observe that the robust estimation is a good compromise between the method with the same weights (see Section 3.5) and the method where we use all link ratios in estimation (see proxy method in Section 6.1).
6.3. Validation of results from reserving softwares
The next interesting and extremely important application of our GMCL model is the possibility of validating the results from industry reserving software. The stochastic chain-ladder type methods are used to evaluate the economic risk capital required by Solvency II for so-called reserve risk. In fact, this capital requirement for reserve risk is computed as the 99.5 th percentile (value at risk) of run-off result distribution (profit/loss on reserves over one year). This means that Solvency II defines the reserve risk in one-year time horizon, which is different from the standard approach considering the distribution of the ultimate cost of claims.
However, one of the methods to derive the one-year reserve risk is based on simple scaling of ultimate view. This technique is based on using the results of Merz and Wüthrich (2008), which is currently a popular methodology throughout the market and taken from the latest technical literature on this topic.
The empirical loss distribution in ultimate view is often derived by using the bootstrap techniques and Monte Carlo simulations. The first technique is used to evaluate the estimation error and the second to approximate the process variance. This kind of bootstrap approach is also available in ResQ software, which is used worldwide within the property and casualty insurance market. The question is how to validate the results from bootstrap method provided by reserving tools such as ResQ. One of the possible solutions is to compare the estimation of the first two moments of loss distribution from bootstrapping (based on simulations) with the estimators of reserves and MSEP of reserves obtained by the explicit formulas. For the sake of simplicity, we assume that there is no factors selection (all weights are fixed to 1 ). We use the RAA run-off triangle and we present the numerical results in Table 7. For all bootstrapping results we used 100,000 simulations. We begin our analysis with the classic chain-ladder method in which the estimators of are the all volume weighted average and are consistent with Mack (1993). More precisely, with the hypothesis of the MCL method with we compare the estimate of MSEP obtained by these two techniques: bootstrap from ResQ and explicit formula given in MCL approach. The corresponding numerical values are respectively: 27150 (see ResQ(Boot) (3) in Table 7) and 26909 (see ResQ(MCL) (4) in Table 7). We observe a good convergence for bootstrap (the relative error is less than ). We consider now the different estimator of computed as a simple arithmetic average of individual link ratios This is equivalent to taking in the MCL framework. In that case, we observe that the estimates of MSEP for both methods become divergent: 75656 (see ResQ(Mack) (2) in Table 7) and 58475 (see ResQ(Boot) (1) in Table 7). This is due to the fact that the (Mack) method is obtained by approximation based on the MCL formula with In fact, according to the technical documentation, the ResQ estimates of parameters and in the bootstrap approach are of the form (up to multiplicative constant for bias reduction): and It is easily seen that these estimators are consistent with our general approach with and (see (4.2) and (4.4) in Section 4). The MSEP estimator is equal to 59065 (see GMCL (5) in Table 7). This shows that GMCL method allows one to validate the results and detect the incoherences. Effectively, the choice of estimators in ResQ for the case is not optimal in sense of Proposition 4.1. It remains unknown whether this is deliberate or whether this is just a proxy approach that was judged correct.
In regards to the approximation this shows that in construction of the proxy methods we cannot just take the MSEP formula for as a starting point. Indeed, the MSEP formula changes if we modify the estimates of because the variance of is not the same, so it is not enough to plug in the new estimators of and in the MSEP formula (5.4) with This lack of understanding of this principle could be a reason of taking the no optimal hypothesis in bootstrap ResQ(Boot) (1) method.
Finally, we observe that the results of (Boot) (1) method validate our explicit formula for estimation of MSEP of claims reserves in the framework of our GMCL model.
7. Conclusion
In this paper we presented a general flexible tool for stochastic loss reserving and its variability. We developed our GMCL model to quantify the variability of reserves in the context of selecting development factors in the framework of the stochastic chain-ladder method.
We provided the theoretical and flexible background which covers some practices of actuaries and industrial providers of reserving softwares.
Finally, we showed the way of bridging the chain-ladder model and the robust estimation techniques. Our results can be applied in other approaches based on chain-ladder framework like: multivariate chainladder, univariate and multivariate Bayesian chainladder, etc. One can derive the similar results in the context of one-year reserve risk for Solvency II purposes. This topic will be treated in our forthcoming paper. Some partial results can be found in Sloma (2014) and Sloma (2011).
Acknowledgments
The author thanks the reviewers for their helpful comments and suggestions.
Appendix: Mathematical Proofs
We present here the proofs of our main results. Most of them are derived by simple rewriting the techniques applied in Mack (1993), Mack (1994).
A.1. Proof of Result 5.1
Due to the general rule for any scalar c we have
\begin{aligned} \operatorname{msep}_{\hat{C}_{i l} \mid D_{l}}\left(C_{i l}\right) & =E\left[\left(\hat{C}_{i l}-C_{i l}\right)^{2} \mid D_{I}\right] \\ & =\operatorname{Var}\left(C_{i l} \mid D_{I}\right)+\left(E\left(C_{i l} \mid D_{I}\right)-\hat{C}_{i l}\right)^{2} \end{aligned}\tag{A.1}
To estimate  we use the following
Lemma 9.1. For  we have,
\operatorname{Var}\left(C_{i, I} \mid D_{I}\right)=\sum_{l=I+1-i}^{I-1} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right] \sigma_{l}^{2} \prod_{k=l+1}^{I-1} f_{k}^{2} .\tag{A.2}
The proof of Lemma A. 1 is provided in Appendix A.5.
Note that the estimation of from equation (A.2) is a crucial part of this proof. We choose to estimate this term by This is due to the obvious observation that is an unbiased estimate of and from the basic property of conditional expectation, namely:
It is worth noting here that, in the case where  with  in Saito (2009), the author used the same technique of estimation without giving any justification or reason for that (see proof of Lemma 4 and Estimate 8). Similarly, in the case where  with  and  we find the same estimator in Mack (1999). More precisely, the author claims (without proving) that
 can be estimated via the quantity  where  is an estimate of  Indeed, this is achieved if we estimate  by 
However, in Murphy, Bardis, and Majidi (2012), the authors used different approach based on normal approximation.
Note that in the Section 6.3 we obtained that the above estimator of is consistent with that provided by bootstrap technique from ResQ software. This is shown for the particular run-off triangle and the assumption that are gamma-distributed random variables. It would be interesting to perform the extensive simulation study in order to examine the exactitude of this estimate with other data and probability distributions.
We apply now Lemma A. 1 with and by replacing the unknown parameters et with their estimators and Together with the equality (see Proposition 4.1 (v)) we conclude
\begin{aligned} \operatorname{Var}\left(C_{i, l} \mid D_{I}\right) & =\sum_{l=I+1-i}^{I-1} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right] \sigma_{l}^{2} \prod_{k=l+1}^{I-1} f_{k}^{2} \\ & =\sum_{l=I+1-i}^{I-1} \frac{\hat{C}_{i, l}^{2}}{\hat{\delta}_{i, l}} \hat{\sigma}_{l}^{2} \prod_{k=l+1}^{I-1} \hat{f}_{k}^{2} \\ & =\sum_{l=I+1-i}^{I-1} \frac{\hat{\sigma}_{l}^{2}}{\hat{\delta}_{i, l}} C_{i, l+1-i}^{2} \prod_{k=I+1-i}^{l-1} \hat{f}_{k}^{2} \cdot \prod_{k=l+1}^{I-1} \hat{f}_{k}^{2} \\ & =C_{i, l+1-i}^{2} \sum_{l=I+1-i}^{I-1} \frac{\hat{\sigma}_{l}^{2} / \hat{f}_{l}^{2}}{\hat{\delta}_{i, l}} \prod_{k=l+1-i}^{I-1} \hat{f}_{k}^{2} \\ & =C_{i, l+1-i}^{2} \cdot \prod_{k=I+1-i}^{I-1} \hat{f}_{k}^{2} \sum_{l=l+1-i}^{I-1} \frac{\hat{\sigma}_{l}^{2} / \hat{f}_{l}^{2}}{\hat{\delta}_{i, l}} \\ & =\hat{C}_{i, l}^{2} \sum_{l=I+1-i}^{I-1} \frac{\hat{\sigma}_{l}^{2} / \hat{f}_{l}^{2}}{\hat{\delta}_{i, l}} . \end{aligned}\tag{A.3}
We now turn to the second summand of the expression (A.1). Because of Proposition 4.1 (iv) and (v) we have,
\begin{aligned} & \left(E\left(C_{i, l} \mid D_{I}\right)-\hat{C}_{i, I}\right)^{2} \\ & \quad=C_{i, l+1-i}^{2}\left(f_{I+1-i} \cdot \ldots \cdot f_{I-1}-\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1}\right)^{2} . \end{aligned}\tag{A.4}
As can be easily seen, this expression cannot be estimated by replacing with In order to estimate the right hand side of (A.4) we use the same approach as in Mack (1993), Mack (1994). Saito (2009) followed the same technique of estimation. However, in Murphy, Bardis, and Majidi (2012) we can find a different approach which was also presented in Buchwalder et al. (2006a). It is worth noting that in the paper of Mack, Quarg, and Braun (2006) the authors criticised the approaches of Buchwalder et al. (2006a) and showed that the estimate of estimation error from Mack (1993) is hard to be improved (see also Buchwalder et al. 2006b). As the answer for the criticism of Mack on article of Buchwalder et al. (2006a), the authors provided the bounds for estimation error and claimed that the Mack estimator, in some particular cases, is closed to these bounds (see Wüthrich, Merz, and Bühlmann 2008). This should be confirmed by performing the extensive simulation study to quantify the different approaches of error estimation in stochastic chain-ladder framework.
We define,
\begin{aligned} F & =f_{l+1-i} \cdot \ldots \cdot f_{I-1}-\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1} \\ & =S_{I+1-i}+\ldots+S_{I-1}, \end{aligned}\tag{A.5}
with
\begin{aligned} S_{k}= & \hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{k-1} f_{k} f_{k+1} \cdot \ldots \cdot f_{I-1} \\ & -\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{k-1} \hat{f}_{k} f_{k+1} \cdot \ldots \cdot f_{I-1} \\ = & \hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{k-1}\left(f_{k}-\hat{f}_{k}\right) f_{k+1} \cdot \ldots \cdot f_{I-1} . \end{aligned}\tag{A.6}
This yields
\begin{aligned} F^{2} & =\left(S_{I+1-i}+\ldots+S_{I-1}\right)^{2} \\ & =\sum_{k=I+1-i}^{I-1} S_{k}^{2}+2 \sum_{k=I+1-i}^{I-1} \sum_{j<k}^{I-1} S_{j} S_{k} . \end{aligned}\tag{A.7}
We estimate using the following
Proposition A. 1 (Estimate of ) Let define, for the set of observed up to development year namely
B_{k}=\left\{C_{i, j}: i+j \leq I+1, k \leq j\right\} \subset D_{I} .
Then, we can estimate by
\widehat{F^{2}}=\prod_{l=I+1-i}^{I-1} \hat{f}_{l}^{2} \sum_{k=I+1-i}^{I-1} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}}.
The proof of Proposition A. 1 is provided in Appendix A.5.
It remains to determine the estimate of We use the following
Proposition A. 2 We assume (4.3). We have
\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)=\sigma_{k}^{2} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}} .\tag{A.8}
The proof of Proposition A. 2 is provided in Appendix A.5.
Finally, using (A.4) and Proposition A. 2 we estimate by
\begin{aligned} & C_{i, I+1-i}^{2} \hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{I-1}^{2} \sum_{k=I+1-i}^{I-1} \frac{\hat{\boldsymbol{\sigma}}_{k}^{2}}{\hat{f}_{k}^{2}} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}} \\ & \quad=\hat{C}_{i, I}^{2} \sum_{k=I+1-i}^{I-1} \frac{\hat{\sigma}_{k}^{2}}{\hat{f}_{k}^{2}} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}. \end{aligned}
This completes the proof of Result 5.1.
A.2. Proof of result 5.2 (overall standard error)
Following the definition in (2.4), we have
\begin{aligned} & \operatorname{msep}_{\sum_{i=1}^{I} \hat{C}_{i, I} \mid D_{I}}\left(\sum_{i=1}^{I} C_{i, I}\right) \\ & =E\left[\left(\sum_{i=1}^{I} \hat{C}_{i I}-\sum_{i=1}^{I} C_{i I}\right)^{2} \mid D_{I}\right] \\ & \quad=\operatorname{Var}\left(\sum_{i=1}^{I} C_{i, I} \mid D_{I}\right)+\left(E\left(\sum_{i=1}^{I} C_{i, I} \mid D_{I}\right)-\sum_{i=1}^{I} \hat{C}_{i, I}\right)^{2} \end{aligned}
The independence of accident years yields
\operatorname{Var}\left(\sum_{i=1}^{I} C_{i, I} \mid D_{I}\right)=\sum_{i=1}^{I} \operatorname{Var}\left(C_{i, I} \mid D_{I}\right) .
where each term of the sum has already been calculated in the proof of the Result 5.1.
Furthermore
\begin{aligned} & \left(E\left(\sum_{i=1}^{I} C_{i, I} \mid D_{I}\right)-\sum_{i=1}^{I} \hat{C}_{i, I}\right)^{2} \\ & \quad=\left(\sum_{i=1}^{I}\left(E\left(C_{i, I} \mid D_{I}\right)-\hat{C}_{i, I}\right)\right)^{2} \\ & \quad=\sum_{i, j}^{I}\left(E\left(C_{i, I} \mid D_{I}\right)-\hat{C}_{i, I}\right)\left(E\left(C_{j, I} \mid D_{I}\right)-\hat{C}_{j, I}\right) \end{aligned}
Taking together
\begin{aligned} & \widehat{\operatorname{msep}} \sum_{i=1}^{I} \hat{c}_{i l} \mid D_{l} \\ &\left(\sum_{i=1}^{I} C_{i, I}\right)= \sum_{i=2}^{I} \widehat{\operatorname{msep}}_{\hat{C}_{i l} \mid D_{l}}\left(C_{i I}\right) \\ &+\sum_{2 \leq i \leq j \leq I}^{I} 2 \cdot C_{i, I+1-i} C_{j, I+1-j} F_{i} F_{j}, \end{aligned}
with
F_{i}=f_{I+1-i} \cdot \ldots \cdot f_{I-1}-\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1}=\sum_{k=I+1-i}^{I-1} S_{k}^{i},
where
S_{k}^{i}=\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{k-1}\left(f_{k}-\hat{f}_{k}\right) f_{k+1} \cdot \ldots \cdot f_{I-1}.
We can determine the estimator of in the analogous way as for
Proposition A.3. We have
\begin{aligned} \widehat{F_{i} F_{j}}= & \sum_{k=I+1-i}^{I-1} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}}\left(\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1}\right) \\ & \cdot\left(\hat{f}_{I+1-j} \cdot \ldots \cdot \hat{f}_{I-1}\right) . \end{aligned}
We finally conclude, from Proposition A. 3
\begin{aligned} & \sum_{2 \leq i<j \leq I}^{I} 2 \cdot C_{i, I+1-i} C_{j, I+1-j} \widehat{F_{i} F_{j}} \\ & =\sum_{i=2}^{I} \hat{C}_{i, I} \sum_{j=i+1}^{I} \hat{C}_{j, I} \sum_{k=I-i+1}^{I-1} 2 \frac{\hat{\sigma}_{k}^{2} / \hat{f}_{k}^{2}}{\left(\sum_{l=1}^{I-k} \gamma_{l, k}\right)^{2}} \cdot \sum_{l=1}^{I-k} \frac{\gamma_{l, k}^{2} \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\delta_{l, k}} . \end{aligned}
A.3. Proof of Proposition 3.1
i. See Theorem 2 p. 215 in Mack (1993).
ii. See discussion on p. 112, Corollary on p. 141 and Appendix B on p. 140 in Mack (1994).
iii. See Appendix E on p. 151 in Mack (1994).
iv. See Theorem 1 p. 215 and discussion after the proof of Theorem 2 on page 216 in Mack (1993).
v. see Appendix C p. 142 in Mack (1994):
A.4. Proof of Proposition 4.1
(i), (iv) and (v), see proofs of (i), (iv) and (v) respectively in Proposition 3.1.
(ii). The first part of the statement regarding to the minimal variance of parameters  can be easily derived from the proof of (ii) in Proposition 3.1. The rest of the proof is easily seen from the Proposition A.2.
(iii). Without loss of generality and to avoid the complexity of notation we present the proof for  (for each  all weights  are different from 0 ).
We have, for
\begin{aligned} & (I-k-1) \cdot \hat{\sigma}_{k}^{2}=\sum_{i=1}^{I-k} \delta_{i, k}\left(F_{i, k}-\hat{f}_{k}\right)^{2} \\ & \quad=\sum_{i=1}^{I-k} \delta_{i, k} F_{i, k}^{2}-2 \sum_{i=1}^{I-k} \delta_{i, k} F_{i, k} \cdot \hat{f}_{k}+\sum_{i=1}^{I-k} \delta_{i, k} \hat{f}_{k}^{2} . \end{aligned}
Since are measurable, we have
\begin{aligned} E\left((I-k-1) \cdot \hat{\sigma}_{k}^{2} \mid B_{k}\right)= & \sum_{i=1}^{I-k} \delta_{i, k} E\left(F_{i, k}^{2} \mid B_{k}\right) \\ & -2 \sum_{i=1}^{I-k} \delta_{i, k} E\left(F_{i, k} \cdot \hat{f}_{k} \mid B_{k}\right) \\ & +\sum_{i=1}^{I-k} \delta_{i, k} E\left(\hat{f}_{k}^{2} \mid B_{k}\right). \end{aligned}
In the following derivation we use measurability of and definition of from (4.2). Furthermore, the assumption GMCL. 3 implies that and are independent for From assumption GMCL. 1 and GMCL. 2 we easily see that Taking together,
\small{ \begin{aligned} E\left(F_{i, k} \cdot \hat{f}_{k} \mid B_{k}\right)= & \frac{1}{\sum_{l=1}^{I-k} \gamma_{l, k}}\left(\sum_{j=1}^{I-k} \gamma_{j, k} \cdot E\left(F_{i, k} \cdot F_{j, k} \mid B_{k}\right)\right) \\ & =\frac{1}{\sum_{l=1}^{I-k} \gamma_{l, k}}\left(\begin{array}{l} \left.\gamma_{i, k} \cdot E\left(F_{i, k}^{2} \mid B_{k}\right)+\sum_{j \neq i}^{I-k} \gamma_{j, k}\right) \\ \\ = \\ \sum_{l=1}^{I-k} \gamma_{l, k} \\ = \\ \\ \\ \left.\sum_{l=1}^{I-k} \gamma_{l, k} \left\lvert\, F_{i, k} \cdot\left(\frac{\sigma_{k}^{2}}{\delta_{i, k}}+f_{k}^{2}\right)+\sum_{j \neq i}^{I-k} \gamma_{j, k} f_{k}^{2}\right.\right) \\ = \\ \left.\gamma_{i, k} \cdot \frac{\sigma_{k}^{2}}{\delta_{i, k}}+f_{k}^{2} \sum_{j=i}^{I-k} \gamma_{j, k}^{2}\right) \\ \frac{\sum_{i, k}}{I-k} \gamma_{l, k} \end{array} f_{k}^{2} .\right. \end{aligned}\tag{A.9}}
From Proposition A. 2
\begin{aligned} E\left(\hat{f}_{k}^{2} \mid B_{k}\right) & =\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)+\left(E\left(\hat{f}_{k} \mid B_{k}\right)\right)^{2} \\ & =\sigma_{k}^{2} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}+f_{k}^{2}} . \end{aligned}
Taking together we have
\begin{aligned} & E\left((I-k-1) \cdot \hat{\sigma}_{k}^{2} \mid B_{k}\right) \\ & =\sum_{i=1}^{I-k} \delta_{i, k} E\left(F_{i, k}^{2} \mid B_{k}\right)-2 \sum_{i=1}^{I-k} \delta_{i, k} E\left(F_{i, k} \cdot \hat{f}_{k} \mid B_{k}\right) \\ &\quad +\sum_{i=1}^{I-k} \delta_{i, k} E\left(\hat{f}_{k}^{2} \mid B_{k}\right) \\ & =\sum_{i=1}^{I-k} \delta_{i, k}\left(\frac{\sigma_{k}^{2}}{\delta_{i, k}}+f_{k}^{2}\right)-2 \sum_{i=1}^{I-k} \delta_{i, k}\left(\sigma_{k}^{2} \frac{\frac{\gamma_{i, k}}{\delta_{i, k}}}{\sum_{i=1}^{I-k} \gamma_{i, k}}+f_{k}^{2}\right) \\ &\quad +\sum_{i=1}^{I-k} \delta_{i, k}\left(\sigma_{k}^{2} \frac{\sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}+f_{k}^{2}\right) \\ & =(I-k) \sigma_{k}^{2}+f_{k}^{2} \sum_{i=1}^{I-k} \delta_{i, k}-2 \sigma_{k}^{2}-2 f_{k}^{2} \sum_{i=1}^{I-k} \delta_{i, k} \\ &\quad +\sigma_{k}^{2}-\frac{\sum_{i=1}^{I-k} \delta_{i, k} \sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}+f_{k}^{2} \sum_{i=1}^{I-k} \delta_{i, k} \\ & =(I-k-1) \sigma_{k}^{2}+\sigma_{k}^{2}\left[\frac{\sum_{i=1}^{I-k} \delta_{i, k} \sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}-1\right]. \end{aligned}\tag{A.10}
Finally
\begin{aligned} & E\left(\hat{\sigma}_{k}^{2}-\sigma_{k}^{2}\right)=E\left[E\left[\left(\hat{\sigma}_{k}^{2}-\sigma_{k}^{2}\right) \mid B_{k}\right]\right] \\ & \quad=\frac{\sigma_{k}^{2}}{I-k-1} E\left[\frac{\sum_{i=1}^{I-k} \delta_{i, k} \sum_{j=1}^{I-k} \frac{\gamma_{j, k}^{2}}{\delta_{j, k}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}-1\right] . \end{aligned}
A.5. Proofs of auxiliary results
A.5.1. Proof of Lemma 9.1
Let define, for and the set of observed data for accident year and up to development year namely
A_{i, j}=\left\{C_{i, k}: 1 \leq k \leq j\right\} .
For
\begin{aligned} \operatorname{Var}\left(C_{i, l} \mid D_{I}\right)= & \operatorname{Var}\left(C_{i, l+1} \mid A_{i, I+1-i}\right) \\ & E\left[\operatorname{Var}\left(C_{i, l+1} \mid A_{i, l-1}\right) \mid A_{i, I+1-i}\right] \\ & +\operatorname{Var}\left[E\left(C_{i, l+1} \mid A_{i, I-1}\right) \mid A_{i, I+1-i}\right] \\ = & E\left[\left.\sigma_{l}^{2} \frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, A_{i, I+1-i}\right]+\operatorname{Var}\left[f_{l} C_{i, l} \mid A_{i, I+1-i}\right] \\ = & \sigma_{l}^{2} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, A_{i, I+1-i}\right]+f_{l}^{2} \operatorname{Var}\left[C_{i, l} \mid A_{i, I+1-i}\right] \\ = & \sigma_{l}^{2} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right]+f_{l}^{2} \operatorname{Var}\left[C_{i, l} \mid D_{I}\right]. \end{aligned}\tag{A.11}
We multiply the both sides by with the convention that an empty product equals 1 . Taking the sum over we obtain
\begin{aligned} & \sum_{l=I+1-i}^{I-1} \operatorname{Var}\left(C_{i, l+1} \mid D_{I}\right) \prod_{k=l+1}^{I-1} f_{k}^{2} \\ & =\sum_{l=I+1-i}^{I-1} \sigma_{l}^{2} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right] \prod_{k=l+1}^{I-1} f_{k}^{2} \\ & \quad+\sum_{l=I+1-i}^{I-1} \operatorname{Var}\left[C_{i, l} \mid D_{I}\right] f_{l}^{2} \prod_{k=l}^{I-1} f_{k}^{2} \operatorname{Var}\left(C_{i, I} \mid D_{I}\right) \\ & \quad+\sum_{l=I+1-i}^{I-2} \operatorname{Var}\left(C_{i, l+1} \mid D_{I}\right) \prod_{k=I+1}^{I-1} f_{k}^{2} \\ & =\sum_{l=I+1-i}^{I-1} \sigma_{l}^{2} E\left[\left.\frac{C_{i, l}^{2}}{\delta_{i, l}} \right\rvert\, D_{I}\right] \prod_{k=l+1}^{I-1} f_{k}^{2} \\ & \quad+\operatorname{Var}\left[C_{i, I+1-i} \mid D_{I}\right] \prod_{k=I+1-i}^{I-1} f_{k}^{2} \\ & \quad+\sum_{l=I+2-i}^{I-1} \operatorname{Var}\left[C_{i, l} \mid D_{I}\right] \prod_{k=l}^{I-1} f_{k}^{2} . \end{aligned}\tag{A.12}
Since and from the fact that
\sum_{l=I+1-i}^{I-2} \operatorname{Var}\left(C_{i, l+1} \mid D_{I}\right) \prod_{k=l+1}^{I-1} f_{k}^{2}=\sum_{l=I+2-i}^{I-1} \operatorname{Var}\left[C_{i, l} \mid D_{I}\right] \prod_{k=l}^{I-1} f_{k}^{2},
we finally get the proof of Lemma A. 1.
A.5.2. Proof of Proposition A. 1
Following Mack (1993), Mack (1994), we replace with and with This means that we approximate and by varying and averaging as little data as possible so that as many values from data observed are kept fixed. Due to Proposition 4.1 (i) we have and therefore for because all are scalars under Since we obtain from (A.6)
E\left(S_{k}^{2} \mid B_{k}\right)=\hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-i}^{2} \operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right) f_{k+1}^{2} \cdot \ldots \cdot f_{I-1}^{2} .
Taken together, we have replaced with and the unknown parameters are replaced by their estimators. Altogether, we estimate by
\sum_{k=I+1-i}^{I-1} \hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-1}^{2} \hat{f}_{k}^{2} \hat{f}_{k+1}^{2} \cdot \ldots \cdot \hat{f}_{I-1}^{2} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}}.
A.5.3. Proof of Proposition A. 2
From definition of in (14), we have
\begin{aligned} \operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right) & =\operatorname{Var}\left(\left.\frac{\sum_{j=1}^{I-k} \gamma_{j, k} F_{j, k}}{\sum_{j=1}^{I-k} \gamma_{j, k}} \right\rvert\, B_{k}\right) \\ & =\frac{\sum_{j=1}^{I-k} \gamma_{j, k}^{2} \operatorname{Var}\left(F_{i, k} \mid B_{k}\right) \cdot \mathbf{1}_{\left\{\delta_{j, k} \neq 0\right\}}}{\left(\sum_{j=1}^{I-k} \gamma_{j, k}\right)^{2}}, \end{aligned}
where the second equality is due to the -measurability of the assumption (4.3) and the convention that the product of 0 and equals to 0 .
A.5.4. Proof of Proposition A. 3
We find the estimator in the similar way to the estimator (see proof of Proposition A.1).
E\left[\left(S_{k}^{i}\right)^{2} \mid B_{k}\right]=\hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-i}^{2} \operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right) f_{k+1}^{2} \cdot \ldots \cdot f_{I-1}^{2}.
For we have
\begin{aligned} \widehat{F_{i} F_{j}=} & \sum_{k=I+1-i}^{I-1} \hat{f}_{I+1-j} \cdot \ldots \cdot \hat{f}_{I-i} \\ & \cdot \hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-i}^{2} \operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right) \hat{f}_{k+1}^{2} \cdot \ldots \cdot \hat{f}_{I-1}^{2} \\ = & \sum_{k=I+1-i}^{I-1} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}} \hat{f}_{I+1-j} \cdot \ldots \cdot \hat{f}_{I-i} \\ & \cdot \hat{f}_{I+1-i}^{2} \cdot \ldots \cdot \hat{f}_{k-1}^{2} \cdot \hat{f}_{k}^{2} \cdot \hat{f}_{k+1}^{2} \cdot \ldots \cdot \hat{f}_{I-1}^{2} \\ = & \sum_{k=I+1-i}^{I-1} \frac{\operatorname{Var}\left(\hat{f}_{k} \mid B_{k}\right)}{\hat{f}_{k}^{2}}\left(\hat{f}_{I+1-j} \cdot \ldots \cdot \hat{f}_{I-1}\right) \\ & \cdot\left(\hat{f}_{I+1-i} \cdot \ldots \cdot \hat{f}_{I-1}\right) . \end{aligned}. \tag{A.13}
B. Data
We present in Table B.8 the triangle of RAA data analysed in Mack (1994) and Murphy, Bardis, and Majidi (2012).
C. Individual link ratios of RAA run-off triangle
D. LAD estimator
The least absolute deviation (LAD) method or (also known as Least Absolute Value (LAV)) method is a widely known alternative to the classical least squares (LS) or method for statistical analysis of linear regression models. Instead of minimizing the sum of squared errors, it minimizes the sum of absolute values of errors. More precisely, in the context of linear regression model, estimates are found by solving the following optimisation problem
\min _{\beta}\left\{\sum_{i=1}^{n}\left|e_{i}\right|\right\}=\min _{\beta}\left\{\sum_{i=1}^{n}\left|y_{i}-\sum_{j}^{m} x_{i j} \beta_{j}\right|\right\},
where and Unlike the LS method, the LAD method is not sensitive to outliers and produces robust estimates. LAD method is reduced to a linear programming problem and the computational difficulty is now entirely overcome by the availability of computing power and the effectiveness of linear programming.
Least absolute values (LAV) regression is very resistant to observations with unusual values in data.
In the numerical example presented in Section 6.2.2, we used one-dimensional ( ) LAD procedure where we took, for each column of run-of triangle, and for all
One more thing merits mentioning here. In the simple one-dimensional case the LAD estimator yields to the sample median (see Abur and Expósito 2004, 141).
E. Robust estimation-Limits in estimation of parameters
We want to examine the existence of solution of equation (6.2). In this purpose, we study the properties of the flowing type of functions,
h_{k}\left(\beta_{k}\right):=\left(\sum_{i=1}^{N_{k}} a_{i, k}^{-\beta} b_{i, k}\right)\left(\sum_{i=1}^{N_{k}} a_{i, k}^{\beta} c_{i, k}\right),
with and In the sequel, without the loss of generality, we omit the index corresponding to the column of run-off triangle. Thus, we consider the function
h(\beta):=\left(\sum_{i=1}^{N} a_{i}^{-\beta} b_{i}\right)\left(\sum_{i=1}^{N} a_{i}^{\beta} c_{i}\right),
with and We rewrite the function as follows:
h(\beta):=\sum_{i=1}^{N} \sum_{j=1}^{N}\left(a_{i}^{-\beta} b_{i}\right)\left(a_{j}^{\beta} c_{j}\right) .
We easily observe that the function tends to as tends to or Let us define, for and where the indices are such that Then, by simple decomposition of double sum, we get:
h(\beta):=\sum_{i=1}^{N} b_{i} c_{i}+\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{i} c_{j} \cdot d_{i, j}^{\beta}+\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{j} c_{i} \cdot d_{j, i}^{\beta},
where by our notation and By the straightforward computations it is easy to show that
\begin{aligned} h^{\prime}(\beta):= & \sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{i} c_{j} \cdot \ln \left(d_{i, j}\right) \cdot d_{i, j}^{\beta} \\ & +\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{j} c_{i} \cdot \ln \left(d_{j, i}\right) \cdot d_{j, i}^{\beta}, \end{aligned}
Since and the first derivative has a limit in and if tends to and respectively. In addition, the second derivative is given by
\begin{aligned} h^{\prime \prime}(\beta): & =\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{i} c_{j} \cdot\left(\ln \left(d_{i, j}\right)\right)^{2} \cdot d_{i, j}^{\beta} \\ & +\sum_{i=1}^{N} \sum_{j=i+1}^{N} b_{j} c_{i} \cdot\left(\ln \left(d_{j, i}\right)\right)^{2} \cdot d_{j, i}^{\beta} . \end{aligned}
Given that and are strictly positive functions and all coefficients are positive, the second derivative of is strictly positive. This means that first derivative of is increasing function. Together with the previous facts it implies that has an absolute minimum. In consequence, the equation (23) has zero, one or two solutions.
In the case where two solutions of opposite sign exist, the actuary should decide which one corresponds better to the considered line of business. In fact, as mentioned in Murphy, Bardis, and Majidi (2012), the choice of negative solution does not seem to be unreasonable in some situations. This issue is out of scope of this paper. In our numerical example the solution is determined by the Excel tool called solver.
_.png)
_.png)