1. Introduction
The chain ladder is a widely used algorithm for loss reserving. It is formulated in Mack (1993). From its heuristic beginnings, it was shown to give maximum likelihood (ML) estimates of model parameters (Hachemeister and Stanard 1975; Mack 1991a; Renshaw and Verrall 1998) when:
- observations are independently Poisson distributed; and
- their means are modeled as the product of a row effect and a column effect.
This result was extended from the Poisson to the overdispersed Poisson (ODP) distribution by England and Verrall (2002).
Mack (1991a) considered another model in which observations were gamma distributed, and gave a number of earlier references to the same model. ML parameter estimates were obtained which, while not identical to chain ladder estimates, have sometimes been found by subsequent authors (e.g., Wüthrich 2003) to be numerically similar.
The ODP lies within the Tweedie family (Tweedie 1984), a subset of the exponential dispersion family (Nelder and Wedderburn 1972). Wüthrich (2003) made a numerical study of ML fitting in the case of Tweedie distributed observations. Again the results were similar to chain ladder estimation.
The purpose of the present very brief note is to consider ML estimation in this Tweedie case, to derive the earlier results as special cases of it, and to indicate the reasons for the numerical similarity of their results.
2. Preliminaries
2.1. Framework and notation
The data set will consist throughout of a triangle of insurance claims data. Let i = 1, 2, . . . , n denote period of origin, j = 1, 2, . . . , n denote development period, and Yij ≥ 0 the observation in the (i, j) cell of the triangle. The triangle consists of the set {Yij : i = 1, 2, . . . , n; j = 1, 2, . . . , n − i + 1} of incremental claims data (paid losses, claim counts, etc.). It is assumed that E[Yij] is finite for each (i, j).
Define cumulative row sums
\[ S_{i j}=\sum_{k=1}^{j} Y_{i k} . \tag{2.1} \]
Further, let Σ^{R(i)} denote summation over the entire row i of the triangle of quantities indexed by (i, j), i.e., over cells with fixed i and j = 1, 2, . . . , n − i + 1. Similarly, let Σ^{C(j)} denote summation over the entire column j of the triangle, and let Σ^{D(k)} denote summation over the entire diagonal i + j − 1 = k.
2.2. Chain ladder
The chain ladder model is formulated by Mack (1991b, 1993) as follows:
\[ \begin{array}{l} \mathrm{E}\left[S_{i, j+1} \mid S_{i 1}, S_{i 2}, \ldots, S_{i j}\right]=S_{i j} f_{j},\\ j=1,2, \ldots, n-1 \text {, independently of } i \end{array} \tag{2.2} \]
for some set of parameters fj; and also
Rows of the data triangle are stochastically independent, i.e., Yij and Ykl are independent for i ≠ k.
It may be observed that (2.2) implies
\[ \mathrm{E}\left[S_{i j} \mid S_{i 1}\right]=S_{i 1} f_{1} f_{2} \ldots f_{j-1} ,\tag{2.3} \]
which in turn implies
\[ \mathrm{E}\left[Y_{i j}\right]=\alpha_{i} \beta_{j} \tag{2.4} \]
for parameters αi, βj, where E[Yij] denotes the unconditional mean of Yij, and
\[ f_{j}=\sum_{k=1}^{j+1} \beta_{k} / \sum_{k=1}^{j} \beta_{k} . \tag{2.5} \]
The derivation of Equation (2.4) is as follows:
\[ \begin{aligned} \mathrm{E}\left[Y_{i j}\right] & =\mathrm{E}\,\mathrm{E}\left[S_{i j}-S_{i, j-1} \mid S_{i 1}\right] \\ & =\mathrm{E}\left[S_{i 1} f_{1} f_{2} \ldots f_{j-2}\left(f_{j-1}-1\right)\right] \end{aligned} \]
by (2.1) and (2.3), where the outer expectation is taken with respect to S_{i1}. Hence
\[ \mathrm{E}\left[Y_{i j}\right]=\mathrm{E}\left[S_{i 1}\right] f_{1} f_{2} \ldots f_{j-2}\left(f_{j-1}-1\right) \]
which is of form (2.4).
The chain ladder estimate of fj is
\[ F_{j}=\sum_{i=1}^{n-j} S_{i, j+1} / \sum_{i=1}^{n-j} S_{i j} . \tag{2.6} \]
The F_j may be converted to estimates of the α_i and β_j by means of the following relations:
\[ \hat{\beta}_{j}=\hat{\beta}_{1}\left[F_{1} \ldots F_{j-2}\left(F_{j-1}-1\right)\right] \tag{2.7} \]
subject to some linear constraint on the βj, such as
\[ \sum_{k=1}^{n} \beta_{k}=1 \tag{2.8} \]
and
\[ \hat{\alpha}_{i}=S_{i, n-i+1} / \sum^{R(i)} \hat{\beta}_{j} .\tag{2.9} \]
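The algorithm (2.6)–(2.9) is readily mechanized. The following sketch (Python with numpy; the 0-based array layout with NaN marking unobserved cells, and the function names, are illustrative conventions rather than anything prescribed above) computes the factors F_j and converts them to estimates of α_i and β_j under constraint (2.8):

```python
import numpy as np

def chain_ladder_factors(Y):
    """Development factors F_j of (2.6).  Y is an n x n array of
    incremental claims with NaN in unobserved cells (0-based: cell
    (i, j) is observed when i + j <= n - 1)."""
    n = Y.shape[0]
    S = np.cumsum(np.nan_to_num(Y), axis=1)       # cumulative row sums (2.1)
    F = np.empty(n - 1)
    for j in range(n - 1):
        rows = np.arange(n - j - 1)               # rows with S_{i,j+1} observed
        F[j] = S[rows, j + 1].sum() / S[rows, j].sum()
    return F

def chain_ladder_parameters(Y):
    """Estimates of alpha_i, beta_j via (2.7)-(2.9), normalized so that
    the beta_j sum to 1, as in (2.8)."""
    n = Y.shape[0]
    F = chain_ladder_factors(Y)
    S = np.cumsum(np.nan_to_num(Y), axis=1)
    # B[j] = beta_1 + ... + beta_{j+1}; B[n-1] = 1 and B[j] = B[j+1] / F_j by (2.5)
    B = np.ones(n)
    for j in range(n - 2, -1, -1):
        B[j] = B[j + 1] / F[j]
    beta = np.diff(np.concatenate(([0.0], B)))    # increments of B recover beta_j
    alpha = np.array([S[i, n - 1 - i] / B[n - 1 - i] for i in range(n)])  # (2.9)
    return alpha, beta, F
```

Applied to an exactly multiplicative triangle, the routine recovers the underlying row and column effects.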
2.3. Exponential dispersion and Tweedie families of distributions
2.3.1. Exponential dispersion family
The following family of log densities is called the exponential dispersion family (EDF) (Nelder and Wedderburn 1972):
\[ l(y ; \gamma, \lambda)=c(\lambda)[y \gamma-b(\gamma)]+a(y, \lambda) \tag{2.10} \]
for some functions a(.,.), b(.) and c(.) and parameters γ and λ.
It may be shown that, for Y subject to this log likelihood,
\[ \mu=\mathrm{E}[Y]=b^{\prime}(\gamma), \quad \operatorname{Var}[Y]=b^{\prime \prime}(\gamma) / c(\lambda) . \tag{2.11} \]
2.3.2. Tweedie family
A sub-family of the EDF is that defined by the relations:
\[ c(\lambda)=\lambda \tag{2.12} \]
\[ \operatorname{Var}[Y]=\mu^{p} / \lambda \quad \text { for some } \quad p \leq 0 \quad \text { or } \quad p \geq 1 . \tag{2.13} \]
This is the Tweedie family of exponential dispersion likelihoods (Tweedie 1984). The restriction on the moment relations (2.11) implies that
\[ b^{\prime}(\gamma)=[(1-p)(\gamma+k)]^{1 /(1-p)} \tag{2.14} \]
\[ b(\gamma)=(2-p)^{-1}[(1-p)(\gamma+k)]^{(2-p) /(1-p)} \tag{2.15} \]
for some constant k. This parameterization is found, for example, in Jorgensen and Paes de Souza (1994) and Wüthrich (2003) with k = 0.
Occasionally, the Tweedie family is defined as above but over the parameter range 1 < p < 2 (Mildenhall 1999; Kaas 2005). It is noteworthy that a member of the family with one of these values of p is a compound Poisson distribution (Jorgensen and Paes de Souza 1994) with a gamma severity distribution.
It follows from (2.11), (2.14), and (2.15) that
\[ \gamma=\mu^{1-p} /(1-p)-k \tag{2.16} \]
\[ b(\gamma)=\mu^{2-p} /(2-p) . \tag{2.17} \]
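These relations can be checked numerically. The following snippet (Python; the values p = 1.5 and μ = 4 are arbitrary admissible choices, with k = 0 as in Wüthrich (2003)) confirms that (2.16) inverts (2.14) and that (2.15) reduces to (2.17):

```python
# Check of (2.14)-(2.17) with k = 0, for arbitrary admissible p and mu.
p, mu = 1.5, 4.0
gamma = mu ** (1 - p) / (1 - p)                          # (2.16)
b_prime = ((1 - p) * gamma) ** (1 / (1 - p))             # (2.14)
b = ((1 - p) * gamma) ** ((2 - p) / (1 - p)) / (2 - p)   # (2.15)
assert abs(b_prime - mu) < 1e-12                         # b'(gamma) = mu, as in (2.11)
assert abs(b - mu ** (2 - p) / (2 - p)) < 1e-12          # (2.17)
```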
3. Maximum likelihood estimation for the Tweedie cross-classified model
Consider the model (2.4), together with the assumption that all Yij are stochastically independent. Note that this is not the same as the chain ladder model, as defined in Section 2, because the latter is formulated in terms of conditional expectations and does not make the same independence assumption. Indeed, the conditional expectation assumption (2.2) specifically postulates dependencies between observations within the same row.
Let Y denote the entire set {Yij} of observations, and let l(Y) denote the log likelihood of Y for some assumed distribution of the Yij, whose parameters have been suppressed for convenience. Suppose that each Yij has a Tweedie distribution defined by (2.12) and the following generalization of (2.13):
\[ \operatorname{Var}\left[Y_{i j}\right]=\mu_{i j}^{p} / \lambda w_{i j} \tag{3.1} \]
i.e., λ is replaced by λ/wij in (2.12). In common parlance wij is the weight associated with Yij. This model will be called the Tweedie cross-classified model.
While the Tweedie family allows a reasonably general representation of insurance data, its restrictions should be recognized. First, it has a short (exponential) tail for the case p ≤ 2. Second, all its cumulants, from the variance upward, are related through b(.), since the r-th cumulant is a multiple of b^{(r)}(γ), where the superscript (r) denotes r-fold differentiation (McCullagh and Nelder 1989, 44).
With the replacement λ ← λ/wij just given, and substitution of (2.16) and (2.17) into (2.10),
\[ \begin{aligned} l(Y)=\sum\{ & \lambda w_{i j}\left[y_{i j}\left[\mu_{i j}^{1-p} /(1-p)-k\right]-\mu_{i j}^{2-p} /(2-p)\right] \\ & \left.+a\left(y_{i j}, \lambda\right)\right\} \end{aligned} \tag{3.2} \]
where the summation runs over all observations in the data set Y.
The ML equations with respect to the αi are:
\[ \begin{array}{c} \partial l(Y) / \partial \alpha_{i}=\sum^{R(i)} \lambda w_{i j}\left[y_{i j} \mu_{i j}^{-p}-\mu_{i j}^{1-p}\right] \beta_{j}=0, \\ i=1, \ldots, n \end{array} \tag{3.3} \]
where use has been made of (2.4). This may be equivalently represented as follows:
Lemma 3.1 The ML equations with respect to the αi for the Tweedie cross-classified model are:
\[ \sum^{R(i)} w_{i j} \mu_{i j}^{1-p}\left[y_{i j}-\mu_{i j}\right]=0, \quad i=1, \ldots, n . \tag{3.4} \]
Similarly, the ML equations with respect to the βj are:
\[ \sum^{C(j)} w_{i j} \mu_{i j}^{1-p}\left[y_{i j}-\mu_{i j}\right]=0, \quad j=1, \ldots, n . \tag{3.5} \]
Note that p is taken here as fixed, rather than estimated. ML estimation of this parameter would require an additional equation.
Equations (3.4) and (3.5) are reminiscent of the estimating equations of Fu and Wu (2007), who were concerned with a cross-classified model in a ratemaking context.
Corollary 3.2 The case of ODP is represented by p = 1 with wij = 1. The ML equations are then
\[ \sum^{R(i)}\left[y_{i j}-\mu_{i j}\right]=0, \quad i=1, \ldots, n \tag{3.6} \]
\[ \sum^{C(j)}\left[y_{i j}-\mu_{i j}\right]=0, \quad j=1, \ldots, n . \tag{3.7} \]
These imply the chain ladder estimation of the αi, βj set out in (2.6)–(2.9).
Proof See Hachemeister and Stanard (1975), Mack (1991a), or Renshaw and Verrall (1998).
Corollary 3.3 The case of gamma Yij is represented by p = 2. The ML equations are then
\[ \sum^{R(i)} w_{i j}\left[y_{i j} / \mu_{i j}-1\right]=0, \quad i=1, \ldots, n \tag{3.8} \]
\[ \sum^{C(j)} w_{i j}\left[y_{i j} / \mu_{i j}-1\right]=0, \quad j=1, \ldots, n . \tag{3.9} \]
Substitution of αi βj for μij, followed by minor rearrangement, gives
\[ \alpha_{i}=w_{i .}^{-1} \sum^{R(i)} w_{i j} y_{i j} / \beta_{j}, \quad i=1, \ldots, n \tag{3.10} \]
\[ \beta_{j}=w_{\cdot j}^{-1} \sum^{C(j)} w_{i j} y_{i j} / \alpha_{i}, \quad j=1, \ldots, n \tag{3.11} \]
where
\[ w_{i .}=\sum^{R(i)} w_{i j} \tag{3.12} \]
\[ w_{. j}=\sum^{C(j)} w_{i j} . \tag{3.13} \]
These are essentially the results obtained by Mack (1991a) for gamma-distributed cells.
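Relations (3.10) and (3.11) suggest a simple alternating scheme: update the αi with the βj held fixed, then conversely, until a fixed point is reached. A sketch of this (Python with numpy; the function name, the NaN-padded triangle layout, and the imposition of normalization (2.8) at each pass are illustrative assumptions, not part of Mack's treatment):

```python
import numpy as np

def gamma_cc_fit(Y, w, n_iter=500):
    """Alternating solution of the gamma (p = 2) equations (3.10)-(3.11).
    Y, w: n x n arrays, with NaN in unobserved cells of Y."""
    obs = ~np.isnan(Y)
    n = Y.shape[0]
    alpha = np.ones(n)
    beta = np.ones(n) / n
    for _ in range(n_iter):
        for i in range(n):
            m = obs[i]
            alpha[i] = (w[i, m] * Y[i, m] / beta[m]).sum() / w[i, m].sum()  # (3.10)
        for j in range(n):
            m = obs[:, j]
            beta[j] = (w[m, j] * Y[m, j] / alpha[m]).sum() / w[m, j].sum()  # (3.11)
        s = beta.sum()          # impose constraint (2.8), preserving alpha_i beta_j
        beta /= s
        alpha *= s
    return alpha, beta
```

On an exactly multiplicative triangle the iteration reproduces the underlying row and column effects, since these satisfy (3.10) and (3.11) with zero residuals.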
Remark 3.4 Mack’s assumption of a gamma distribution is, in fact, an approximation to a compound Poisson distribution in each cell of the triangle in which each cell has a gamma severity distribution with the same shape parameter. Mack notes that the shape parameter would need to take a smallish value in order to attribute a non-negligible probability to Yij in the vicinity of zero.
As noted near the end of Section 2, the compound Poisson with gamma severity distribution may itself be accommodated within the Tweedie family (with 1 < p < 2), and so Mack's assumption of a gamma approximation in each cell could be replaced by the exact compound Poisson by means of a suitable choice of p (< 2).
Remark 3.5 The ML equations (3.6) and (3.7) also show that the chain ladder estimates are marginal sum estimates in the ODP case (see Mack 1991a; Schmidt and Wünsche 1998). In the general Tweedie case, the ML equations (3.4) and (3.5), while not equivalent to the chain ladder, yield weighted marginal sum estimates.
This provides an indication of the reason why past investigations have shown chain ladder estimates to be close to ML estimates in various Tweedie cases. For example, this was a finding of Wüthrich (2003).
To elaborate on this, write the general weighted marginal sum equation corresponding to (3.4) in the form
\[ \sum^{R(i)} \omega_{i j}\left[y_{i j}-\hat{\mu}_{i j}\right]=0 \tag{3.14} \]
where the ωij are general weights and the term μ̂ij recognizes that the solution of the equations provides only an estimate of μij. A parallel to the following argument about (3.4) may be given in relation to (3.5).
Now rewrite the left side of (3.14) as
\[ \sum^{R(i)} \omega_{i j}\left[\varepsilon_{i j}+\eta_{i j}\right] \tag{3.15} \]
where εij = yij − μij and ηij = μij − μ̂ij, both of which are random variables with zero means (assuming a correctly specified model).
Now consider the substitution of the solutions of (3.14) in the unweighted form of the same system of equations:
\[ \begin{aligned} \omega_{i} \sum^{R(i)} & {\left[y_{i j}-\hat{\mu}_{i j}\right] } \\ & =\omega_{i} \sum^{R(i)}\left[\varepsilon_{i j}+\eta_{i j}\right] \\ & =\sum^{R(i)} \omega_{i j}\left[\varepsilon_{i j}+\eta_{i j}\right]+\sum^{R(i)}\left(\omega_{i}-\omega_{i j}\right)\left[\varepsilon_{i j}+\eta_{i j}\right] \\ & =\sum^{R(i)}\left(\omega_{i}-\omega_{i j}\right)\left[\varepsilon_{i j}+\eta_{i j}\right] \end{aligned} \tag{3.16} \]
where ωi is any quantity constant over row i, e.g., the average of the ωij over the row; the final equality follows from (3.14).
The right side of (3.16) has a mean of zero and a variance of Σ^{R(i)} (ωi − ωij)² σij², where σij² = Var[εij + ηij]. Hence the value of (3.16) will be small if either or both of the following conditions hold:
- Weights vary little across a row;
- The variances of observations around values fitted by (3.14) are small.
In this case, the solutions to (3.4) will also be approximate solutions to the unweighted form:
\[ \sum^{R(i)}\left[y_{i j}-\hat{\mu}_{i j}\right]=0 \]
which is the chain ladder solution.
In summary, under the right conditions the chain ladder will approximate the solution to the weighted marginal sum estimates given by (3.4) and (3.5).
An example of this approximation is provided by Wüthrich (2003), who made a numerical study of ML fitting of the Tweedie cross-classified model in which the parameters αi, βj, λ, and p were all treated as free and the weights wij as known. In the example, the wij varied comparatively little with i and j, and p was estimated to be 1.17.
As pointed out just prior to Remark 3.5, this parameter value is consistent with the assumption of a compound Poisson distribution for each cell of the triangle.
For this numerical example the weights show not too much variation over the triangle and the ML estimates of the Tweedie cross-classified model are expected to approximate those of the standard chain ladder, as was indeed found by Wüthrich.
4. Maximum likelihood estimation for general Tweedie
Parameters of the general Tweedie cross-classified model may be estimated by the use of GLM software. However, an interesting special case arises under the sole constraint that the weights wij also have the multiplicative structure:
\[ w_{i j}=u_{i} v_{j} . \tag{4.1} \]
Note that this includes the unweighted case wij = 1.
The ML equations for estimation of the αi, βj were derived as (3.4) and (3.5). Rewrite these with the substitutions:
\[ Z_{i j}=w_{i j} \mu_{i j}^{1-p} Y_{i j} \tag{4.2} \]
\[ \nu_{i j}=w_{i j} \mu_{i j}^{2-p}=u_{i} v_{j}\left(\alpha_{i} \beta_{j}\right)^{2-p}=a_{i} b_{j} \tag{4.3} \]
where
\[ a_{i}=u_{i} \alpha_{i}^{2-p} \tag{4.4} \]
\[ b_{j}=v_{j} \beta_{j}^{2-p} . \tag{4.5} \]
This yields
\[ \sum^{R(i)}\left[z_{i j}-\nu_{i j}\right]=0, \quad i=1, \ldots, n \tag{4.6} \]
\[ \sum^{C(j)}\left[z_{i j}-\nu_{i j}\right]=0, \quad j=1, \ldots, n . \tag{4.7} \]
Note that these are the same equations as (3.6) and (3.7) in Corollary 3.2. That corollary therefore implies the following result.
Lemma 4.1 Consider the Tweedie cross-classified model with general (admissible) p and subject to (3.1) with constraint (4.1). ML estimates of ai, bj (and hence of αi, βj by (4.4) and (4.5)) are obtained by application of the chain ladder algorithm (2.6)-(2.9) to the data triangle Z = {Zij}.
In the application of this result, the μij must be known in order to formulate the "data" Zij, whereas the αi, βj (and hence the μij) are estimands of the theorem. However, a solution can be obtained by an iterative procedure.
Let a superscript (r) denote the r-th iteration of the estimate to which it is attached, e.g., μij^(r). Define
\[ Z_{i j}^{(r)}=w_{i j}\left[\mu_{i j}^{(r)}\right]^{1-p} Y_{i j} \tag{4.8} \]
\[ \nu_{i j}^{(r)}=w_{i j}\left[\mu_{i j}^{(r)}\right]^{2-p}=u_{i} v_{j}\left(\alpha_{i}^{(r)} \beta_{j}^{(r)}\right)^{2-p}=a_{i}^{(r)} b_{j}^{(r)} . \tag{4.9} \]
Then define ai^(r+1), bj^(r+1) as the estimates obtained in place of ai, bj when the chain ladder algorithm is applied to the data triangle Z^(r) = {Zij^(r)} in place of Z. By this iterative means, obtain the sequence of estimates αi^(r), βj^(r), r = 0, 1, 2, . . . , initiated at r = 0 by some simple choice, such as setting αi^(0), βj^(0) equal to the estimates of αi, βj given by the conventional chain ladder.
If this sequence converges, then the limit is taken as an estimate of the αi, βj.
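The iteration may be sketched as follows (Python with numpy; a sketch only, assuming multiplicative weights per (4.1) and 1 ≤ p < 2 so that 2 − p > 0, and substituting an alternating marginal-sum solver for an explicit chain ladder routine, the two being equivalent as noted above for equations (4.6) and (4.7)):

```python
import numpy as np

def _marginal_sum_fit(Z, obs, n_iter=200):
    """Solve the ODP-type marginal sum equations (4.6)-(4.7) for a_i, b_j
    by alternating updates (equivalent to the chain ladder)."""
    n = Z.shape[0]
    a, b = np.ones(n), np.ones(n) / n
    for _ in range(n_iter):
        for i in range(n):
            a[i] = Z[i, obs[i]].sum() / b[obs[i]].sum()
        for j in range(n):
            b[j] = Z[obs[:, j], j].sum() / a[obs[:, j]].sum()
    return a, b

def tweedie_cc_fit(Y, w, p, n_outer=50, tol=1e-10):
    """Iterative scheme of Section 4 (a sketch): apply marginal-sum
    estimation to the transformed triangle Z^(r) of (4.8)-(4.9),
    updating mu between passes.  Assumes weights w_ij = u_i v_j and
    1 <= p < 2.  Returns the fitted surface mu_ij = alpha_i beta_j."""
    obs = ~np.isnan(Y)
    Y0 = np.nan_to_num(Y)
    # r = 0: conventional (ODP / chain ladder) marginal sums on Y itself
    a, b = _marginal_sum_fit(Y0, obs)
    mu = np.outer(a, b)
    for _ in range(n_outer):
        Z = w * mu ** (1 - p) * Y0                 # (4.8)
        a, b = _marginal_sum_fit(Z, obs)
        nu = np.outer(a, b)                        # (4.9)
        mu_new = (nu / w) ** (1.0 / (2 - p))       # invert (4.3)-(4.5)
        if np.max(np.abs(mu_new[obs] - mu[obs])) < tol * np.max(mu_new[obs]):
            return mu_new
        mu = mu_new
    return mu
```

On an exactly multiplicative triangle, each transformed triangle Z^(r) is itself multiplicative, and the scheme converges immediately to the underlying cell means for any admissible p.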
This procedure has been applied to the data set in the Appendix with p = 2, and convergence of the estimated loss reserve to an accuracy of 0.05% was obtained in 5 iterations. Convergence becomes slower as p increases. For p = 2.4, 24 iterations were required to achieve an accuracy of 0.1%.
5. The “separation method”
Taylor (1977) introduced the procedure that subsequently became known as the “separation method.” This produces parameter estimates for a model of the form
\[ \mathrm{E}\left[Y_{i j}\right]=\alpha_{i+j-1} \beta_{j}, \tag{5.1} \]
which is the parallel of (2.4), but with the α parameter applying to diagonal i + j − 1 rather than row i.
The heuristic equations given by Taylor for parameter estimation were:
\[ \sum^{D(k)}\left[y_{i j}-\mu_{i j}\right]=0, \quad k=1, \ldots, n \tag{5.2} \]
\[ \sum^{C(j)}\left[y_{i j}-\mu_{i j}\right]=0, \quad j=1, \ldots, n . \tag{5.3} \]
It is evident that these equations yield marginal sum estimates. Taylor (1977) gives the explicit algorithm for generating estimates of the αk, βj. This will be referred to as separation method estimation, and is as follows:
\[ \alpha_{k}=\sum^{D(k)} Y_{i j} /\left[1-\sum_{j=k+1}^{n} \beta_{j}\right] \tag{5.4} \]
\[ \beta_{j}=\sum^{C(j)} Y_{i j} / \sum_{k=j}^{n} \alpha_{k}, \tag{5.5} \]
these equations being applied alternately for k = n, j = n, k = n − 1, etc.
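The alternating recursion (5.4)-(5.5) may be sketched directly (Python with numpy; the 0-based NaN-padded triangle layout and the function name are illustrative assumptions; the normalization Σβj = 1 is implicit in the denominator of (5.4)):

```python
import numpy as np

def separation_method(Y):
    """Separation method algorithm (5.4)-(5.5): alternately estimate
    diagonal effects alpha_k and development proportions beta_j,
    starting from the leading diagonal k = n."""
    n = Y.shape[0]
    alpha = np.zeros(n)
    beta = np.zeros(n)
    tail = 0.0                      # running sum beta_{k+1} + ... + beta_n
    for k in range(n, 0, -1):       # 1-based diagonal / column index
        d = k - 1                   # 0-based anti-diagonal i + j = d
        diag_sum = sum(Y[i, d - i] for i in range(d + 1))
        alpha[k - 1] = diag_sum / (1.0 - tail)          # (5.4)
        col_sum = np.nansum(Y[:, k - 1])                # column k
        beta[k - 1] = col_sum / alpha[k - 1:].sum()     # (5.5)
        tail += beta[k - 1]
    return alpha, beta
```

Applied to a triangle whose cells satisfy (5.1) exactly, the recursion recovers the diagonal and development effects.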
The model resulting from replacement of (2.4) by (5.1) in the Tweedie cross-classified model will be referred to as the Tweedie separation model. It is the same as the Tweedie cross-classified model except for the interchange of rows and diagonals, and so a result parallel to each of those of Sections 3 and 4 is obtainable.
Lemma 5.1 The ML equations with respect to the αk, βj for the Tweedie separation model are:
\[ \sum^{D(k)} w_{i j} \mu_{i j}^{1-p}\left[y_{i j}-\mu_{i j}\right]=0, \quad k=1, \ldots, n \tag{5.6} \]
\[ \sum^{C(j)} w_{i j} \mu_{i j}^{1-p}\left[y_{i j}-\mu_{i j}\right]=0, \quad j=1, \ldots, n . \tag{5.7} \]
Corollary 5.2 The case of ODP is represented by p = 1 with wij = 1. The equations are then
\[ \sum^{D(k)}\left[y_{i j}-\mu_{i j}\right]=0, \quad k=1, \ldots, n \tag{5.8} \]
\[ \sum^{C(j)}\left[y_{i j}-\mu_{i j}\right]=0, \quad j=1, \ldots, n . \tag{5.9} \]
These imply the separation method estimation of the αk, βj set out in (5.4) and (5.5).
Remark 5.3 This result has been known for the simple Poisson case since Verbeek (1972), in fact earlier than the corresponding result for the chain ladder (Corollary 3.2).
Corollary 5.4 The case of gamma Yij is represented by p = 2. The ML equations are then
\[ \sum^{D(k)} w_{i j}\left[y_{i j} / \mu_{i j}-1\right]=0, \quad k=1, \ldots, n \tag{5.10} \]
\[ \sum^{C(j)} w_{i j}\left[y_{i j} / \mu_{i j}-1\right]=0, \quad j=1, \ldots, n . \tag{5.11} \]
Remark 5.5 In the case of the general Tweedie separation model, the separation method algorithm (5.4) and (5.5) will approximate the ML solution (5.6) and (5.7) if either or both of the following conditions hold:
- Weights vary little over the triangle;
- The variances of observations around values fitted by (5.6) and (5.7) are small.
Lemma 5.6 Consider the Tweedie separation model with general (admissible) p and subject to (3.1) with constraint
\[ w_{i+j-1, j}=u_{i+j-1} v_{j} . \tag{5.12} \]
Define Zij by (4.2), and also define
\[ \begin{aligned} \nu_{i+j-1, j} & =w_{i+j-1, j} \mu_{i+j-1, j}^{2-p} \\ & =u_{i+j-1} v_{j}\left(\alpha_{i+j-1} \beta_{j}\right)^{2-p}=a_{i+j-1} b_{j} \end{aligned} \tag{5.13} \]
where
\[ a_{k}=u_{k} \alpha_{k}^{2-p} \tag{5.14} \]
\[ b_{j}=v_{j} \beta_{j}^{2-p} . \tag{5.15} \]
ML estimates of ak, bj (and hence of αk, βj) are obtained by application of the separation method algorithm (5.4) and (5.5) to the data triangle Z = {Zij}.
6. Conclusion
As noted in the statement of purpose at the end of Section 1, the purpose of this paper is largely expository. In operational terms, however, Section 4 provides a numerical procedure for obtaining parameter estimates for a Tweedie cross-classified model for known p.
This procedure will often be numerically efficient. A parallel numerical procedure produces parameter estimates for the Tweedie separation model.
A referee suggested that ML estimation might be carried out with respect to p as well as parameters αi, βj. This would extend ML estimation to the case of the Tweedie cross-classified model for unknown p.
The procedure in this case would consist of:
- Application of a univariate numerical search procedure to maximize the likelihood (3.2) with respect to p; where
- for each trial value of p in this search, the parameter set {αi, βj} is fixed as ML for that p.
Acknowledgment
Thanks are due to Hugh Miller, who provided the numerical detail reported in Section 4. Helpful comments were also provided by referees.