## 1. Introduction

The actuarial literature identifies two families of chain-ladder models categorized by Verrall (2000) as **recursive** and **non-recursive** models, respectively. Although the model formulations are fundamentally different, both are found to yield the same maximum likelihood estimators of age-to-age factors and the same forecasts of loss reserve. The properties of these models are studied by Taylor (2011).

Despite the identical forecasts of the different models, their different formulations are liable to lead to different correlation structures. This means that the correlations can be regarded as providing one means of differentiating between recursive and non-recursive models. The purpose of the present paper is the investigation of these correlation structures.

There is independence between rows in all the models considered, so the correlations of greatest interest are those between future observations in the same accident period, conditional on information up to a defined point of time, specifically development period *j* in respect of accident period *k*.

## 2. Framework and notation

### 2.1. Claims data

Consider a rectangle of incremental claims observations *Y_{kj}* with:

- accident periods represented by rows and labeled *k* = 1, 2, . . . , *K*;
- development periods represented by columns and labeled *j* = 1, 2, . . . , *J* ≤ *K*.

Within the rectangle, identify a **development trapezoid** of past observations

\mathcal{D}_K = \{Y_{kj} : 1 \leq k \leq K \text{ and } 1 \leq j \leq \min(J, K-k+1)\}

The complement of this subset, representing **future** observations, is

\mathcal{D}_K^c = \{Y_{kj} : 1 \leq k \leq K \text{ and } \min(J, K-k+1) < j \leq J\} = \{Y_{kj} : K-J+1 < k \leq K \text{ and } K-k+1 < j \leq J\}.

Also let

\mathcal{D}_K^+ = \mathcal{D}_K \cup \mathcal{D}_K^c

In general, the problem is to predict *Ɗ^{c}_{K}* on the basis of observed *Ɗ_{K}*.

The usual case in the literature (though often not in practice) is that in which *J* = *K*, so that the trapezoid becomes a triangle. The more general trapezoid will be retained throughout the present paper.

Define the **cumulative row sums**

X_{kj} = \sum_{i=1}^{j} Y_{ki} \tag{2.1}

and the full **row and column sums** (or horizontal and vertical sums)

H_k = \sum_{j=1}^{\min(J, K-k+1)} Y_{kj}, \qquad V_j = \sum_{k=1}^{K-j+1} Y_{kj}. \tag{2.2}

Also define, for *k* = *K* − *J* + 2, . . . , *K*,

R_k = \sum_{j=K-k+2}^{J} Y_{kj} = X_{kJ} - X_{k, K-k+1} \tag{2.3}

R = \sum_{k=K-J+2}^{K} R_k \tag{2.4}

Note that *R* is the sum of the (future) observations in *Ɗ^{c}_{K}*. It will be referred to as the total amount of **outstanding losses**. Likewise, *R_{k}* denotes the amount of outstanding losses in respect of accident period *k*. The objective stated earlier is to forecast the *R_{k}* and *R*.

Let Σ^{R(k)} denote summation over the entire row *k* of *Ɗ_{K}*, i.e., for fixed *k*. Similarly, let Σ^{C(j)} denote summation over the entire column *j* of *Ɗ_{K}*, i.e., for fixed *j*. For example, (2.2) may be expressed as

V_j = \sum^{C(j)} Y_{kj}

Finally, let Σ^{T} denote summation over all (*k*, *j*) cells of *Ɗ_{K}*, i.e.,

\sum^{T} = \sum_{k=1}^{K} \sum_{j=1}^{\min(J, K-k+1)} = \sum_{k=1}^{K} \sum^{R(k)} = \sum_{j=1}^{J} \sum_{k=1}^{K-j+1} = \sum_{j=1}^{J} \sum^{C(j)}.
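These index conventions can be made concrete with a small example. The sketch below (toy numbers, not taken from the paper) builds a *K* = *J* = 3 triangle of incrementals and computes the cumulative row sums, the row sums *H_k* and the column sums *V_j*:

```python
# Toy incremental trapezoid (K = J = 3, so a triangle); None marks future
# cells in D_K^c.  Values are illustrative only.
Y = [
    [100, 50, 25],      # accident period k = 1
    [110, 55, None],    # k = 2
    [120, None, None],  # k = 3
]
K = J = 3

# Cumulative row sums X_kj = Y_k1 + ... + Y_kj over the observed cells.
X = []
for row in Y:
    obs = [v for v in row if v is not None]
    X.append([sum(obs[:j + 1]) for j in range(len(obs))])

# Row sums H_k and column sums V_j over the trapezoid D_K.
H = [sum(v for v in row if v is not None) for row in Y]
V = [sum(Y[k][j] for k in range(K - j)) for j in range(J)]

print(X)  # [[100, 150, 175], [110, 165], [120]]
print(H)  # [175, 165, 120]
print(V)  # [330, 105, 25]
```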

### 2.2. Families of distributions

#### 2.2.1. Exponential dispersion family

The **exponential dispersion family** (EDF) (Nelder and Wedderburn 1972) consists of those variables *Y* with log-likelihoods of the form

\ell(y; \theta, \phi) = [y\theta - b(\theta)]/a(\phi) + c(y, \phi) \tag{2.5}

for parameters θ (the **canonical parameter**) and φ (the **scale parameter**), and suitable functions *a*, *b* and *c*, with *b* continuous, differentiable and one-one, and such as to produce a total probability mass of unity.

For *Y* so distributed,

\mathrm{E}[Y] = b'(\theta) \tag{2.6}

\mathrm{Var}[Y] = a(\phi)\, b''(\theta) \tag{2.7}

If μ denotes E[*Y*], then (2.6) establishes a relation between μ and θ, and so (2.7) may be expressed in the form

\mathrm{Var}[Y] = a(\phi)\, V(\mu) \tag{2.8}

for some function *V*, referred to as the **variance function**.

The notation *Y* ∼ EDF(θ, φ; *a*, *b*, *c*) will be used to mean that a random variable *Y* is subject to the EDF likelihood (2.5).

#### 2.2.2. Tweedie family

The **Tweedie family** (Tweedie 1984) is the subfamily of the EDF for which

a(\phi) = \phi \tag{2.9}

V(\mu) = \mu^p, \quad p \leq 0 \text{ or } p \geq 1. \tag{2.10}

For this family,

b(\theta) = (2-p)^{-1}\left\{[1 + (1-p)\theta]^{(2-p)/(1-p)} - 1\right\} \tag{2.11}

\mu = [1 + (1-p)\theta]^{1/(1-p)} \tag{2.12}

\ell(y; \mu, \phi) = \left[y\,\frac{\mu^{1-p} - 1}{1-p} - \frac{\mu^{2-p} - 1}{2-p}\right]\Big/\phi + c(y, \phi) \tag{2.13}

\partial\ell/\partial\mu = (y\mu^{-p} - \mu^{1-p})/\phi. \tag{2.14}

The notation *Y* ∼ Tw(μ, φ, *p*) will be used to mean that a random variable *Y* is subject to the Tweedie likelihood (2.13) with parameters μ, φ and *p*. The abbreviated form *Y* ∼ Tw_{p}(μ, φ) will mean that *Y* is a member of the sub-family with specific parameter *p*.

#### 2.2.3. Over-dispersed Poisson family

The **over-dispersed Poisson** (ODP) family is the Tweedie sub-family with *p* = 1. The limit of (2.12) as *p* → 1 gives

\mathrm{E}[Y] = \mu = \exp\theta \tag{2.15}

By (2.8) and (2.10),

\mathrm{Var}[Y] = \phi\mu \tag{2.16}

By (2.14),

\partial\ell/\partial\mu = (y - \mu)/\phi\mu. \tag{2.17}
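A quick numerical sanity check of the ODP mean and variance relations E[*Y*] = μ and Var[*Y*] = φμ is possible via the standard scaled-Poisson representation of the ODP law (a construction assumed here, not stated in the text): *Y* = φ*Z* with *Z* ∼ Poisson(μ/φ).

```python
import math

def odp_moments(mu, phi, terms=200):
    """First two moments of the scaled-Poisson representation of ODP(mu, phi):
    Y = phi * Z with Z ~ Poisson(mu / phi), summed directly over the pmf."""
    lam = mu / phi
    p = math.exp(-lam)       # P(Z = 0)
    m1 = m2 = 0.0
    for k in range(terms):
        y = phi * k
        m1 += y * p
        m2 += y * y * p
        p *= lam / (k + 1)   # Poisson pmf recurrence avoids factorials
    return m1, m2 - m1 * m1

mean, variance = odp_moments(mu=5.0, phi=2.0)
print(mean, variance)   # ≈ mu = 5.0 and phi * mu = 10.0
```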

The notation *Y* ∼ ODP(μ, φ) means *Y* ∼ Tw_{1}(μ, φ).

## 3. Chain-ladder models

### 3.1. Heuristic chain ladder

The chain ladder was originally (pre-1975) devised as a heuristic algorithm for forecasting outstanding losses. It had no statistical foundation. The algorithm is as follows.

Define the following factors:

\hat{f}_j = \sum_{k=1}^{K-j} X_{k, j+1} \Big/ \sum_{k=1}^{K-j} X_{kj}, \quad j = 1, 2, \ldots, J-1 \tag{3.1}

Note that \hat{f}_j can be expressed in the form

\hat{f}_j = \sum_{k=1}^{K-j} w_{kj}\left(X_{k, j+1}/X_{kj}\right) \tag{3.2}

with

w_{kj} = X_{kj} \Big/ \sum_{k=1}^{K-j} X_{kj} \tag{3.3}

i.e., as a weighted average of the factors X_{k,j+1}/X_{kj} for fixed *j*.

Then define the following forecasts of the Y_{kj} ∈ *Ɗ^{c}_{K}*:

\hat{Y}_{kj} = X_{k, K-k+1}\, \hat{f}_{K-k+1} \hat{f}_{K-k+2} \ldots \hat{f}_{j-2} (\hat{f}_{j-1} - 1) \tag{3.4}

Call these **chain-ladder forecasts**. They yield the additional chain-ladder forecasts:

\hat{X}_{kj} = X_{k, K-k+1}\, \hat{f}_{K-k+1} \ldots \hat{f}_{j-1} \tag{3.5}

\hat{R}_k = \hat{X}_{kJ} - X_{k, K-k+1} \tag{3.6}

\hat{R} = \sum_{k=K-J+2}^{K} \hat{R}_k \tag{3.7}
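The algorithm (3.1)–(3.7) is compact enough to state directly as code. The following sketch (toy cumulative triangle, illustrative numbers only) computes the age-to-age factors and the forecasts of outstanding losses:

```python
# Toy cumulative triangle X_kj (K = J = 3); each row holds only its observed cells.
X = [
    [100.0, 150.0, 175.0],  # accident period k = 1
    [110.0, 165.0],         # k = 2
    [120.0],                # k = 3
]
K = J = 3

# Age-to-age factors (3.1): column-sum ratios over rows observed in both columns.
f_hat = [
    sum(X[k][j + 1] for k in range(K - j - 1)) / sum(X[k][j] for k in range(K - j - 1))
    for j in range(J - 1)
]

# Forecasts (3.5)-(3.7): project each row's leading diagonal through the factors.
R_hat = {}
for k in range(1, K):             # 0-based rows with future cells
    x = X[k][-1]                  # latest observed cumulative value X_{k,K-k+1}
    for j in range(len(X[k]), J):
        x *= f_hat[j - 1]         # apply the remaining development factors
    R_hat[k + 1] = x - X[k][-1]   # outstanding losses R_k (1-based accident period)

R_total = sum(R_hat.values())
print(f_hat)    # f_hat[0] == 1.5 for this toy triangle
print(R_total)  # total outstanding losses R
```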

### 3.2. Recursive models

A recursive model takes the general form

\mathrm{E}[X_{k, j+1} \mid X_{kj}] = \text{function of } \mathcal{D}_{k+j-1} \text{ and some parameters}

where *Ɗ_{k+j−1}* is the data sub-array of *Ɗ_{K}* obtained by deleting diagonals on the right side of *Ɗ_{K}* until X_{kj} is contained in its right-most diagonal.

#### 3.2.1. Mack model

The Mack model (Mack 1993) is defined by the following assumptions.

(M1) Accident periods are stochastically independent, i.e., Y_{k_1 j_1} and Y_{k_2 j_2} are stochastically independent if *k*_1 ≠ *k*_2.

(M2) For each *k* = 1, 2, . . . , *K*, the *X_{kj}* (*j* varying) form a Markov chain.

(M3) For each *k* = 1, 2, . . . , *K* and *j* = 1, 2, . . . , *J* − 1:

(a) E[X_{k,j+1} | X_{kj}] = f_j X_{kj} for some parameters f_j > 0; and

(b) Var[X_{k,j+1} | X_{kj}] = σ_j^2 X_{kj} for some parameters σ_j^2 > 0.

#### 3.2.2. ODP Mack model

Taylor (2011) defined the over-dispersed Poisson (ODP) Mack model as that satisfying assumptions (M1), (M2) and

(ODPM3) For each *k* = 1, 2, . . . , *K* and *j* = 1, 2, . . . , *J* − 1,

Y_{k, j+1} \mid X_{kj} \sim \mathrm{ODP}\left((f_j - 1) X_{kj},\, \phi_{k, j+1}\right)

where now the f_j are as in (M3a) and the φ_{k,j+1} are scale parameters.

Assumption (ODPM3) implies (M3a). Moreover, in the special case φ_{k,j+1} = φ_{j+1}, independent of *k*, (ODPM3) also implies (M3b) with σ_j^2 = φ_{j+1}(f_j − 1).

It is evident that, for this model to be valid, it is necessary that all f_j ≥ 1.

Note also that, under (ODPM3), X_{kj} = 0 implies that Y_{ki} = 0 for all *i* > *j*. This means that, for each *k*, either Y_{k1} > 0 or Y_{kj} = 0 for all *j*.

A summary of these requirements in terms of the data array *Ɗ_{K}* is as follows.

(R1) f_j ≥ 1 for all *j* = 1, 2, . . . , *J* − 1.

(R2) For each *k* = 1, 2, . . . , *K*, either

(a) Y_{k1} > 0; or

(b) Y_{kj} = 0 for all *j*.

A data array satisfying these requirements will be called **ODPM-regular**.

Assumption (ODPM3) may be expressed in the following form, suitable for GLM implementation of the ODP Mack model:

Y_{k, j+1} \mid X_{kj} \sim \mathrm{ODP}\left(\exp[\ln X_{kj} + \ln(f_j - 1)],\, \phi/w_{k, j+1}\right)

where

w_{k, j+1} = \phi/\phi_{k, j+1}.

In this form, the GLM of the Y_{k,j+1} has log link, offsets ln X_{kj}, parameters ln(f_j − 1), and weights w_{k,j+1}.

It is shown by Taylor (2011) that the chain-ladder estimates (3.1) of the age-to-age factors are maximum likelihood for this model.
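This maximum-likelihood property can be illustrated numerically for a single development period: under (ODPM3) with constant scale, the log-likelihood as a function of f_j is maximized at the chain-ladder factor (3.1). The sketch below (toy numbers, and a crude grid search rather than an actual GLM fit) shows the agreement.

```python
import math

# Toy data for a single development period j (illustrative values only):
# cumulative claims X_kj and X_{k,j+1} for the rows observed in both columns.
X_j  = [100.0, 110.0]
X_j1 = [150.0, 165.0]
Y_next = [b - a for a, b in zip(X_j, X_j1)]   # incrementals Y_{k,j+1}

def neg_loglik(f):
    """Negative log-likelihood kernel for Y_{k,j+1} ~ ODP((f - 1) X_kj, phi),
    phi constant; the c(y, phi) term does not involve f and is omitted."""
    total = 0.0
    for x, y in zip(X_j, Y_next):
        mu = (f - 1.0) * x
        total -= y * math.log(mu) - mu
    return total

# Crude grid search for the maximizing f (a GLM fit would do the same job).
grid = [1.0001 + i * 1e-4 for i in range(20000)]
f_mle = min(grid, key=neg_loglik)

f_cl = sum(X_j1) / sum(X_j)   # chain-ladder factor (3.1)
print(f_mle, f_cl)            # both close to 1.5 for this toy data
```

The agreement is exact in the limit of a fine grid, since setting the derivative of the log-likelihood to zero gives f − 1 = ΣY_{k,j+1}/ΣX_{kj}, which is precisely (3.1).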

### 3.3. Non-recursive models

Taylor (2011) also defined the **ODP cross-classified model** as that satisfying the following assumptions:

(ODPCC1) The random variables Y_{kj} ∈ *Ɗ^{+}_{K}* are stochastically independent.

(ODPCC2) For each *k* = 1, 2, . . . , *K* and *j* = 1, 2, . . . , *J*:

(a) E[Y_{kj}] = α_k β_j for some parameters α_k, β_j > 0;

(b) Y_{kj} ∼ ODP(α_k β_j, φ_{kj}) for some scale parameters φ_{kj};

(c) \sum_{j=1}^{J} \beta_j = 1.

Assumption (ODPCC2b) may be expressed in the following form, suitable for GLM implementation of the ODP cross-classified model:

Y_{kj} \sim \mathrm{ODP}\left(\exp(\ln\alpha_k + \ln\beta_j),\, \phi/w_{kj}\right)

In this form, the GLM of the Y_{kj} has log link, parameters ln α_k and ln β_j, and weights w_{kj} satisfying

w_{kj} = \phi/\phi_{kj}.

Assumption (ODPCC2c) removes one degree of redundancy from the parameter set that would otherwise be reflected by the aliasing of one parameter in the GLM.

It has long been known, for the case φ_{kj} = φ, that the maximum likelihood forecasts of the *Y_{kj}* ∈ *Ɗ^{c}_{K}* in this model are the same as the chain-ladder forecasts (3.5)–(3.7) (see, e.g., Hachemeister and Stanard 1975; Renshaw and Verrall 1998; Taylor 2000). It is shown by England and Verrall (2002) that this result continues to hold in the more general case of non-constant φ_{kj}.

Thus the ODP Mack and ODP cross-classified models produce the same maximum likelihood forecasts of loss reserves despite their fundamentally different formulations. This means that their respective correlation structures can be viewed as a means of differentiating between them.

## 4. Correlation between observations

### 4.1. Background common to recursive and non-recursive models

Consider the models defined in Sections 3.2 and 3.3, and specifically the conditional covariance Cov[X_{k_1, j_1+m}, X_{k_2, j_2+m+n} | X_{k_1 j_1}, X_{k_2 j_2}] with *m*, *n* > 0. The following lemma is immediate from assumption (M1) or (ODPCC1).

**Lemma 4.1.** The following is true for each of the Mack, ODP Mack and ODP cross-classified models:

\mathrm{Cov}[X_{k_1, j_1+m}, X_{k_2, j_2+m+n} \mid X_{k_1 j_1}, X_{k_2 j_2}] = 0 \quad \text{for } k_1 \neq k_2.

In view of this result, attention will be focused on **within-row covariances** Cov[X_{k,j+m}, X_{k,j+m+n} | X_{kj}]. This quantity will be denoted c_{k,j+m,j+m+n|j}. It is evaluated as follows:

\begin{aligned}
c_{k, j+m, j+m+n \mid j} &= \mathrm{E}\big[\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}\{X_{k, j+m+n} - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}]\} \mid X_{kj}\big] \\
&= \mathrm{E}\big[\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}\,\mathrm{E}\big[\{X_{k, j+m+n} - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}]\} \mid X_{k, j+m}\big] \mid X_{kj}\big] \\
&= \mathrm{E}\big[\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}\{\mathrm{E}[X_{k, j+m+n} \mid X_{k, j+m}] - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}]\} \mid X_{kj}\big].
\end{aligned} \tag{4.1}

### 4.2. Recursive models

#### 4.2.1. Mack model

By recursive application of (M3a),

\mathrm{E}[X_{k, j+m+n} \mid X_{k, j+m}] = f_{j+m+n-1} f_{j+m+n-2} \ldots f_{j+m} X_{k, j+m}

and so

\mathrm{E}[X_{k, j+m+n} \mid X_{k, j+m}] - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}] = f_{j+m+n-1} \ldots f_{j+m}\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}. \tag{4.2}

Substitution of (4.2) into (4.1) yields

c_{k, j+m, j+m+n \mid j} = f_{j+m+n-1} \ldots f_{j+m}\, \mathrm{Var}[X_{k, j+m} \mid X_{kj}] \tag{4.3}

The variance term here is evaluated by Mack (1993, 218) as

\mathrm{Var}[X_{k, j+m} \mid X_{kj}] = X_{kj} \sum_{i=j}^{j+m-1} f_{j+m-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j \tag{4.4}

Substitution of (4.4) into (4.3) yields

c_{k, j+m, j+m+n \mid j} = f_{j+m+n-1} \ldots f_{j+m}\, X_{kj} \sum_{i=j}^{j+m-1} f_{j+m-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j \tag{4.5}

It then follows that

\mathrm{Corr}[X_{k, j+m}, X_{k, j+m+n} \mid X_{kj}] = \frac{c_{k, j+m, j+m+n \mid j}}{\left[c_{k, j+m+n, j+m+n \mid j}\, c_{k, j+m, j+m \mid j}\right]^{\frac{1}{2}}} = \left[1 + B_{j+m, j+m+n \mid j}\right]^{-\frac{1}{2}} \tag{4.6}

where

B_{j+m, j+m+n \mid j} = \frac{\sum_{i=j+m}^{j+m+n-1} f_{j+m+n-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j}{\sum_{i=j}^{j+m-1} f_{j+m+n-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j}. \tag{4.7}

An equivalent form, obtained by dividing numerator and denominator of (4.7) by f_{j+m+n-1}^2 \ldots f_{j+m}^2\, f_{j+m-1} \ldots f_j, is

B_{j+m, j+m+n \mid j} = \frac{\sum_{i=j+m}^{j+m+n-1} \left(\sigma_i^2/f_i\right) f_i^{-1} f_{i-1}^{-1} \ldots f_{j+m}^{-1}}{\sum_{i=j}^{j+m-1} f_{j+m-1} \ldots f_{i+1}\left(\sigma_i^2/f_i\right)}. \tag{4.8}

**Theorem 4.2.** Consider an ODPM-regular data array subject to a Mack model, and consider a row *k* that is not identically zero. Let *m*, *n* be strictly positive integers and let ρ_{j+m,j+m+n|j} denote Corr[X_{k,j+m}, X_{k,j+m+n} | X_{kj}]. For a given schedule of values f_i, σ_i^2, each of the following propositions holds:

(a) 0 < ρ_{j+m,j+m+n|j} < 1.

(b) ρ_{j+m,j+m+n+1|j} < ρ_{j+m,j+m+n|j}.

(c) ρ_{j+m,j+m+n|j} increases as any σ_i^2, *i* = *j*, . . . , *j* + *m* − 1, increases, or any σ_i^2, *i* = *j* + *m*, . . . , *j* + *m* + *n* − 1, decreases.

(d) ρ_{j+m,j+m+n|j} increases as any f_i increases and σ_i^2 changes such that σ_i^2/f_i: increases if *i* ≤ *j* + *m* − 1, or decreases if *i* ≥ *j* + *m*.

**Proof.** (a) Follows from (4.6) and the fact that B_{j+m,j+m+n|j} > 0.

(b) By (4.7), write

\begin{aligned} B_{j+m, j+m+n+1 \mid j} &= \frac{\sigma_{j+m+n}^{2} f_{j+m+n-1} \ldots f_{j}}{\sum_{i=j}^{j+m-1} f_{j+m+n}^{2} \ldots f_{i+1}^{2}\, \sigma_{i}^{2}\, f_{i-1} \ldots f_{j}} + B_{j+m, j+m+n \mid j} \\ &> B_{j+m, j+m+n \mid j}. \end{aligned}

The result then follows from (4.6).

(c) Obvious from (4.8).

(d) Divide numerator and denominator of (4.7) by f_{j+m+n-1}^2 \ldots f_{j+m}^2\, f_{j+m-1} \ldots f_j to obtain

B_{j+m, j+m+n \mid j} = \frac{\sum_{i=j+m}^{j+m+n-1} \left(\sigma_i^2/f_i\right) f_i^{-1} f_{i-1}^{-1} \ldots f_{j+m}^{-1}}{\sum_{i=j}^{j+m-1} f_{j+m-1} \ldots f_{i+1}\left(\sigma_i^2/f_i\right)}

and the result then follows from (4.6).

#### 4.2.2. ODP Mack model

Expression (4.7) may be adapted to the case of the ODP Mack model with column-dependent scale parameters φ_{k,j+1} = φ_{j+1}. Section 3.2.2 notes that, in this case,

\sigma_{j}^{2}=\phi_{j+1}\left(f_{j}-1\right) \tag{4.9}

and substitution of this result in (4.7) yields

B_{j+m, j+m+n \mid j}=\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{j}}{\sum_{i=j}^{j+m-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{j}} \tag{4.10}

**Special case.** An interesting case arises when f_j = f and φ_j = φ for all *j*. Then (4.10) becomes

B_{j+m, j+m+n \mid j}=f^{-n}\left(f^{n}-1\right) /\left(f^{m}-1\right) . \tag{4.11}
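The reduction of (4.7) (via (4.10)) to (4.11) is easy to confirm numerically: evaluate B directly from (4.7) with σ_i^2 = φ(f − 1), the constant-parameter ODP Mack case, and compare with the closed form. The sketch below uses illustrative parameter values only.

```python
from math import prod

def B_mack(f, s2, j, m, n):
    """B_{j+m, j+m+n | j} per (4.7); f[i] stands for f_i and s2[i] for sigma_i^2."""
    def term(i):
        return (prod(f[r] ** 2 for r in range(i + 1, j + m + n))  # f^2 products
                * s2[i]
                * prod(f[r] for r in range(j, i)))                # f_{i-1}...f_j
    num = sum(term(i) for i in range(j + m, j + m + n))
    den = sum(term(i) for i in range(j, j + m))
    return num / den

f_const, phi, j, m, n = 1.5, 1.0, 1, 2, 3
f  = [f_const] * 10
s2 = [phi * (f_const - 1.0)] * 10   # sigma_i^2 = phi (f - 1), per (4.9)

b_direct = B_mack(f, s2, j, m, n)
b_closed = f_const ** (-n) * (f_const ** n - 1) / (f_const ** m - 1)   # (4.11)
print(b_direct, b_closed)   # the two values agree
```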

### 4.3. Non-recursive models

Once again consider c_{k,j+m,j+m+n|j}. Note that

X_{k, j+m+n}=X_{k, j+m}+\sum_{i=j+m+1}^{j+m+n} Y_{k i}

where all terms on the right side are mutually stochastically independent.

Therefore

\begin{aligned}
c_{k, j+m, j+m+n \mid j} & =\operatorname{Var}\left[X_{k, j+m} \mid X_{k j}\right] \\
& =\operatorname{Var}\left[X_{k j}+\sum_{i=j+1}^{j+m} Y_{k i} \mid X_{k j}\right] \\
\end{aligned}\tag{4.12}

\begin{aligned}=\sum_{i=j+1}^{j+m} \operatorname{Var}\left[Y_{k i}\right]
\end{aligned} \tag{4.13}

by (ODPCC1).

By (4.12),

\begin{aligned} \rho_{k, j+m, j+m+n \mid j}^{2} & =\operatorname{Var}\left[X_{k, j+m} \mid X_{k j}\right] / \operatorname{Var}\left[X_{k, j+m+n} \mid X_{k j}\right] \\ & =\sum_{i=j}^{j+m-1} \phi_{i+1} \beta_{i+1} \Big/ \sum_{i=j}^{j+m+n-1} \phi_{i+1} \beta_{i+1} \end{aligned} \tag{4.14}

by (4.13) and (ODPCC2a-b).

Thus

\rho_{j+m, j+m+n \mid j}=\left(1+D_{j+m, j+m+n \mid j}\right)^{-\frac{1}{2}} \tag{4.15}

with

D_{j+m, j+m+n \mid j}=\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} \beta_{i+1} / \sum_{i=j}^{j+m-1} \phi_{i+1} \beta_{i+1} . \tag{4.16}

Equation (11) in Verrall (1991) shows that the *f_j* and *β_j* are related as follows:

f_{j}=\sum_{i=1}^{j+1} \beta_{i} \Big/ \sum_{i=1}^{j} \beta_{i}

or, equivalently, when account is taken of (ODPCC2c),

\beta_{i+1}=\frac{f_{1} \ldots f_{i-1}\left(f_{i}-1\right)}{\sum_{r=1}^{J-1} f_{1} \ldots f_{r-1}\left(f_{r}-1\right)} \tag{4.17}

and this, combined with (4.16), gives

D_{j+m, j+m+n \mid j}=\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} f_{1} \ldots f_{i-1}\left(f_{i}-1\right)}{\sum_{i=j}^{j+m-1} \phi_{i+1} f_{1} \ldots f_{i-1}\left(f_{i}-1\right)} \tag{4.18}

=\frac{\sum_{i=j+m}^{j+m+n-1} f_{j+m} \ldots f_{i-1}\left(f_{i}-1\right) \phi_{i+1}}{\sum_{i=j}^{j+m-1}\left[\left(1-f_{i}^{-1}\right) \phi_{i+1}\right] f_{i+1}^{-1} \ldots f_{j+m-1}^{-1}} . \tag{4.19}

**Theorem 4.3.** Consider an ODPM-regular data array subject to an ODP cross-classified model, and consider a row *k* that is not identically zero. Let *m*, *n* be strictly positive integers and let ρ_{j+m,j+m+n|j} denote Corr[X_{k,j+m}, X_{k,j+m+n} | X_{kj}]. For a given schedule of values f_i, β_i, φ_i, each of the following propositions holds:

(a) 0 < ρ_{j+m,j+m+n|j} < 1.

(b) ρ_{j+m,j+m+n+1|j} < ρ_{j+m,j+m+n|j}.

(c) ρ_{j+m,j+m+n|j} increases as any φ_{i+1} or β_{i+1}, *i* = *j*, . . . , *j* + *m* − 1, increases, or any φ_{i+1} or β_{i+1}, *i* = *j* + *m*, . . . , *j* + *m* + *n* − 1, decreases.

(d) ρ_{j+m,j+m+n|j} increases as any f_i decreases and φ_{i+1} changes such that (1 − f_i^{-1})φ_{i+1}: increases if *i* ≤ *j* + *m* − 1, or decreases if *i* ≥ *j* + *m*.

**Proof.** (a) Follows directly from (4.14).

(b)-(c) Follow directly from (4.15) and (4.16).

(d) Follows directly from (4.15) and (4.19).

It is interesting to compare the results of Theorems 4.2(d) and 4.3(d). The former shows that, subject to the condition on the dispersion parameters, an increase in an *f_i* causes correlations to increase in the Mack model, whereas the latter yields the opposite result in the ODP cross-classified model.

**Special case.** An interesting special case arises when φ_{kj} = φ, independent of *k* and *j*. Then (4.14) reduces to

\rho_{k, j+m, j+m+n \mid j}^{2}=\sum_{i=j}^{j+m-1} \beta_{i+1} / \sum_{i=j}^{j+m+n-1} \beta_{i+1} . \tag{4.20}

**Special case.** As in Section 4.2.2, the case f_j = f and φ_j = φ for all *j* is interesting. Here, (4.18) yields

D_{j+m, j+m+n \mid j}=f^{m}\left(f^{n}-1\right) /\left(f^{m}-1\right) \tag{4.21}
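A parallel numerical check for the non-recursive model, under the same constant-parameter assumption (illustrative values only): evaluating D directly from (4.18) reproduces the closed form (4.21), and dividing by the recursive quantity (4.11) gives D/B = f^{m+n} ≥ 1, a concrete instance of the inequality B ≤ D established in Theorem 4.4.

```python
from math import prod

def D_cc(f, phi, j, m, n):
    """D_{j+m, j+m+n | j} per (4.18); f[i] stands for f_i and phi[i] for
    phi_{i+1} (index 0 unused)."""
    def term(i):
        return phi[i] * prod(f[r] for r in range(1, i)) * (f[i] - 1.0)
    num = sum(term(i) for i in range(j + m, j + m + n))
    den = sum(term(i) for i in range(j, j + m))
    return num / den

fc, j, m, n = 1.5, 1, 2, 3
f   = [fc] * 10    # constant age-to-age factors f_i = fc
phi = [1.0] * 10   # constant dispersion parameters

d_direct = D_cc(f, phi, j, m, n)
d_closed = fc ** m * (fc ** n - 1) / (fc ** m - 1)      # (4.21)
b_closed = fc ** (-n) * (fc ** n - 1) / (fc ** m - 1)   # (4.11), recursive model
print(d_direct, d_closed, d_direct / b_closed)          # ratio equals fc**(m + n)
```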

### 4.4. Comparison between recursive and non-recursive models

The present sub-section will compare the correlations associated with the ODP Mack and ODP crossclassified models with column dependent dispersion parameters **compatible**.

Let ρ^{R}_{k,j+m,j+m+n|j} denote ρ_{k,j+m,j+m+n|j} in the special case of the (recursive) ODP Mack model. Likewise, let ρ^{NR}_{k,j+m,j+m+n|j} apply to the (non-recursive) ODP cross-classified model.

Further, let π_{j+m,j+m+n|j} denote the ratio D_{j+m,j+m+n|j}/B_{j+m,j+m+n|j}.

With subscripts suppressed, ρ^{R} and ρ^{NR} are related through π as follows. By (4.6),

B=1 /\left(\rho^{R}\right)^{2}-1

Then, by (4.15),

\left(\rho^{N R}\right)^{2}=1 /\left\{1+\pi\left[1 /\left(\rho^{R}\right)^{2}-1\right]\right\}

and hence

\rho^{N R}=\pi^{-\frac{1}{2}} \rho^{R} /\left[1+\frac{1-\pi}{\pi}\left(\rho^{R}\right)^{2}\right]^{\frac{1}{2}} \tag{4.22}
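Relation (4.22) can be checked numerically in the constant-parameter special case, where B and D take the closed forms (4.11) and (4.21) (illustrative parameter values below):

```python
f, m, n = 1.5, 2, 3   # constant-parameter special case, toy values

B = f ** (-n) * (f ** n - 1) / (f ** m - 1)   # (4.11), ODP Mack
D = f ** m * (f ** n - 1) / (f ** m - 1)      # (4.21), ODP cross-classified
pi = D / B                                    # here equal to f ** (m + n)

rho_R = (1 + B) ** -0.5            # recursive correlation, by (4.6)
rho_NR_direct = (1 + D) ** -0.5    # non-recursive correlation, by (4.15)

# Recover the non-recursive correlation from the recursive one via (4.22).
rho_NR_via_422 = pi ** -0.5 * rho_R / (1 + (1 - pi) / pi * rho_R ** 2) ** 0.5
print(rho_R, rho_NR_direct, rho_NR_via_422)   # last two agree; rho_R is larger
```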

For comparative purposes, it is useful to convert (4.6) and (4.10) for the ODP Mack model into a form involving *β*’s as in (4.14).

Note that (4.10) may be expressed in the alternative form

\small{ \begin{aligned} B_{j+m, j+m+n \mid j} & =\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{1}}{\sum_{i=j}^{j+m-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{1}} \\ & =\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} \beta_{i+1}\left(f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\right)}{\sum_{i=j}^{j+m-1} \phi_{i+1} \beta_{i+1}\left(f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\right)} \end{aligned} \tag{4.23}}

by (4.17).

**Theorem 4.4.** Consider an ODPM-regular data array and a row *k* within it that is not identically zero. Then, for compatible ODP Mack and ODP cross-classified models,

(a) 1 \leq f_{j+m}^{2} \leq \pi_{j+m, j+m+n \mid j} \leq\left(f_{j+1} \ldots f_{j+m+n-1}\right)^{2}.

(b) Hence

\rho_{k, j+m, j+m+n \mid j}^R \geq \rho_{k, j+m, j+m+n \mid j}^{N R} .

(c) \pi_{j+m, j+m+n \mid j} \rightarrow 1 as j \rightarrow \infty. Hence \rho^{N R}_{k, j+m, j+m+n \mid j} \rightarrow \rho^{R}_{k, j+m, j+m+n \mid j} as j \rightarrow \infty.

**Proof.** (a) The largest multiplier of φ_{i+1}β_{i+1} in the numerator of (4.23) is f_{j+m+n-1}^{2} \ldots f_{j+m+1}^{2} (for *i* = *j* + *m*), while the smallest multiplier in the denominator is f_{j+m+n-1}^{2} \ldots f_{j+m}^{2} (for *i* = *j* + *m* − 1). By (4.16), this proves that

B_{j+m, j+m+n \mid j} / D_{j+m, j+m+n \mid j} \leq\left(f_{j+m}^{2}\right)^{-1}

and hence the left inequality of (a).

The right inequality is similarly proved by considering the case *i* = *j* + *m* + *n* − 1 in the numerator of (4.23) and *i* = *j* in the denominator.

(b) Since all *f* factors are not less than unity, it follows from (a) that

B_{j+m, j+m+n \mid j} \leq D_{j+m, j+m+n \mid j}

This, combined with (4.6) and (4.15), yields

\rho_{k, j+m, j+m+n \mid j}^{R} \geq \rho_{k, j+m, j+m+n \mid j}^{N R}

(c) As *j* → ∞, f_j → 1, as is required in order that X_{kj} should converge as *j* → ∞. It then follows from (a) that

\pi_{j+m, j+m+n \mid j} \rightarrow 1 \text { as } j \rightarrow \infty

This, combined with (4.6) and (4.15), yields the stated result.

## 5. Conclusion

The ODP Mack model is a special case of the Mack model and there is a simple translation between their correlation structures (Section 3.2.2).

The respective correlation structures associated with the recursive and non-recursive models considered here show a number of similarities but also distinct dissimilarities.

Theorems 4.2 and 4.3 show that, in both cases, correlation decreases with increasing time separation of future observations. The same theorems show that, in both cases, correlations generally increase as the dispersion coefficients of observations (σ_j^2 for the Mack model, and φ_j for the ODP Mack or ODP cross-classified model) up to the conditioning development period increase, and as the dispersion of observations beyond this point decreases.

However, the dependency of correlations on the mean development factors f_j differs as between the recursive and non-recursive models. For full details, see Theorems 4.2(d) and 4.3(d). In broad terms, increasing age-to-age factors cause correlations within the recursive models to increase and within the non-recursive models to decrease, though these results are subject to side-conditions that involve interaction between the age-to-age factors and dispersion coefficients.

If comparison is made between corresponding correlations in recursive and non-recursive models that are subject to consistent parameters, it is found that the recursive correlation is always the larger. However, as the development period on which the correlation between future observations is conditioned moves further into the development tail, the recursive and non-recursive correlations converge. Full details appear in Theorem 4.4.