## 1. Introduction

The actuarial literature identifies two families of chain-ladder models categorized by Verrall (2000) as **recursive** and **non-recursive** models, respectively. Although the model formulations are fundamentally different, both are found to yield the same maximum likelihood estimators of age-to-age factors and the same forecasts of loss reserve. The properties of these models are studied by Taylor (2011).

Despite the identical forecasts of the different models, their different formulations are liable to lead to different correlation structures. This means that the correlations can be regarded as providing one means of differentiating between recursive and non-recursive models. The purpose of the present paper is the investigation of these correlation structures.

There is independence between rows in all the models considered, so the correlations of greatest interest are those between future observations in the same accident period, conditional on information up to a defined point of time, specifically development period *j* in respect of accident period *k*.

## 2. Framework and notation

### 2.1. Claims data

Consider a rectangle of incremental claims observations *Y_{kj}* with:

- accident periods represented by rows and labeled *k* = 1, 2, . . . , *K*;
- development periods represented by columns and labeled *j* = 1, 2, . . . , *J* ≤ *K*.

Within the rectangle, identify a **development trapezoid** of past observations

\mathcal{D}_K = \{Y_{kj} : 1 \leq k \leq K \text{ and } 1 \leq j \leq \min(J, K-k+1)\}

The complement of this subset, representing **future** observations, is

\mathcal{D}_K^c = \{Y_{kj} : 1 \leq k \leq K \text{ and } \min(J, K-k+1) < j \leq J\} = \{Y_{kj} : K-J+1 < k \leq K \text{ and } K-k+1 < j \leq J\}.

Also let

\mathcal{D}_K^+ = \mathcal{D}_K \cup \mathcal{D}_K^c

In general, the problem is to predict *Ɗ^{c}_{K}* on the basis of observed *Ɗ_{K}*.

The usual case in the literature (though often not in practice) is that in which *J* = *K*, so that the trapezoid becomes a triangle. The more general trapezoid will be retained throughout the present paper.

Define the **cumulative row sums**

X_{kj} = \sum_{i=1}^{j} Y_{ki} \tag{2.1}

and the full **row and column sums** (or horizontal and vertical sums)

H_k = \sum_{j=1}^{\min(J, K-k+1)} Y_{kj}, \qquad V_j = \sum_{k=1}^{K-j+1} Y_{kj}. \tag{2.2}

Also define, for *k* = *K* − *J* + 2, . . . , *K*,

R_k = \sum_{j=K-k+2}^{J} Y_{kj} = X_{kJ} - X_{k, K-k+1} \tag{2.3}

R = \sum_{k=K-J+2}^{K} R_k \tag{2.4}

Note that *R* is the sum of the (future) observations in *Ɗ^{c}_{K}*. It will be referred to as the total amount of **outstanding losses**. Likewise, *R_{k}* denotes the amount of outstanding losses in respect of accident period *k*. The objective stated earlier is to forecast the *R_{k}* and *R*.

Let Σ^{R(k)} denote summation over the entire row *k* of *Ɗ_{K}*, i.e., for fixed *k*. Similarly, let Σ^{C(j)} denote summation over the entire column *j* of *Ɗ_{K}*, i.e., for fixed *j*. For example, (2.2) may be expressed as

V_j = \sum^{C(j)} Y_{kj}

Finally, let Σ^{T} denote summation over all (*k*, *j*) cells of *Ɗ_{K}*, i.e.,

\sum^{T} = \sum_{k=1}^{K} \sum_{j=1}^{\min(J, K-k+1)} = \sum_{k=1}^{K} \sum^{R(k)} = \sum_{j=1}^{J} \sum_{k=1}^{K-j+1} = \sum_{j=1}^{J} \sum^{C(j)}.
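These index conventions can be made concrete with a small example. The sketch below (toy numbers, not taken from the paper) builds a *K* = *J* = 3 triangle of incrementals and computes the cumulative row sums, the row sums *H_k* and the column sums *V_j*:

```python
# Toy incremental trapezoid (K = J = 3, so a triangle); None marks future
# cells in D_K^c.  Values are illustrative only.
Y = [
    [100, 50, 25],      # accident period k = 1
    [110, 55, None],    # k = 2
    [120, None, None],  # k = 3
]
K = J = 3

# Cumulative row sums X_kj = Y_k1 + ... + Y_kj over the observed cells.
X = []
for row in Y:
    obs = [v for v in row if v is not None]
    X.append([sum(obs[:j + 1]) for j in range(len(obs))])

# Row sums H_k and column sums V_j over the trapezoid D_K.
H = [sum(v for v in row if v is not None) for row in Y]
V = [sum(Y[k][j] for k in range(K - j)) for j in range(J)]

print(X)  # [[100, 150, 175], [110, 165], [120]]
print(H)  # [175, 165, 120]
print(V)  # [330, 105, 25]
```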

### 2.2. Families of distributions

#### 2.2.1. Exponential dispersion family

The **exponential dispersion family** (EDF) (Nelder and Wedderburn 1972) consists of those variables *Y* with log-likelihoods of the form

\ell(y; \theta, \phi) = [y\theta - b(\theta)]/a(\phi) + c(y, \phi) \tag{2.5}

for parameters θ (the **canonical parameter**) and φ (the **scale parameter**), and suitable functions *a*, *b* and *c*, with *b* continuous, differentiable and one-one, and such as to produce a total probability mass of unity.

For *Y* so distributed,

\mathrm{E}[Y] = b'(\theta) \tag{2.6}

\mathrm{Var}[Y] = a(\phi)\, b''(\theta) \tag{2.7}

If μ denotes E[*Y*], then (2.6) establishes a relation between μ and θ, and so (2.7) may be expressed in the form

\mathrm{Var}[Y] = a(\phi)\, V(\mu) \tag{2.8}

for some function *V*, referred to as the **variance function**.

The notation *Y* ∼ EDF(θ, φ; *a*, *b*, *c*) will be used to mean that a random variable *Y* is subject to the EDF likelihood (2.5).

#### 2.2.2. Tweedie family

The **Tweedie family** (Tweedie 1984) is the subfamily of the EDF for which

a(\phi) = \phi \tag{2.9}

V(\mu) = \mu^p, \quad p \leq 0 \text{ or } p \geq 1. \tag{2.10}

For this family,

b(\theta) = (2-p)^{-1}\left\{[1 + (1-p)\theta]^{(2-p)/(1-p)} - 1\right\} \tag{2.11}

\mu = [1 + (1-p)\theta]^{1/(1-p)} \tag{2.12}

\ell(y; \mu, \phi) = \left[y\,\frac{\mu^{1-p} - 1}{1-p} - \frac{\mu^{2-p} - 1}{2-p}\right]\Big/\phi + c(y, \phi) \tag{2.13}

\partial\ell/\partial\mu = (y\mu^{-p} - \mu^{1-p})/\phi. \tag{2.14}

The notation *Y* ∼ Tw(μ, φ, *p*) will be used to mean that a random variable *Y* is subject to the Tweedie likelihood (2.13) with parameters μ, φ and *p*. The abbreviated form *Y* ∼ Tw_{p}(μ, φ) will mean that *Y* is a member of the sub-family with specific parameter *p*.

#### 2.2.3. Over-dispersed Poisson family

The **over-dispersed Poisson** (ODP) family is the Tweedie sub-family with *p* = 1. The limit of (2.12) as *p* → 1 gives

\mathrm{E}[Y] = \mu = \exp\theta \tag{2.15}

By (2.8) and (2.10),

\mathrm{Var}[Y] = \phi\mu \tag{2.16}

By (2.14),

\partial\ell/\partial\mu = (y - \mu)/\phi\mu. \tag{2.17}
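A quick numerical sanity check of the ODP mean and variance relations E[*Y*] = μ and Var[*Y*] = φμ is possible via the standard scaled-Poisson representation of the ODP law (a construction assumed here, not stated in the text): *Y* = φ*Z* with *Z* ∼ Poisson(μ/φ).

```python
import math

def odp_moments(mu, phi, terms=200):
    """First two moments of the scaled-Poisson representation of ODP(mu, phi):
    Y = phi * Z with Z ~ Poisson(mu / phi), summed directly over the pmf."""
    lam = mu / phi
    p = math.exp(-lam)       # P(Z = 0)
    m1 = m2 = 0.0
    for k in range(terms):
        y = phi * k
        m1 += y * p
        m2 += y * y * p
        p *= lam / (k + 1)   # Poisson pmf recurrence avoids factorials
    return m1, m2 - m1 * m1

mean, variance = odp_moments(mu=5.0, phi=2.0)
print(mean, variance)   # ≈ mu = 5.0 and phi * mu = 10.0
```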

The notation *Y* ∼ ODP(μ, φ) means *Y* ∼ Tw_{1}(μ, φ).

## 3. Chain-ladder models

### 3.1. Heuristic chain ladder

The chain ladder was originally (pre-1975) devised as a heuristic algorithm for forecasting outstanding losses. It had no statistical foundation. The algorithm is as follows.

Define the following factors:

\hat{f}_j = \sum_{k=1}^{K-j} X_{k, j+1} \Big/ \sum_{k=1}^{K-j} X_{kj}, \quad j = 1, 2, \ldots, J-1 \tag{3.1}

Note that \hat{f}_j can be expressed in the form

\hat{f}_j = \sum_{k=1}^{K-j} w_{kj}\left(X_{k, j+1}/X_{kj}\right) \tag{3.2}

with

w_{kj} = X_{kj} \Big/ \sum_{k=1}^{K-j} X_{kj} \tag{3.3}

i.e., as a weighted average of the factors X_{k,j+1}/X_{kj} for fixed *j*.

Then define the following forecasts of the Y_{kj} ∈ *Ɗ^{c}_{K}*:

\hat{Y}_{kj} = X_{k, K-k+1}\, \hat{f}_{K-k+1} \hat{f}_{K-k+2} \ldots \hat{f}_{j-2} (\hat{f}_{j-1} - 1) \tag{3.4}

Call these **chain-ladder forecasts**. They yield the additional chain-ladder forecasts:

\hat{X}_{kj} = X_{k, K-k+1}\, \hat{f}_{K-k+1} \ldots \hat{f}_{j-1} \tag{3.5}

\hat{R}_k = \hat{X}_{kJ} - X_{k, K-k+1} \tag{3.6}

\hat{R} = \sum_{k=K-J+2}^{K} \hat{R}_k \tag{3.7}
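The algorithm (3.1)–(3.7) is compact enough to state directly as code. The following sketch (toy cumulative triangle, illustrative numbers only) computes the age-to-age factors and the forecasts of outstanding losses:

```python
# Toy cumulative triangle X_kj (K = J = 3); each row holds only its observed cells.
X = [
    [100.0, 150.0, 175.0],  # accident period k = 1
    [110.0, 165.0],         # k = 2
    [120.0],                # k = 3
]
K = J = 3

# Age-to-age factors (3.1): column-sum ratios over rows observed in both columns.
f_hat = [
    sum(X[k][j + 1] for k in range(K - j - 1)) / sum(X[k][j] for k in range(K - j - 1))
    for j in range(J - 1)
]

# Forecasts (3.5)-(3.7): project each row's leading diagonal through the factors.
R_hat = {}
for k in range(1, K):             # 0-based rows with future cells
    x = X[k][-1]                  # latest observed cumulative value X_{k,K-k+1}
    for j in range(len(X[k]), J):
        x *= f_hat[j - 1]         # apply the remaining development factors
    R_hat[k + 1] = x - X[k][-1]   # outstanding losses R_k (1-based accident period)

R_total = sum(R_hat.values())
print(f_hat)    # f_hat[0] == 1.5 for this toy triangle
print(R_total)  # total outstanding losses R
```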

### 3.2. Recursive models

A recursive model takes the general form

\mathrm{E}[X_{k, j+1} \mid X_{kj}] = \text{function of } \mathcal{D}_{k+j-1} \text{ and some parameters}

where *Ɗ_{k+j−1}* is the data sub-array of *Ɗ_{K}* obtained by deleting diagonals on the right side of *Ɗ_{K}* until X_{kj} is contained in its right-most diagonal.

#### 3.2.1. Mack model

The Mack model (Mack 1993) is defined by the following assumptions.

(M1) Accident periods are stochastically independent, i.e., Y_{k_1 j_1} and Y_{k_2 j_2} are stochastically independent if *k*_1 ≠ *k*_2.

(M2) For each *k* = 1, 2, . . . , *K*, the *X_{kj}* (*j* varying) form a Markov chain.

(M3) For each *k* = 1, 2, . . . , *K* and *j* = 1, 2, . . . , *J* − 1:

(a) E[X_{k,j+1} | X_{kj}] = f_j X_{kj} for some parameters f_j > 0; and

(b) Var[X_{k,j+1} | X_{kj}] = σ_j^2 X_{kj} for some parameters σ_j^2 > 0.

#### 3.2.2. ODP Mack model

Taylor (2011) defined the over-dispersed Poisson (ODP) Mack model as that satisfying assumptions (M1), (M2) and

(ODPM3) For each *k* = 1, 2, . . . , *K* and *j* = 1, 2, . . . , *J* − 1,

Y_{k, j+1} \mid X_{kj} \sim \mathrm{ODP}\left((f_j - 1) X_{kj},\, \phi_{k, j+1}\right)

where now the f_j are as in (M3a) and the φ_{k,j+1} are scale parameters.

Assumption (ODPM3) implies (M3a). Moreover, in the special case φ_{k,j+1} = φ_{j+1}, independent of *k*, (ODPM3) also implies (M3b) with σ_j^2 = φ_{j+1}(f_j − 1).

It is evident that, for this model to be valid, it is necessary that all f_j ≥ 1.

Note also that, under (ODPM3), X_{kj} = 0 implies that Y_{ki} = 0 for all *i* > *j*. This means that, for each *k*, either Y_{k1} > 0 or Y_{kj} = 0 for all *j*.

A summary of these requirements in terms of the data array *Ɗ_{K}* is as follows.

(R1) f_j ≥ 1 for all *j* = 1, 2, . . . , *J* − 1.

(R2) For each *k* = 1, 2, . . . , *K*, either

(a) Y_{k1} > 0; or

(b) Y_{kj} = 0 for all *j*.

A data array satisfying these requirements will be called **ODPM-regular**.

Assumption (ODPM3) may be expressed in the following form, suitable for GLM implementation of the ODP Mack model:

Y_{k, j+1} \mid X_{kj} \sim \mathrm{ODP}\left(\exp[\ln X_{kj} + \ln(f_j - 1)],\, \phi/w_{k, j+1}\right)

where

w_{k, j+1} = \phi/\phi_{k, j+1}.

In this form, the GLM of the Y_{k,j+1} has log link, offsets ln X_{kj}, parameters ln(f_j − 1), and weights w_{k,j+1}.

It is shown by Taylor (2011) that the chain-ladder estimates (3.1) of the age-to-age factors are maximum likelihood for this model.
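This maximum-likelihood property can be illustrated numerically for a single development period: under (ODPM3) with constant scale, the log-likelihood as a function of f_j is maximized at the chain-ladder factor (3.1). The sketch below (toy numbers, and a crude grid search rather than an actual GLM fit) shows the agreement.

```python
import math

# Toy data for a single development period j (illustrative values only):
# cumulative claims X_kj and X_{k,j+1} for the rows observed in both columns.
X_j  = [100.0, 110.0]
X_j1 = [150.0, 165.0]
Y_next = [b - a for a, b in zip(X_j, X_j1)]   # incrementals Y_{k,j+1}

def neg_loglik(f):
    """Negative log-likelihood kernel for Y_{k,j+1} ~ ODP((f - 1) X_kj, phi),
    phi constant; the c(y, phi) term does not involve f and is omitted."""
    total = 0.0
    for x, y in zip(X_j, Y_next):
        mu = (f - 1.0) * x
        total -= y * math.log(mu) - mu
    return total

# Crude grid search for the maximizing f (a GLM fit would do the same job).
grid = [1.0001 + i * 1e-4 for i in range(20000)]
f_mle = min(grid, key=neg_loglik)

f_cl = sum(X_j1) / sum(X_j)   # chain-ladder factor (3.1)
print(f_mle, f_cl)            # both close to 1.5 for this toy data
```

The agreement is exact in the limit of a fine grid, since setting the derivative of the log-likelihood to zero gives f − 1 = ΣY_{k,j+1}/ΣX_{kj}, which is precisely (3.1).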

### 3.3. Non-recursive models

Taylor (2011) also defined the **ODP cross-classified model** as that satisfying the following assumptions:

(ODPCC1) The random variables Y_{kj} ∈ *Ɗ^{+}_{K}* are stochastically independent.

(ODPCC2) For each *k* = 1, 2, . . . , *K* and *j* = 1, 2, . . . , *J*:

(a) E[Y_{kj}] = α_k β_j for some parameters α_k, β_j > 0;

(b) Y_{kj} ∼ ODP(α_k β_j, φ_{kj}) for some scale parameters φ_{kj};

(c) \sum_{j=1}^{J} \beta_j = 1.

Assumption (ODPCC2b) may be expressed in the following form, suitable for GLM implementation of the ODP cross-classified model:

Y_{kj} \sim \mathrm{ODP}\left(\exp(\ln\alpha_k + \ln\beta_j),\, \phi/w_{kj}\right)

In this form, the GLM of the Y_{kj} has log link, parameters ln α_k and ln β_j, and weights w_{kj} satisfying

w_{kj} = \phi/\phi_{kj}.

Assumption (ODPCC2c) removes one degree of redundancy from the parameter set that would otherwise be reflected by the aliasing of one parameter in the GLM.

It has long been known, for the case φ_{kj} = φ, that the maximum likelihood forecasts of the *Y_{kj}* ∈ *Ɗ^{c}_{K}* in this model are the same as the chain-ladder forecasts (3.5)–(3.7) (see, e.g., Hachemeister and Stanard 1975; Renshaw and Verrall 1998; Taylor 2000). It is shown by England and Verrall (2002) that this result continues to hold in the more general case of non-constant φ_{kj}.

Thus the ODP Mack and ODP cross-classified models produce the same maximum likelihood forecasts of loss reserves despite their fundamentally different formulations. This means that their respective correlation structures can be viewed as a means of differentiating between them.

## 4. Correlation between observations

### 4.1. Background common to recursive and non-recursive models

Consider the models defined in Sections 3.2 and 3.3, and specifically the conditional covariance Cov[X_{k_1, j_1+m}, X_{k_2, j_2+m+n} | X_{k_1 j_1}, X_{k_2 j_2}] with *m*, *n* > 0. The following lemma is immediate from assumption (M1) or (ODPCC1).

**Lemma 4.1.** The following is true for each of the Mack, ODP Mack and ODP cross-classified models:

\mathrm{Cov}[X_{k_1, j_1+m}, X_{k_2, j_2+m+n} \mid X_{k_1 j_1}, X_{k_2 j_2}] = 0 \quad \text{for } k_1 \neq k_2.

In view of this result, attention will be focused on **within-row covariances** Cov[X_{k,j+m}, X_{k,j+m+n} | X_{kj}]. This quantity will be denoted c_{k,j+m,j+m+n|j}. It is evaluated as follows:

\begin{aligned}
c_{k, j+m, j+m+n \mid j} &= \mathrm{E}\big[\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}\{X_{k, j+m+n} - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}]\} \mid X_{kj}\big] \\
&= \mathrm{E}\big[\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}\,\mathrm{E}\big[\{X_{k, j+m+n} - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}]\} \mid X_{k, j+m}\big] \mid X_{kj}\big] \\
&= \mathrm{E}\big[\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}\{\mathrm{E}[X_{k, j+m+n} \mid X_{k, j+m}] - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}]\} \mid X_{kj}\big].
\end{aligned} \tag{4.1}

### 4.2. Recursive models

#### 4.2.1. Mack model

By recursive application of (M3a),

\mathrm{E}[X_{k, j+m+n} \mid X_{k, j+m}] = f_{j+m+n-1} f_{j+m+n-2} \ldots f_{j+m} X_{k, j+m}

and so

\mathrm{E}[X_{k, j+m+n} \mid X_{k, j+m}] - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}] = f_{j+m+n-1} \ldots f_{j+m}\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}. \tag{4.2}

Substitution of (4.2) into (4.1) yields

c_{k, j+m, j+m+n \mid j} = f_{j+m+n-1} \ldots f_{j+m}\, \mathrm{Var}[X_{k, j+m} \mid X_{kj}] \tag{4.3}

The variance term here is evaluated by Mack (1993, 218) as

\mathrm{Var}[X_{k, j+m} \mid X_{kj}] = X_{kj} \sum_{i=j}^{j+m-1} f_{j+m-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j \tag{4.4}

Substitution of (4.4) into (4.3) yields

c_{k, j+m, j+m+n \mid j} = f_{j+m+n-1} \ldots f_{j+m}\, X_{kj} \sum_{i=j}^{j+m-1} f_{j+m-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j \tag{4.5}

It then follows that

\mathrm{Corr}[X_{k, j+m}, X_{k, j+m+n} \mid X_{kj}] = \frac{c_{k, j+m, j+m+n \mid j}}{\left[c_{k, j+m+n, j+m+n \mid j}\, c_{k, j+m, j+m \mid j}\right]^{\frac{1}{2}}} = \left[1 + B_{j+m, j+m+n \mid j}\right]^{-\frac{1}{2}} \tag{4.6}

where

B_{j+m, j+m+n \mid j} = \frac{\sum_{i=j+m}^{j+m+n-1} f_{j+m+n-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j}{\sum_{i=j}^{j+m-1} f_{j+m+n-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j}. \tag{4.7}

An equivalent form, obtained by dividing numerator and denominator of (4.7) by f_{j+m+n-1}^2 \ldots f_{j+m}^2\, f_{j+m-1} \ldots f_j, is

B_{j+m, j+m+n \mid j} = \frac{\sum_{i=j+m}^{j+m+n-1} \left(\sigma_i^2/f_i\right) f_i^{-1} f_{i-1}^{-1} \ldots f_{j+m}^{-1}}{\sum_{i=j}^{j+m-1} f_{j+m-1} \ldots f_{i+1}\left(\sigma_i^2/f_i\right)}. \tag{4.8}

**Theorem 4.2.** Consider an ODPM-regular data array subject to a Mack model, and consider a row *k* that is not identically zero. Let *m*, *n* be strictly positive integers and let ρ_{j+m,j+m+n|j} denote Corr[X_{k,j+m}, X_{k,j+m+n} | X_{kj}]. For a given schedule of values f_i, σ_i^2, each of the following propositions holds:

(a) 0 < ρ_{j+m,j+m+n|j} < 1.

(b) ρ_{j+m,j+m+n+1|j} < ρ_{j+m,j+m+n|j}.

(c) ρ_{j+m,j+m+n|j} increases as any σ_i^2, *i* = *j*, . . . , *j* + *m* − 1, increases, or any σ_i^2, *i* = *j* + *m*, . . . , *j* + *m* + *n* − 1, decreases.

(d) ρ_{j+m,j+m+n|j} increases as any f_i increases and σ_i^2 changes such that σ_i^2/f_i: increases if *i* ≤ *j* + *m* − 1, or decreases if *i* ≥ *j* + *m*.

**Proof.** (a) Follows from (4.6) and the fact that B_{j+m,j+m+n|j} > 0.

(b) By (4.7), write

\begin{aligned} B_{j+m, j+m+n+1 \mid j} &= \frac{\sigma_{j+m+n}^{2} f_{j+m+n-1} \ldots f_{j}}{\sum_{i=j}^{j+m-1} f_{j+m+n}^{2} \ldots f_{i+1}^{2}\, \sigma_{i}^{2}\, f_{i-1} \ldots f_{j}} + B_{j+m, j+m+n \mid j} \\ &> B_{j+m, j+m+n \mid j}. \end{aligned}

The result then follows from (4.6).

(c) Obvious from (4.8).

(d) Divide numerator and denominator of (4.7) by f_{j+m+n-1}^2 \ldots f_{j+m}^2\, f_{j+m-1} \ldots f_j to obtain

B_{j+m, j+m+n \mid j} = \frac{\sum_{i=j+m}^{j+m+n-1} \left(\sigma_i^2/f_i\right) f_i^{-1} f_{i-1}^{-1} \ldots f_{j+m}^{-1}}{\sum_{i=j}^{j+m-1} f_{j+m-1} \ldots f_{i+1}\left(\sigma_i^2/f_i\right)}

and the result then follows from (4.6).

#### 4.2.2. ODP Mack model

Expression (4.7) may be adapted to the case of the ODP Mack model with column-dependent scale parameters φ_{k,j+1} = φ_{j+1}. Section 3.2.2 notes that, in this case,

\sigma_{j}^{2}=\phi_{j+1}\left(f_{j}-1\right) \tag{4.9}

and substitution of this result in (4.7) yields

B_{j+m, j+m+n \mid j}=\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{j}}{\sum_{i=j}^{j+m-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{j}} \tag{4.10}

**Special case.** An interesting case arises when f_j = f and φ_j = φ for all *j*. Then (4.10) becomes

B_{j+m, j+m+n \mid j}=f^{-n}\left(f^{n}-1\right) /\left(f^{m}-1\right) . \tag{4.11}
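The reduction of (4.7) (via (4.10)) to (4.11) is easy to confirm numerically: evaluate B directly from (4.7) with σ_i^2 = φ(f − 1), the constant-parameter ODP Mack case, and compare with the closed form. The sketch below uses illustrative parameter values only.

```python
from math import prod

def B_mack(f, s2, j, m, n):
    """B_{j+m, j+m+n | j} per (4.7); f[i] stands for f_i and s2[i] for sigma_i^2."""
    def term(i):
        return (prod(f[r] ** 2 for r in range(i + 1, j + m + n))  # f^2 products
                * s2[i]
                * prod(f[r] for r in range(j, i)))                # f_{i-1}...f_j
    num = sum(term(i) for i in range(j + m, j + m + n))
    den = sum(term(i) for i in range(j, j + m))
    return num / den

f_const, phi, j, m, n = 1.5, 1.0, 1, 2, 3
f  = [f_const] * 10
s2 = [phi * (f_const - 1.0)] * 10   # sigma_i^2 = phi (f - 1), per (4.9)

b_direct = B_mack(f, s2, j, m, n)
b_closed = f_const ** (-n) * (f_const ** n - 1) / (f_const ** m - 1)   # (4.11)
print(b_direct, b_closed)   # the two values agree
```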

### 4.3. Non-recursive models

Once again consider c_{k,j+m,j+m+n|j}. Note that

X_{k, j+m+n}=X_{k, j+m}+\sum_{i=j+m+1}^{j+m+n} Y_{k i}

where all terms on the right side are mutually stochastically independent.

Therefore

\begin{aligned}
c_{k, j+m, j+m+n \mid j} & =\operatorname{Var}\left[X_{k, j+m} \mid X_{k j}\right] \\
& =\operatorname{Var}\left[X_{k j}+\sum_{i=j+1}^{j+m} Y_{k i} \mid X_{k j}\right] \\
\end{aligned}\tag{4.12}

\begin{aligned}=\sum_{i=j+1}^{j+m} \operatorname{Var}\left[Y_{k i}\right]
\end{aligned} \tag{4.13}

by (ODPCC1).

By (4.12),

\begin{aligned} \rho_{k, j+m, j+m+n \mid j}^{2} & =\operatorname{Var}\left[X_{k, j+m} \mid X_{k j}\right] / \operatorname{Var}\left[X_{k, j+m+n} \mid X_{k j}\right] \\ & =\sum_{i=j}^{j+m-1} \phi_{i+1} \beta_{i+1} \Big/ \sum_{i=j}^{j+m+n-1} \phi_{i+1} \beta_{i+1} \end{aligned} \tag{4.14}

by (4.13) and (ODPCC2a-b).

Thus

\rho_{j+m, j+m+n \mid j}=\left(1+D_{j+m, j+m+n \mid j}\right)^{-\frac{1}{2}} \tag{4.15}

with

D_{j+m, j+m+n \mid j}=\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} \beta_{i+1} / \sum_{i=j}^{j+m-1} \phi_{i+1} \beta_{i+1} . \tag{4.16}

Equation (11) in Verrall (1991) shows that the *f_j* and *β_j* are related as follows:

f_{j}=\sum_{i=1}^{j+1} \beta_{i} \Big/ \sum_{i=1}^{j} \beta_{i}

or, equivalently, when account is taken of (ODPCC2c),

\beta_{i+1}=\frac{f_{1} \ldots f_{i-1}\left(f_{i}-1\right)}{\sum_{r=1}^{J-1} f_{1} \ldots f_{r-1}\left(f_{r}-1\right)} \tag{4.17}

and this, combined with (4.16), gives

D_{j+m, j+m+n \mid j}=\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} f_{1} \ldots f_{i-1}\left(f_{i}-1\right)}{\sum_{i=j}^{j+m-1} \phi_{i+1} f_{1} \ldots f_{i-1}\left(f_{i}-1\right)} \tag{4.18}

=\frac{\sum_{i=j+m}^{j+m+n-1} f_{j+m} \ldots f_{i-1}\left(f_{i}-1\right) \phi_{i+1}}{\sum_{i=j}^{j+m-1}\left[\left(1-f_{i}^{-1}\right) \phi_{i+1}\right] f_{i+1}^{-1} \ldots f_{j+m-1}^{-1}} . \tag{4.19}

**Theorem 4.3.** Consider an ODPM-regular data array subject to an ODP cross-classified model, and consider a row *k* that is not identically zero. Let *m*, *n* be strictly positive integers and let ρ_{j+m,j+m+n|j} denote Corr[X_{k,j+m}, X_{k,j+m+n} | X_{kj}]. For a given schedule of values f_i, β_i, φ_i, each of the following propositions holds:

(a) 0 < ρ_{j+m,j+m+n|j} < 1.

(b) ρ_{j+m,j+m+n+1|j} < ρ_{j+m,j+m+n|j}.

(c) ρ_{j+m,j+m+n|j} increases as any φ_{i+1} or β_{i+1}, *i* = *j*, . . . , *j* + *m* − 1, increases, or any φ_{i+1} or β_{i+1}, *i* = *j* + *m*, . . . , *j* + *m* + *n* − 1, decreases.

(d) ρ_{j+m,j+m+n|j} increases as any f_i decreases and φ_{i+1} changes such that (1 − f_i^{-1})φ_{i+1}: increases if *i* ≤ *j* + *m* − 1, or decreases if *i* ≥ *j* + *m*.

**Proof.** (a) Follows directly from (4.14).

(b)-(c) Follow directly from (4.15) and (4.16).

(d) Follows directly from (4.15) and (4.19).

It is interesting to compare the results of Theorems 4.2(d) and 4.3(d). The former shows that, subject to the condition on the dispersion parameters, an increase in an *f_i* causes correlations to increase in the Mack model, whereas the latter yields the opposite result in the ODP cross-classified model.

**Special case.** An interesting special case arises when φ_{kj} = φ, independent of *k* and *j*. Then (4.14) reduces to

\rho_{k, j+m, j+m+n \mid j}^{2}=\sum_{i=j}^{j+m-1} \beta_{i+1} / \sum_{i=j}^{j+m+n-1} \beta_{i+1} . \tag{4.20}

**Special case.** As in Section 4.2.2, the case f_j = f and φ_j = φ for all *j* is interesting. Here, (4.18) yields

D_{j+m, j+m+n \mid j}=f^{m}\left(f^{n}-1\right) /\left(f^{m}-1\right) \tag{4.21}
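A parallel numerical check for the non-recursive model, under the same constant-parameter assumption (illustrative values only): evaluating D directly from (4.18) reproduces the closed form (4.21), and dividing by the recursive quantity (4.11) gives D/B = f^{m+n} ≥ 1, a concrete instance of the inequality B ≤ D established in Theorem 4.4.

```python
from math import prod

def D_cc(f, phi, j, m, n):
    """D_{j+m, j+m+n | j} per (4.18); f[i] stands for f_i and phi[i] for
    phi_{i+1} (index 0 unused)."""
    def term(i):
        return phi[i] * prod(f[r] for r in range(1, i)) * (f[i] - 1.0)
    num = sum(term(i) for i in range(j + m, j + m + n))
    den = sum(term(i) for i in range(j, j + m))
    return num / den

fc, j, m, n = 1.5, 1, 2, 3
f   = [fc] * 10    # constant age-to-age factors f_i = fc
phi = [1.0] * 10   # constant dispersion parameters

d_direct = D_cc(f, phi, j, m, n)
d_closed = fc ** m * (fc ** n - 1) / (fc ** m - 1)      # (4.21)
b_closed = fc ** (-n) * (fc ** n - 1) / (fc ** m - 1)   # (4.11), recursive model
print(d_direct, d_closed, d_direct / b_closed)          # ratio equals fc**(m + n)
```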

### 4.4. Comparison between recursive and non-recursive models

The present sub-section will compare the correlations associated with the ODP Mack and ODP crossclassified models with column dependent dispersion parameters **compatible**.

Let ρ^{R}_{k,j+m,j+m+n|j} denote ρ_{k,j+m,j+m+n|j} in the special case of the (recursive) ODP Mack model. Likewise, let ρ^{NR}_{k,j+m,j+m+n|j} apply to the (non-recursive) ODP cross-classified model.

Further, let π_{j+m,j+m+n|j} denote the ratio D_{j+m,j+m+n|j}/B_{j+m,j+m+n|j}.

With subscripts suppressed, ρ^{R} and ρ^{NR} are related through π as follows. By (4.6),

B=1 /\left(\rho^{R}\right)^{2}-1

Then, by (4.15),

\left(\rho^{N R}\right)^{2}=1 /\left\{1+\pi\left[1 /\left(\rho^{R}\right)^{2}-1\right]\right\}

and hence

\rho^{N R}=\pi^{-\frac{1}{2}} \rho^{R} /\left[1+\frac{1-\pi}{\pi}\left(\rho^{R}\right)^{2}\right]^{\frac{1}{2}} \tag{4.22}
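Relation (4.22) can be checked numerically in the constant-parameter special case, where B and D take the closed forms (4.11) and (4.21) (illustrative parameter values below):

```python
f, m, n = 1.5, 2, 3   # constant-parameter special case, toy values

B = f ** (-n) * (f ** n - 1) / (f ** m - 1)   # (4.11), ODP Mack
D = f ** m * (f ** n - 1) / (f ** m - 1)      # (4.21), ODP cross-classified
pi = D / B                                    # here equal to f ** (m + n)

rho_R = (1 + B) ** -0.5            # recursive correlation, by (4.6)
rho_NR_direct = (1 + D) ** -0.5    # non-recursive correlation, by (4.15)

# Recover the non-recursive correlation from the recursive one via (4.22).
rho_NR_via_422 = pi ** -0.5 * rho_R / (1 + (1 - pi) / pi * rho_R ** 2) ** 0.5
print(rho_R, rho_NR_direct, rho_NR_via_422)   # last two agree; rho_R is larger
```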

For comparative purposes, it is useful to convert (4.6) and (4.10) for the ODP Mack model into a form involving *β*’s as in (4.14).

Note that (4.10) may be expressed in the alternative form

\small{ \begin{aligned} B_{j+m, j+m+n \mid j} & =\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{1}}{\sum_{i=j}^{j+m-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{1}} \\ & =\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} \beta_{i+1}\left(f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\right)}{\sum_{i=j}^{j+m-1} \phi_{i+1} \beta_{i+1}\left(f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\right)} \end{aligned} \tag{4.23}}

by (4.17).

**Theorem 4.4.** Consider an ODPM-regular data array and a row *k* within it that is not identically zero. Then, for compatible ODP Mack and ODP cross-classified models,

(a) 1 \leq f_{j+m}^{2} \leq \pi_{j+m, j+m+n \mid j} \leq\left(f_{j+1} \ldots f_{j+m+n-1}\right)^{2}.

(b) Hence

\rho_{k, j+m, j+m+n \mid j}^R \geq \rho_{k, j+m, j+m+n \mid j}^{N R} .

(c) \pi_{j+m, j+m+n \mid j} \rightarrow 1 as j \rightarrow \infty. Hence \rho^{N R}_{k, j+m, j+m+n \mid j} \rightarrow \rho^{R}_{k, j+m, j+m+n \mid j} as j \rightarrow \infty.

**Proof.** (a) The largest multiplier of φ_{i+1}β_{i+1} in the numerator of (4.23) is f_{j+m+n-1}^{2} \ldots f_{j+m+1}^{2} (for *i* = *j* + *m*), while the smallest multiplier in the denominator is f_{j+m+n-1}^{2} \ldots f_{j+m}^{2} (for *i* = *j* + *m* − 1). By (4.16), this proves that

B_{j+m, j+m+n \mid j} / D_{j+m, j+m+n \mid j} \leq\left(f_{j+m}^{2}\right)^{-1}

and hence the left inequality of (a).

The right inequality is similarly proved by considering the case *i* = *j* + *m* + *n* − 1 in the numerator of (4.23) and *i* = *j* in the denominator.

(b) Since all *f* factors are not less than unity, it follows from (a) that

B_{j+m, j+m+n \mid j} \leq D_{j+m, j+m+n \mid j}

This, combined with (4.6) and (4.15), yields

\rho_{k, j+m, j+m+n \mid j}^{R} \geq \rho_{k, j+m, j+m+n \mid j}^{N R}

(c) As *j* → ∞, f_j → 1, as is required in order that X_{kj} should converge as *j* → ∞. It then follows from (a) that

\pi_{j+m, j+m+n \mid j} \rightarrow 1 \text { as } j \rightarrow \infty

This, combined with (4.6) and (4.15), yields the stated result.

## 5. Conclusion

The ODP Mack model is a special case of the Mack model and there is a simple translation between their correlation structures (Section 3.2.2).

The respective correlation structures associated with the recursive and non-recursive models considered here show a number of similarities but also distinct dissimilarities.

Theorems 4.2 and 4.3 show that, in both cases, correlation decreases with increasing time separation of future observations. The same theorems show that, in both cases, correlations generally increase as the dispersion coefficients of observations (σ_j^2 for the Mack model, and φ_j for the ODP Mack or ODP cross-classified model) up to the conditioning development period increase, and as the dispersion of observations beyond this point decreases.

However, the dependency of correlations on the mean development factors f_j differs as between the recursive and non-recursive models. For full details, see Theorems 4.2(d) and 4.3(d). In broad terms, increasing age-to-age factors cause correlations within the recursive models to increase and within the non-recursive models to decrease, though these results are subject to side-conditions that involve interaction between the age-to-age factors and dispersion coefficients.

If comparison is made between corresponding correlations in recursive and non-recursive models that are subject to consistent parameters, it is found that the recursive correlation is always the larger. However, as the development period on which the correlation between future observations is conditioned moves further into the development tail, the recursive and non-recursive correlations converge. Full details appear in Theorem 4.4.