## 1. Introduction

The actuarial literature identifies two families of chain-ladder models categorized by Verrall (2000) as **recursive** and **non-recursive** models, respectively. Although the model formulations are fundamentally different, both are found to yield the same maximum likelihood estimators of age-to-age factors and the same forecasts of loss reserve. The properties of these models are studied by Taylor (2011).

Despite the identical forecasts of the different models, their different formulations are liable to lead to different correlation structures. This means that the correlations can be regarded as providing one means of differentiating between recursive and non-recursive models. The purpose of the present paper is the investigation of these correlation structures.

There is independence between rows in all the models considered, so the correlations of greatest interest are those between future observations in the same accident period, conditional on information up to a defined point of time, specifically development period *j* in respect of accident period *k*.

## 2. Framework and notation

### 2.1. Claims data

Consider a rectangle of incremental claims observations *Y_{kj}* with:

- accident periods represented by rows and labeled *k* = 1, 2, . . . , *K*;
- development periods represented by columns and labeled *j* = 1, 2, . . . , *J* ≤ *K*.

Within the rectangle, identify a **development trapezoid** of past observations

\mathcal{D}_K = \{Y_{kj} : 1 \leq k \leq K \text{ and } 1 \leq j \leq \min(J, K-k+1)\}

The complement of this subset, representing **future** observations, is

\mathcal{D}_K^c = \{Y_{kj} : 1 \leq k \leq K \text{ and } \min(J, K-k+1) < j \leq J\} = \{Y_{kj} : K-J+1 < k \leq K \text{ and } K-k+1 < j \leq J\}.

Also let

\mathcal{D}_K^+ = \mathcal{D}_K \cup \mathcal{D}_K^c

In general, the problem is to predict *Ɗ^{c}_{K}* on the basis of observed *Ɗ_{K}*.

The usual case in the literature (though often not in practice) is that in which *J* = *K*, so that the trapezoid becomes a triangle. The more general trapezoid will be retained throughout the present paper.

Define the **cumulative row sums**

X_{kj} = \sum_{i=1}^{j} Y_{ki} \tag{2.1}

and the full **row and column sums** (or horizontal and vertical sums)

H_k = \sum_{j=1}^{\min(J, K-k+1)} Y_{kj}, \qquad V_j = \sum_{k=1}^{K-j+1} Y_{kj}. \tag{2.2}

Also define, for *k* = *K* − *J* + 2, . . . , *K*,

R_k = \sum_{j=K-k+2}^{J} Y_{kj} = X_{kJ} - X_{k, K-k+1} \tag{2.3}

R = \sum_{k=K-J+2}^{K} R_k \tag{2.4}

Note that *R* is the sum of the (future) observations in *Ɗ^{c}_{K}*. It will be referred to as the total amount of **outstanding losses**. Likewise, *R_{k}* denotes the amount of outstanding losses in respect of accident period *k*. The objective stated earlier is to forecast the *R_{k}* and *R*.

Let Σ^{R(k)} denote summation over the entire row *k* of *Ɗ_{K}*, i.e., for fixed *k*. Similarly, let Σ^{C(j)} denote summation over the entire column *j* of *Ɗ_{K}*, i.e., for fixed *j*. For example, (2.2) may be expressed as

V_j = \sum^{C(j)} Y_{kj}

Finally, let Σ^{T} denote summation over all (*k*, *j*) cells of *Ɗ_{K}*, i.e.,

\sum^{T} = \sum_{k=1}^{K} \sum_{j=1}^{\min(J, K-k+1)} = \sum_{k=1}^{K} \sum^{R(k)} = \sum_{j=1}^{J} \sum_{k=1}^{K-j+1} = \sum_{j=1}^{J} \sum^{C(j)}.
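These index conventions can be made concrete with a small example. The sketch below (toy numbers, not taken from the paper) builds a *K* = *J* = 3 triangle of incrementals and computes the cumulative row sums, the row sums *H_k* and the column sums *V_j*:

```python
# Toy incremental trapezoid (K = J = 3, so a triangle); None marks future
# cells in D_K^c.  Values are illustrative only.
Y = [
    [100, 50, 25],      # accident period k = 1
    [110, 55, None],    # k = 2
    [120, None, None],  # k = 3
]
K = J = 3

# Cumulative row sums X_kj = Y_k1 + ... + Y_kj over the observed cells.
X = []
for row in Y:
    obs = [v for v in row if v is not None]
    X.append([sum(obs[:j + 1]) for j in range(len(obs))])

# Row sums H_k and column sums V_j over the trapezoid D_K.
H = [sum(v for v in row if v is not None) for row in Y]
V = [sum(Y[k][j] for k in range(K - j)) for j in range(J)]

print(X)  # [[100, 150, 175], [110, 165], [120]]
print(H)  # [175, 165, 120]
print(V)  # [330, 105, 25]
```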

### 2.2. Families of distributions

#### 2.2.1. Exponential dispersion family

The **exponential dispersion family** (EDF) (Nelder and Wedderburn 1972) consists of those variables *Y* with log-likelihoods of the form

\ell(y; \theta, \phi) = [y\theta - b(\theta)]/a(\phi) + c(y, \phi) \tag{2.5}

for parameters θ (the **canonical parameter**) and φ (the **scale parameter**), and suitable functions *a*, *b* and *c*, with *b* continuous, differentiable and one-one, and such as to produce a total probability mass of unity.

For *Y* so distributed,

\mathrm{E}[Y] = b'(\theta) \tag{2.6}

\mathrm{Var}[Y] = a(\phi)\, b''(\theta) \tag{2.7}

If μ denotes E[*Y*], then (2.6) establishes a relation between μ and θ, and so (2.7) may be expressed in the form

\mathrm{Var}[Y] = a(\phi)\, V(\mu) \tag{2.8}

for some function *V*, referred to as the **variance function**.

The notation *Y* ∼ EDF(θ, φ; *a*, *b*, *c*) will be used to mean that a random variable *Y* is subject to the EDF likelihood (2.5).

#### 2.2.2. Tweedie family

The **Tweedie family** (Tweedie 1984) is the subfamily of the EDF for which

a(\phi) = \phi \tag{2.9}

V(\mu) = \mu^p, \quad p \leq 0 \text{ or } p \geq 1. \tag{2.10}

For this family,

b(\theta) = (2-p)^{-1}\left\{[1 + (1-p)\theta]^{(2-p)/(1-p)} - 1\right\} \tag{2.11}

\mu = [1 + (1-p)\theta]^{1/(1-p)} \tag{2.12}

\ell(y; \mu, \phi) = \left[y\,\frac{\mu^{1-p} - 1}{1-p} - \frac{\mu^{2-p} - 1}{2-p}\right]\Big/\phi + c(y, \phi) \tag{2.13}

\partial\ell/\partial\mu = (y\mu^{-p} - \mu^{1-p})/\phi. \tag{2.14}

The notation *Y* ∼ Tw(μ, φ, *p*) will be used to mean that a random variable *Y* is subject to the Tweedie likelihood (2.13) with parameters μ, φ and *p*. The abbreviated form *Y* ∼ Tw_{p}(μ, φ) will mean that *Y* is a member of the sub-family with specific parameter *p*.

#### 2.2.3. Over-dispersed Poisson family

The **over-dispersed Poisson** (ODP) family is the Tweedie sub-family with *p* = 1. The limit of (2.12) as *p* → 1 gives

\mathrm{E}[Y] = \mu = \exp\theta \tag{2.15}

By (2.8) and (2.10),

\mathrm{Var}[Y] = \phi\mu \tag{2.16}

By (2.14),

\partial\ell/\partial\mu = (y - \mu)/\phi\mu. \tag{2.17}
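A quick numerical sanity check of the ODP mean and variance relations E[*Y*] = μ and Var[*Y*] = φμ is possible via the standard scaled-Poisson representation of the ODP law (a construction assumed here, not stated in the text): *Y* = φ*Z* with *Z* ∼ Poisson(μ/φ).

```python
import math

def odp_moments(mu, phi, terms=200):
    """First two moments of the scaled-Poisson representation of ODP(mu, phi):
    Y = phi * Z with Z ~ Poisson(mu / phi), summed directly over the pmf."""
    lam = mu / phi
    p = math.exp(-lam)       # P(Z = 0)
    m1 = m2 = 0.0
    for k in range(terms):
        y = phi * k
        m1 += y * p
        m2 += y * y * p
        p *= lam / (k + 1)   # Poisson pmf recurrence avoids factorials
    return m1, m2 - m1 * m1

mean, variance = odp_moments(mu=5.0, phi=2.0)
print(mean, variance)   # ≈ mu = 5.0 and phi * mu = 10.0
```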

The notation *Y* ∼ ODP(μ, φ) means *Y* ∼ Tw_{1}(μ, φ).

## 3. Chain-ladder models

### 3.1. Heuristic chain ladder

The chain ladder was originally (pre-1975) devised as a heuristic algorithm for forecasting outstanding losses. It had no statistical foundation. The algorithm is as follows.

Define the following factors:

\hat{f}_j = \sum_{k=1}^{K-j} X_{k, j+1} \Big/ \sum_{k=1}^{K-j} X_{kj}, \quad j = 1, 2, \ldots, J-1 \tag{3.1}

Note that \hat{f}_j can be expressed in the form

\hat{f}_j = \sum_{k=1}^{K-j} w_{kj}\left(X_{k, j+1}/X_{kj}\right) \tag{3.2}

with

w_{kj} = X_{kj} \Big/ \sum_{k=1}^{K-j} X_{kj} \tag{3.3}

i.e., as a weighted average of the factors X_{k,j+1}/X_{kj} for fixed *j*.

Then define the following forecasts of the Y_{kj} ∈ *Ɗ^{c}_{K}*:

\hat{Y}_{kj} = X_{k, K-k+1}\, \hat{f}_{K-k+1} \hat{f}_{K-k+2} \ldots \hat{f}_{j-2} (\hat{f}_{j-1} - 1) \tag{3.4}

Call these **chain-ladder forecasts**. They yield the additional chain-ladder forecasts:

\hat{X}_{kj} = X_{k, K-k+1}\, \hat{f}_{K-k+1} \ldots \hat{f}_{j-1} \tag{3.5}

\hat{R}_k = \hat{X}_{kJ} - X_{k, K-k+1} \tag{3.6}

\hat{R} = \sum_{k=K-J+2}^{K} \hat{R}_k \tag{3.7}
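The algorithm (3.1)–(3.7) is compact enough to state directly as code. The following sketch (toy cumulative triangle, illustrative numbers only) computes the age-to-age factors and the forecasts of outstanding losses:

```python
# Toy cumulative triangle X_kj (K = J = 3); each row holds only its observed cells.
X = [
    [100.0, 150.0, 175.0],  # accident period k = 1
    [110.0, 165.0],         # k = 2
    [120.0],                # k = 3
]
K = J = 3

# Age-to-age factors (3.1): column-sum ratios over rows observed in both columns.
f_hat = [
    sum(X[k][j + 1] for k in range(K - j - 1)) / sum(X[k][j] for k in range(K - j - 1))
    for j in range(J - 1)
]

# Forecasts (3.5)-(3.7): project each row's leading diagonal through the factors.
R_hat = {}
for k in range(1, K):             # 0-based rows with future cells
    x = X[k][-1]                  # latest observed cumulative value X_{k,K-k+1}
    for j in range(len(X[k]), J):
        x *= f_hat[j - 1]         # apply the remaining development factors
    R_hat[k + 1] = x - X[k][-1]   # outstanding losses R_k (1-based accident period)

R_total = sum(R_hat.values())
print(f_hat)    # f_hat[0] == 1.5 for this toy triangle
print(R_total)  # total outstanding losses R
```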

### 3.2. Recursive models

A recursive model takes the general form

\mathrm{E}[X_{k, j+1} \mid X_{kj}] = \text{function of } \mathcal{D}_{k+j-1} \text{ and some parameters}

where *Ɗ_{k+j−1}* is the data sub-array of *Ɗ_{K}* obtained by deleting diagonals on the right side of *Ɗ_{K}* until X_{kj} is contained in its right-most diagonal.

#### 3.2.1. Mack model

The Mack model (Mack 1993) is defined by the following assumptions.

(M1) Accident periods are stochastically independent, i.e., Y_{k_1 j_1} and Y_{k_2 j_2} are stochastically independent if *k*_1 ≠ *k*_2.

(M2) For each *k* = 1, 2, . . . , *K*, the *X_{kj}* (*j* varying) form a Markov chain.

(M3) For each *k* = 1, 2, . . . , *K* and *j* = 1, 2, . . . , *J* − 1:

(a) E[X_{k,j+1} | X_{kj}] = f_j X_{kj} for some parameters f_j > 0; and

(b) Var[X_{k,j+1} | X_{kj}] = σ_j^2 X_{kj} for some parameters σ_j^2 > 0.

#### 3.2.2. ODP Mack model

Taylor (2011) defined the over-dispersed Poisson (ODP) Mack model as that satisfying assumptions (M1), (M2) and

(ODPM3) For each *k* = 1, 2, . . . , *K* and *j* = 1, 2, . . . , *J* − 1,

Y_{k, j+1} \mid X_{kj} \sim \mathrm{ODP}\left((f_j - 1) X_{kj},\, \phi_{k, j+1}\right)

where now the f_j are as in (M3a) and the φ_{k,j+1} are scale parameters.

Assumption (ODPM3) implies (M3a). Moreover, in the special case φ_{k,j+1} = φ_{j+1}, independent of *k*, (ODPM3) also implies (M3b) with σ_j^2 = φ_{j+1}(f_j − 1).

It is evident that, for this model to be valid, it is necessary that all f_j ≥ 1.

Note also that, under (ODPM3), X_{kj} = 0 implies that Y_{ki} = 0 for all *i* > *j*. This means that, for each *k*, either Y_{k1} > 0 or Y_{kj} = 0 for all *j*.

A summary of these requirements in terms of the data array *Ɗ_{K}* is as follows.

(R1) f_j ≥ 1 for all *j* = 1, 2, . . . , *J* − 1.

(R2) For each *k* = 1, 2, . . . , *K*, either

(a) Y_{k1} > 0; or

(b) Y_{kj} = 0 for all *j*.

A data array satisfying these requirements will be called **ODPM-regular**.

Assumption (ODPM3) may be expressed in the following form, suitable for GLM implementation of the ODP Mack model:

Y_{k, j+1} \mid X_{kj} \sim \mathrm{ODP}\left(\exp[\ln X_{kj} + \ln(f_j - 1)],\, \phi/w_{k, j+1}\right)

where

w_{k, j+1} = \phi/\phi_{k, j+1}.

In this form, the GLM of the Y_{k,j+1} has log link, offsets ln X_{kj}, parameters ln(f_j − 1), and weights w_{k,j+1}.

It is shown by Taylor (2011) that the chain-ladder estimates (3.1) of the age-to-age factors are maximum likelihood for this model.
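This maximum-likelihood property can be illustrated numerically for a single development period: under (ODPM3) with constant scale, the log-likelihood as a function of f_j is maximized at the chain-ladder factor (3.1). The sketch below (toy numbers, and a crude grid search rather than an actual GLM fit) shows the agreement.

```python
import math

# Toy data for a single development period j (illustrative values only):
# cumulative claims X_kj and X_{k,j+1} for the rows observed in both columns.
X_j  = [100.0, 110.0]
X_j1 = [150.0, 165.0]
Y_next = [b - a for a, b in zip(X_j, X_j1)]   # incrementals Y_{k,j+1}

def neg_loglik(f):
    """Negative log-likelihood kernel for Y_{k,j+1} ~ ODP((f - 1) X_kj, phi),
    phi constant; the c(y, phi) term does not involve f and is omitted."""
    total = 0.0
    for x, y in zip(X_j, Y_next):
        mu = (f - 1.0) * x
        total -= y * math.log(mu) - mu
    return total

# Crude grid search for the maximizing f (a GLM fit would do the same job).
grid = [1.0001 + i * 1e-4 for i in range(20000)]
f_mle = min(grid, key=neg_loglik)

f_cl = sum(X_j1) / sum(X_j)   # chain-ladder factor (3.1)
print(f_mle, f_cl)            # both close to 1.5 for this toy data
```

The agreement is exact in the limit of a fine grid, since setting the derivative of the log-likelihood to zero gives f − 1 = ΣY_{k,j+1}/ΣX_{kj}, which is precisely (3.1).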

### 3.3. Non-recursive models

Taylor (2011) also defined the **ODP cross-classified model** as that satisfying the following assumptions:

(ODPCC1) The random variables Y_{kj} ∈ *Ɗ^{+}_{K}* are stochastically independent.

(ODPCC2) For each *k* = 1, 2, . . . , *K* and *j* = 1, 2, . . . , *J*:

(a) E[Y_{kj}] = α_k β_j for some parameters α_k, β_j > 0;

(b) Y_{kj} ∼ ODP(α_k β_j, φ_{kj}) for some scale parameters φ_{kj};

(c) \sum_{j=1}^{J} \beta_j = 1.

Assumption (ODPCC2b) may be expressed in the following form, suitable for GLM implementation of the ODP cross-classified model:

Y_{kj} \sim \mathrm{ODP}\left(\exp(\ln\alpha_k + \ln\beta_j),\, \phi/w_{kj}\right)

In this form, the GLM of the Y_{kj} has log link, parameters ln α_k and ln β_j, and weights w_{kj} satisfying

w_{kj} = \phi/\phi_{kj}.

Assumption (ODPCC2c) removes one degree of redundancy from the parameter set that would otherwise be reflected by the aliasing of one parameter in the GLM.

It has long been known, for the case φ_{kj} = φ, that the maximum likelihood forecasts of the *Y_{kj}* ∈ *Ɗ^{c}_{K}* in this model are the same as the chain-ladder forecasts (3.5)–(3.7) (see, e.g., Hachemeister and Stanard 1975; Renshaw and Verrall 1998; Taylor 2000). It is shown by England and Verrall (2002) that this result continues to hold in the more general case of non-constant φ_{kj}.

Thus the ODP Mack and ODP cross-classified models produce the same maximum likelihood forecasts of loss reserves despite their fundamentally different formulations. This means that their respective correlation structures can be viewed as a means of differentiating between them.

## 4. Correlation between observations

### 4.1. Background common to recursive and non-recursive models

Consider the models defined in Sections 3.2 and 3.3, and specifically the conditional covariance Cov[X_{k_1, j_1+m}, X_{k_2, j_2+m+n} | X_{k_1 j_1}, X_{k_2 j_2}] with *m*, *n* > 0. The following lemma is immediate from assumption (M1) or (ODPCC1).

**Lemma 4.1.** The following is true for each of the Mack, ODP Mack and ODP cross-classified models:

\mathrm{Cov}[X_{k_1, j_1+m}, X_{k_2, j_2+m+n} \mid X_{k_1 j_1}, X_{k_2 j_2}] = 0 \quad \text{for } k_1 \neq k_2.

In view of this result, attention will be focused on **within-row covariances** Cov[X_{k,j+m}, X_{k,j+m+n} | X_{kj}]. This quantity will be denoted c_{k,j+m,j+m+n|j}. It is evaluated as follows:

\begin{aligned}
c_{k, j+m, j+m+n \mid j} &= \mathrm{E}\big[\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}\{X_{k, j+m+n} - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}]\} \mid X_{kj}\big] \\
&= \mathrm{E}\big[\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}\,\mathrm{E}\big[\{X_{k, j+m+n} - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}]\} \mid X_{k, j+m}\big] \mid X_{kj}\big] \\
&= \mathrm{E}\big[\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}\{\mathrm{E}[X_{k, j+m+n} \mid X_{k, j+m}] - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}]\} \mid X_{kj}\big].
\end{aligned} \tag{4.1}

### 4.2. Recursive models

#### 4.2.1. Mack model

By recursive application of (M3a),

\mathrm{E}[X_{k, j+m+n} \mid X_{k, j+m}] = f_{j+m+n-1} f_{j+m+n-2} \ldots f_{j+m} X_{k, j+m}

and so

\mathrm{E}[X_{k, j+m+n} \mid X_{k, j+m}] - \mathrm{E}[X_{k, j+m+n} \mid X_{kj}] = f_{j+m+n-1} \ldots f_{j+m}\{X_{k, j+m} - \mathrm{E}[X_{k, j+m} \mid X_{kj}]\}. \tag{4.2}

Substitution of (4.2) into (4.1) yields

c_{k, j+m, j+m+n \mid j} = f_{j+m+n-1} \ldots f_{j+m}\, \mathrm{Var}[X_{k, j+m} \mid X_{kj}] \tag{4.3}

The variance term here is evaluated by Mack (1993, 218) as

\mathrm{Var}[X_{k, j+m} \mid X_{kj}] = X_{kj} \sum_{i=j}^{j+m-1} f_{j+m-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j \tag{4.4}

Substitution of (4.4) into (4.3) yields

c_{k, j+m, j+m+n \mid j} = f_{j+m+n-1} \ldots f_{j+m}\, X_{kj} \sum_{i=j}^{j+m-1} f_{j+m-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j \tag{4.5}

It then follows that

\mathrm{Corr}[X_{k, j+m}, X_{k, j+m+n} \mid X_{kj}] = \frac{c_{k, j+m, j+m+n \mid j}}{\left[c_{k, j+m+n, j+m+n \mid j}\, c_{k, j+m, j+m \mid j}\right]^{\frac{1}{2}}} = \left[1 + B_{j+m, j+m+n \mid j}\right]^{-\frac{1}{2}} \tag{4.6}

where

B_{j+m, j+m+n \mid j} = \frac{\sum_{i=j+m}^{j+m+n-1} f_{j+m+n-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j}{\sum_{i=j}^{j+m-1} f_{j+m+n-1}^2 \ldots f_{i+1}^2\, \sigma_i^2\, f_{i-1} \ldots f_j}. \tag{4.7}

An equivalent form, obtained by dividing numerator and denominator of (4.7) by f_{j+m+n-1}^2 \ldots f_{j+m}^2\, f_{j+m-1} \ldots f_j, is

B_{j+m, j+m+n \mid j} = \frac{\sum_{i=j+m}^{j+m+n-1} \left(\sigma_i^2/f_i\right) f_i^{-1} f_{i-1}^{-1} \ldots f_{j+m}^{-1}}{\sum_{i=j}^{j+m-1} f_{j+m-1} \ldots f_{i+1}\left(\sigma_i^2/f_i\right)}. \tag{4.8}

**Theorem 4.2.** Consider an ODPM-regular data array subject to a Mack model, and consider a row *k* that is not identically zero. Let *m*, *n* be strictly positive integers and let ρ_{j+m,j+m+n|j} denote Corr[X_{k,j+m}, X_{k,j+m+n} | X_{kj}]. For a given schedule of values f_i, σ_i^2, each of the following propositions holds:

(a) 0 < ρ_{j+m,j+m+n|j} < 1.

(b) ρ_{j+m,j+m+n+1|j} < ρ_{j+m,j+m+n|j}.

(c) ρ_{j+m,j+m+n|j} increases as any σ_i^2, *i* = *j*, . . . , *j* + *m* − 1, increases, or any σ_i^2, *i* = *j* + *m*, . . . , *j* + *m* + *n* − 1, decreases.

(d) ρ_{j+m,j+m+n|j} increases as any f_i increases and σ_i^2 changes such that σ_i^2/f_i: increases if *i* ≤ *j* + *m* − 1, or decreases if *i* ≥ *j* + *m*.

**Proof.** (a) Follows from (4.6) and the fact that B_{j+m,j+m+n|j} > 0.

(b) By (4.7), write

\begin{aligned} B_{j+m, j+m+n+1 \mid j} &= \frac{\sigma_{j+m+n}^{2} f_{j+m+n-1} \ldots f_{j}}{\sum_{i=j}^{j+m-1} f_{j+m+n}^{2} \ldots f_{i+1}^{2}\, \sigma_{i}^{2}\, f_{i-1} \ldots f_{j}} + B_{j+m, j+m+n \mid j} \\ &> B_{j+m, j+m+n \mid j}. \end{aligned}

The result then follows from (4.6).

(c) Obvious from (4.8).

(d) Divide numerator and denominator of (4.7) by f_{j+m+n-1}^2 \ldots f_{j+m}^2\, f_{j+m-1} \ldots f_j to obtain

B_{j+m, j+m+n \mid j} = \frac{\sum_{i=j+m}^{j+m+n-1} \left(\sigma_i^2/f_i\right) f_i^{-1} f_{i-1}^{-1} \ldots f_{j+m}^{-1}}{\sum_{i=j}^{j+m-1} f_{j+m-1} \ldots f_{i+1}\left(\sigma_i^2/f_i\right)}

and the result then follows from (4.6).

#### 4.2.2. ODP Mack model

Expression (4.7) may be adapted to the case of the ODP Mack model with column-dependent scale parameters φ_{k,j+1} = φ_{j+1}. Section 3.2.2 notes that, in this case,

\sigma_{j}^{2}=\phi_{j+1}\left(f_{j}-1\right) \tag{4.9}

and substitution of this result in (4.7) yields

B_{j+m, j+m+n \mid j}=\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{j}}{\sum_{i=j}^{j+m-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{j}} \tag{4.10}

**Special case.** An interesting case arises when f_j = f and φ_j = φ for all *j*. Then (4.10) becomes

B_{j+m, j+m+n \mid j}=f^{-n}\left(f^{n}-1\right) /\left(f^{m}-1\right) . \tag{4.11}
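The reduction of (4.7) (via (4.10)) to (4.11) is easy to confirm numerically: evaluate B directly from (4.7) with σ_i^2 = φ(f − 1), the constant-parameter ODP Mack case, and compare with the closed form. The sketch below uses illustrative parameter values only.

```python
from math import prod

def B_mack(f, s2, j, m, n):
    """B_{j+m, j+m+n | j} per (4.7); f[i] stands for f_i and s2[i] for sigma_i^2."""
    def term(i):
        return (prod(f[r] ** 2 for r in range(i + 1, j + m + n))  # f^2 products
                * s2[i]
                * prod(f[r] for r in range(j, i)))                # f_{i-1}...f_j
    num = sum(term(i) for i in range(j + m, j + m + n))
    den = sum(term(i) for i in range(j, j + m))
    return num / den

f_const, phi, j, m, n = 1.5, 1.0, 1, 2, 3
f  = [f_const] * 10
s2 = [phi * (f_const - 1.0)] * 10   # sigma_i^2 = phi (f - 1), per (4.9)

b_direct = B_mack(f, s2, j, m, n)
b_closed = f_const ** (-n) * (f_const ** n - 1) / (f_const ** m - 1)   # (4.11)
print(b_direct, b_closed)   # the two values agree
```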

### 4.3. Non-recursive models

Once again consider c_{k,j+m,j+m+n|j}. Note that

X_{k, j+m+n}=X_{k, j+m}+\sum_{i=j+m+1}^{j+m+n} Y_{k i}

where all terms on the right side are mutually stochastically independent.

Therefore

\begin{aligned}
c_{k, j+m, j+m+n \mid j} & =\operatorname{Var}\left[X_{k, j+m} \mid X_{k j}\right] \\
& =\operatorname{Var}\left[X_{k j}+\sum_{i=j+1}^{j+m} Y_{k i} \mid X_{k j}\right] \\
\end{aligned}\tag{4.12}

\begin{aligned}=\sum_{i=j+1}^{j+m} \operatorname{Var}\left[Y_{k i}\right]
\end{aligned} \tag{4.13}

by (ODPCC1).

By (4.12),

\begin{aligned} \rho_{k, j+m, j+m+n \mid j}^{2} & =\operatorname{Var}\left[X_{k, j+m} \mid X_{k j}\right] / \operatorname{Var}\left[X_{k, j+m+n} \mid X_{k j}\right] \\ & =\sum_{i=j}^{j+m-1} \phi_{i+1} \beta_{i+1} \Big/ \sum_{i=j}^{j+m+n-1} \phi_{i+1} \beta_{i+1} \end{aligned} \tag{4.14}

by (4.13) and (ODPCC2a-b).

Thus

\rho_{j+m, j+m+n \mid j}=\left(1+D_{j+m, j+m+n \mid j}\right)^{-\frac{1}{2}} \tag{4.15}

with

D_{j+m, j+m+n \mid j}=\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} \beta_{i+1} / \sum_{i=j}^{j+m-1} \phi_{i+1} \beta_{i+1} . \tag{4.16}

Equation (11) in Verrall (1991) shows that the *f_j* and *β_j* are related as follows:

f_{j}=\sum_{i=1}^{j+1} \beta_{i} \Big/ \sum_{i=1}^{j} \beta_{i}

or, equivalently, when account is taken of (ODPCC2c),

\beta_{i+1}=\frac{f_{1} \ldots f_{i-1}\left(f_{i}-1\right)}{\sum_{r=1}^{J-1} f_{1} \ldots f_{r-1}\left(f_{r}-1\right)} \tag{4.17}

and this, combined with (4.16), gives

D_{j+m, j+m+n \mid j}=\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} f_{1} \ldots f_{i-1}\left(f_{i}-1\right)}{\sum_{i=j}^{j+m-1} \phi_{i+1} f_{1} \ldots f_{i-1}\left(f_{i}-1\right)} \tag{4.18}

=\frac{\sum_{i=j+m}^{j+m+n-1} f_{j+m} \ldots f_{i-1}\left(f_{i}-1\right) \phi_{i+1}}{\sum_{i=j}^{j+m-1}\left[\left(1-f_{i}^{-1}\right) \phi_{i+1}\right] f_{i+1}^{-1} \ldots f_{j+m-1}^{-1}} . \tag{4.19}

**Theorem 4.3.** Consider an ODPM-regular data array subject to an ODP cross-classified model, and consider a row *k* that is not identically zero. Let *m*, *n* be strictly positive integers and let ρ_{j+m,j+m+n|j} denote Corr[X_{k,j+m}, X_{k,j+m+n} | X_{kj}]. For a given schedule of values f_i, β_i, φ_i, each of the following propositions holds:

(a) 0 < ρ_{j+m,j+m+n|j} < 1.

(b) ρ_{j+m,j+m+n+1|j} < ρ_{j+m,j+m+n|j}.

(c) ρ_{j+m,j+m+n|j} increases as any φ_{i+1} or β_{i+1}, *i* = *j*, . . . , *j* + *m* − 1, increases, or any φ_{i+1} or β_{i+1}, *i* = *j* + *m*, . . . , *j* + *m* + *n* − 1, decreases.

(d) ρ_{j+m,j+m+n|j} increases as any f_i decreases and φ_{i+1} changes such that (1 − f_i^{-1})φ_{i+1}: increases if *i* ≤ *j* + *m* − 1, or decreases if *i* ≥ *j* + *m*.

**Proof.** (a) Follows directly from (4.14).

(b)-(c) Follow directly from (4.15) and (4.16).

(d) Follows directly from (4.15) and (4.19).

It is interesting to compare the results of Theorems 4.2(d) and 4.3(d). The former shows that, subject to the condition on the dispersion parameters, an increase in an *f_i* causes correlations to increase in the Mack model, whereas the latter yields the opposite result in the ODP cross-classified model.

**Special case.** An interesting special case arises when φ_{kj} = φ, independent of *k* and *j*. Then (4.14) reduces to

\rho_{k, j+m, j+m+n \mid j}^{2}=\sum_{i=j}^{j+m-1} \beta_{i+1} / \sum_{i=j}^{j+m+n-1} \beta_{i+1} . \tag{4.20}

**Special case.** As in Section 4.2.2, the case f_j = f and φ_j = φ for all *j* is interesting. Here, (4.18) yields

D_{j+m, j+m+n \mid j}=f^{m}\left(f^{n}-1\right) /\left(f^{m}-1\right) \tag{4.21}
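A parallel numerical check for the non-recursive model, under the same constant-parameter assumption (illustrative values only): evaluating D directly from (4.18) reproduces the closed form (4.21), and dividing by the recursive quantity (4.11) gives D/B = f^{m+n} ≥ 1, a concrete instance of the inequality B ≤ D established in Theorem 4.4.

```python
from math import prod

def D_cc(f, phi, j, m, n):
    """D_{j+m, j+m+n | j} per (4.18); f[i] stands for f_i and phi[i] for
    phi_{i+1} (index 0 unused)."""
    def term(i):
        return phi[i] * prod(f[r] for r in range(1, i)) * (f[i] - 1.0)
    num = sum(term(i) for i in range(j + m, j + m + n))
    den = sum(term(i) for i in range(j, j + m))
    return num / den

fc, j, m, n = 1.5, 1, 2, 3
f   = [fc] * 10    # constant age-to-age factors f_i = fc
phi = [1.0] * 10   # constant dispersion parameters

d_direct = D_cc(f, phi, j, m, n)
d_closed = fc ** m * (fc ** n - 1) / (fc ** m - 1)      # (4.21)
b_closed = fc ** (-n) * (fc ** n - 1) / (fc ** m - 1)   # (4.11), recursive model
print(d_direct, d_closed, d_direct / b_closed)          # ratio equals fc**(m + n)
```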

### 4.4. Comparison between recursive and non-recursive models

The present sub-section will compare the correlations associated with the ODP Mack and ODP crossclassified models with column dependent dispersion parameters **compatible**.

Let ρ^{R}_{k,j+m,j+m+n|j} denote ρ_{k,j+m,j+m+n|j} in the special case of the (recursive) ODP Mack model. Likewise, let ρ^{NR}_{k,j+m,j+m+n|j} apply to the (non-recursive) ODP cross-classified model.

Further, let π_{j+m,j+m+n|j} denote the ratio D_{j+m,j+m+n|j}/B_{j+m,j+m+n|j}.

With subscripts suppressed, ρ^{R} and ρ^{NR} are related through π as follows. By (4.6),

B=1 /\left(\rho^{R}\right)^{2}-1

Then, by (4.15),

\left(\rho^{N R}\right)^{2}=1 /\left\{1+\pi\left[1 /\left(\rho^{R}\right)^{2}-1\right]\right\}

and hence

\rho^{N R}=\pi^{-\frac{1}{2}} \rho^{R} /\left[1+\frac{1-\pi}{\pi}\left(\rho^{R}\right)^{2}\right]^{\frac{1}{2}} \tag{4.22}
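Relation (4.22) can be checked numerically in the constant-parameter special case, where B and D take the closed forms (4.11) and (4.21) (illustrative parameter values below):

```python
f, m, n = 1.5, 2, 3   # constant-parameter special case, toy values

B = f ** (-n) * (f ** n - 1) / (f ** m - 1)   # (4.11), ODP Mack
D = f ** m * (f ** n - 1) / (f ** m - 1)      # (4.21), ODP cross-classified
pi = D / B                                    # here equal to f ** (m + n)

rho_R = (1 + B) ** -0.5            # recursive correlation, by (4.6)
rho_NR_direct = (1 + D) ** -0.5    # non-recursive correlation, by (4.15)

# Recover the non-recursive correlation from the recursive one via (4.22).
rho_NR_via_422 = pi ** -0.5 * rho_R / (1 + (1 - pi) / pi * rho_R ** 2) ** 0.5
print(rho_R, rho_NR_direct, rho_NR_via_422)   # last two agree; rho_R is larger
```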

For comparative purposes, it is useful to convert (4.6) and (4.10) for the ODP Mack model into a form involving *β*’s as in (4.14).

Note that (4.10) may be expressed in the alternative form

\small{ \begin{aligned} B_{j+m, j+m+n \mid j} & =\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{1}}{\sum_{i=j}^{j+m-1} \phi_{i+1} f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\left(f_{i}-1\right) f_{i-1} \ldots f_{1}} \\ & =\frac{\sum_{i=j+m}^{j+m+n-1} \phi_{i+1} \beta_{i+1}\left(f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\right)}{\sum_{i=j}^{j+m-1} \phi_{i+1} \beta_{i+1}\left(f_{j+m+n-1}^{2} \ldots f_{i+1}^{2}\right)} \end{aligned} \tag{4.23}}

by (4.17).

**Theorem 4.4.** Consider an ODPM-regular data array and a row *k* within it that is not identically zero. Then, for compatible ODP Mack and ODP cross-classified models,

(a) 1 \leq f_{j+m}^{2} \leq \pi_{j+m, j+m+n \mid j} \leq\left(f_{j+1} \ldots f_{j+m+n-1}\right)^{2}.

(b) Hence

\rho_{k, j+m, j+m+n \mid j}^R \geq \rho_{k, j+m, j+m+n \mid j}^{N R} .

(c) \pi_{j+m, j+m+n \mid j} \rightarrow 1 as j \rightarrow \infty. Hence \rho^{N R}_{k, j+m, j+m+n \mid j} \rightarrow \rho^{R}_{k, j+m, j+m+n \mid j} as j \rightarrow \infty.

**Proof.** (a) The largest multiplier of φ_{i+1}β_{i+1} in the numerator of (4.23) is f_{j+m+n-1}^{2} \ldots f_{j+m+1}^{2} (for *i* = *j* + *m*), while the smallest multiplier in the denominator is f_{j+m+n-1}^{2} \ldots f_{j+m}^{2} (for *i* = *j* + *m* − 1). By (4.16), this proves that

B_{j+m, j+m+n \mid j} / D_{j+m, j+m+n \mid j} \leq\left(f_{j+m}^{2}\right)^{-1}

and hence the left inequality of (a).

The right inequality is similarly proved by considering the case *i* = *j* + *m* + *n* − 1 in the numerator of (4.23) and *i* = *j* in the denominator.

(b) Since all *f* factors are not less than unity, it follows from (a) that

B_{j+m, j+m+n \mid j} \leq D_{j+m, j+m+n \mid j}

This, combined with (4.6) and (4.15), yields

\rho_{k, j+m, j+m+n \mid j}^{R} \geq \rho_{k, j+m, j+m+n \mid j}^{N R}

(c) As *j* → ∞, f_j → 1, as is required in order that X_{kj} should converge as *j* → ∞. It then follows from (a) that

\pi_{j+m, j+m+n \mid j} \rightarrow 1 \text { as } j \rightarrow \infty

This, combined with (4.6) and (4.15), yields the stated result.

## 5. Conclusion

The ODP Mack model is a special case of the Mack model and there is a simple translation between their correlation structures (Section 3.2.2).

The respective correlation structures associated with the recursive and non-recursive models considered here show a number of similarities but also distinct dissimilarities.

Theorems 4.2 and 4.3 show that, in both cases, correlation decreases with increasing time separation of future observations. The same theorems show that, in both cases, correlations generally increase as the dispersion coefficients of observations (σ_j^2 for the Mack model, and φ_j for the ODP Mack or ODP cross-classified model) up to the conditioning development period increase, and as the dispersion of observations beyond this point decreases.

However, the dependency of correlations on the mean development factors f_j differs as between the recursive and non-recursive models. For full details, see Theorems 4.2(d) and 4.3(d). In broad terms, increasing age-to-age factors cause correlations within the recursive models to increase and within the non-recursive models to decrease, though these results are subject to side-conditions that involve interaction between the age-to-age factors and dispersion coefficients.

If comparison is made between corresponding correlations in recursive and non-recursive models that are subject to consistent parameters, it is found that the recursive correlation is always the larger. However, as the development period on which the correlation between future observations is conditioned moves further into the development tail, the recursive and non-recursive correlations converge. Full details appear in Theorem 4.4.