1. Introduction
For pricing and tariffication of insurance contracts Bayesian ideas and techniques are well investigated and widely used in practice. For the claims reserving problem Bayesian methods are less used, although we believe that they are very useful for answering practical questions (this has already been mentioned in de Alba 2002 and other sources).
In the literature, exact Bayesian models have been studied in a series of papers by Verrall (1990, 2000, 2004), de Alba (2002, 2006), de Alba and Ramírez Corzo (2006), Haastrup and Arjas (1996), Ntzoufras and Dellaportas (2002), and Scollnik (2002). Many of these results refer to explicit choices of distributions, for example the Poisson-gamma or the (log)normal-normal cases.
The purpose of this paper is twofold:
- It is well known in Bayesian theory that (among others) the Poisson-gamma or the normal-normal cases are specific examples of the exponential dispersion family with its associate conjugates. We show that the claims reserving problem can easily be extended to this more general family of distributions. Not surprisingly, we obtain the same results as presented in Verrall (2004) and England and Verrall (2002), Section 6.3, but now in our more general setup of distributions.
- We show that for the exponential dispersion family with its associate conjugates we obtain a natural combination of two different claims reserving methods, namely the chain ladder method (see Mack 1993) and the Bornhuetter-Ferguson method (see Bornhuetter and Ferguson 1972); a more detailed discussion follows below. In the special case of Poisson-gamma, this has already been discovered by England and Verrall (2002), Section 6.3.
In Section 2 we define the claims reserving problem. Moreover, we introduce the exponential dispersion model with its associate conjugates and state the main results. In Section 3 we give the conclusions comparing our Bayesian model to the classical claims reserving methods, and in Subsection 3.4 we give the link to the Bühlmann-Straub credibility model. Finally, in Section 4 we implement the theory in a practical example.
2. Exponential dispersion model with its associate conjugates applied to claims reserving
2.1. The claims reserving problem
We denote by $X_{i,j}$ incremental data. The index $i \in \{0, \dots, I\}$ denotes the accident year and $j \in \{0, \dots, J\}$ the development period. For example, $X_{i,j}$ can denote the number of claims reported in reporting period $j$ for accident year $i$, or it can denote the incremental payments (i.e., claim amounts paid in development period $j$ for accident year $i$). Cumulative data are denoted by
$$C_{i,j} = \sum_{k=0}^{j} X_{i,k}. \tag{2.1}$$
The observations up to time $I$ are denoted by $\mathcal{D}_I = \{X_{i,j} : i + j \le I\}$.
Task. Estimate $C_{i,J}$ for $I - i < J$, given the observations $\mathcal{D}_I$.
Terminology. We assume that the $X_{i,j}$ denote incremental payments and that the $C_{i,j}$ denote cumulative payments. This simplifies our language. The reader may always use a different interpretation for $X_{i,j}$.
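As a small illustration of this notation, the following Python sketch builds cumulative payments from incremental ones. The triangle values and the helper name `cumulate` are made up for this example; they are not the data of this paper.

```python
from itertools import accumulate

def cumulate(incremental_row):
    """Turn observed increments X_{i,0}, ..., X_{i,j} into C_{i,0}, ..., C_{i,j}."""
    return list(accumulate(incremental_row))

# Hypothetical triangle with I = J = 2: row i holds X_{i,j} for i + j <= I.
X = [[100.0, 50.0, 10.0],
     [120.0, 60.0],
     [90.0]]

C = [cumulate(row) for row in X]
# Latest observed cumulative value C_{i,I-i} of each accident year.
latest_diagonal = [row[-1] for row in C]
```

The latest diagonal is the only part of each row that the estimators discussed later actually use.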
2.2. Exponential dispersion family
In order to predict the future random variables one introduces stochastic models. In the present work we consider the exponential dispersion family with its associate conjugates. The exponential dispersion family is well known in generalized linear models (see for example, McCullagh and Nelder 1989), and also in its applications to the claims reserving context (see for example, England and Verrall (2002) and the references therein). On the other hand, it is also a very important family of distributions for Bayesian theory.
We formulate the exponential dispersion family with its associate conjugates directly in the framework, as we will use it for the claims reserving problem. Weights could be chosen in a more general manner; however, we choose the ones favored by Mack (see discussion in Mack 1990, Section 2).
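Model Assumption 2.1. Assume we have a claims development pattern $(\gamma_j)_{0 \le j \le J}$ with $\gamma_j > 0$ and $\sum_{j=0}^{J} \gamma_j = 1$. We define $\beta_j = \sum_{k=0}^{j} \gamma_k$ for $j = 0, \dots, J$.
(A1) Conditionally, given $\Theta_i$, we have that $X_{i,0}, \dots, X_{i,J}$ are independent with
$$\frac{X_{i,j}}{\gamma_j \cdot \mu_0^{(i)}} \overset{(d)}{\sim} dF_{i,j}^{(\Theta_i)}(x) = a\left(x, \frac{\sigma^2}{\gamma_j \cdot (\mu_0^{(i)})^2}\right) \exp\left\{ \frac{x \cdot \Theta_i - b(\Theta_i)}{\sigma^2 \cdot \gamma_j^{-1} \cdot (\mu_0^{(i)})^{-2}} \right\} d\nu(x), \tag{2.2}$$
where $\nu$ is a suitable $\sigma$-finite measure on $\mathbb{R}$ and $F_{i,j}^{(\Theta_i)}$ is a probability distribution on $\mathbb{R}$.
(A2) The random vectors $(\Theta_i, X_{i,0}, \dots, X_{i,J})$, $i = 0, \dots, I$, are independent, and $\Theta_0, \dots, \Theta_I$ are independent and identically distributed real-valued random variables with density
$$u_{\mu,\tau^2}(\theta) = d(\mu, \tau^2) \cdot \exp\left\{ \frac{\mu \cdot \theta - b(\theta)}{\tau^2} \right\} \tag{2.3}$$
with $\mu \equiv 1$ and $\tau^2 > 0$.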
Remarks
- $\mu_0^{(i)}$ plays the role of the a priori expected total claim amount for accident year $i$, and $\gamma_j$ denotes the proportion paid in development period $j$. Hence in Assumption (A1) we compare the payment $X_{i,j}$ to its expected value $\gamma_j \cdot \mu_0^{(i)}$ (see also Lemma 2.3 below).
- Assumption (A1) means that the scaled sizes $X_{i,j} / (\gamma_j \cdot \mu_0^{(i)})$ belong to the exponential dispersion family with unknown parameter $\Theta_i$ (the parameters $\sigma^2$, $\gamma_j$ and $\mu_0^{(i)}$ are assumed to be known). $\Theta_i$ is a (latent) random variable [see Assumption (A2)] that describes the risk characteristics of accident year $i$.
- For the moment we assume that $\mu_0^{(i)}$ and $\gamma_j$ are known. In practice, of course, this is not the case. We discuss the consequences of this fact below.
- Assumption (A2) means that different accident years can be studied independently. Different accident years are combined through the fact that the claims development pattern $(\gamma_j)_{0 \le j \le J}$ and the variance parameters $\sigma^2$ and $\tau^2$ do not depend on $i$. Moreover, it is assumed that (before we have any observations $\mathcal{D}_I$) a priori the accident years are similar. This is reflected by the fact that we choose $\mu \equiv 1$ (for the meaning of $\mu$ we also refer to Lemma 2.2).
- A special case is obtained by choosing the conditional distribution in (A1) as a Poisson distribution and the prior in (A2) as a gamma distribution. This immediately gives the model studied by Verrall (2000, 2004). Other examples are (see for example Bühlmann and Gisler 2005, Section 2.5) the binomial-beta case, gamma-gamma case, or normal-normal case.
-
Observe that must be positive. This may cause problems in practical applications, since in general may have both signs (see also de Alba and Ramírez Corzo 2006).
The following two lemmas are two key statements in Bayesian theory. We omit their proofs since they are fairly standard and can be found in Bernardo and Smith (1994) or Bühlmann and Gisler (2005) (Theorems 2.19–2.20), among other texts.
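Lemma 2.2 The conditional distribution of $\Theta_i$ given the observations $\mathcal{D}_I$ has density $u_{\mu_{\mathrm{post}(i)}, \tau^2_{\mathrm{post}(i)}}$ with
$$\tau^2_{\mathrm{post}(i)} = \sigma^2 \cdot \left[ \frac{\sigma^2}{\tau^2} + (\mu_0^{(i)})^2 \cdot \beta_{(I-i) \wedge J} \right]^{-1}, \tag{2.4}$$
$$\mu_{\mathrm{post}(i)} = \frac{\tau^2_{\mathrm{post}(i)}}{\sigma^2} \cdot \left[ \frac{\sigma^2}{\tau^2} + (\mu_0^{(i)})^2 \cdot \beta_{(I-i) \wedge J} \cdot \bar{Z}_i \right], \tag{2.5}$$
$$\bar{Z}_i = \frac{C_{i,(I-i) \wedge J}}{\beta_{(I-i) \wedge J} \cdot \mu_0^{(i)}}, \tag{2.6}$$
where $(I - i) \wedge J$ denotes the minimum of $I - i$ and $J$.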
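The posterior update of Lemma 2.2 can be sketched in a few lines of Python. This is only an illustration: the function name `posterior_parameters` and all input values are hypothetical, and the prior mean is taken to be 1, as the credibility formula in Lemma 2.3 suggests.

```python
def posterior_parameters(c_obs, beta_obs, mu0, sigma2, tau2):
    """Posterior parameters (mu_post, tau2_post) of Theta_i, assuming prior mean 1.

    c_obs    -- latest observed cumulative payment C_{i,(I-i)^J}
    beta_obs -- cumulative development proportion beta_{(I-i)^J}
    mu0      -- a priori expected ultimate claim of accident year i
    """
    z_bar = c_obs / (beta_obs * mu0)          # scaled observation Z-bar_i
    weight = mu0 ** 2 * beta_obs              # total weight of the observations
    tau2_post = sigma2 / (sigma2 / tau2 + weight)
    mu_post = tau2_post / sigma2 * (sigma2 / tau2 + weight * z_bar)
    return mu_post, tau2_post

# Made-up inputs: half of the claim is developed and payments run 10% below plan.
mu_post, tau2_post = posterior_parameters(
    c_obs=90.0, beta_obs=0.5, mu0=200.0, sigma2=400.0, tau2=0.01
)
```

With these inputs the posterior mean moves from the prior value 1 towards the observed ratio 0.9, and the posterior variance parameter is smaller than the prior one.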
Remarks
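- The conditional distribution of the risk characteristics $\Theta_i$ given the observations $\mathcal{D}_I$ (the posterior distribution of the latent variable $\Theta_i$) belongs to the same family of distributions as the a priori distribution of $\Theta_i$ (before we have any observations). Thus, this meets the definition of conjugate priors.
- The a posteriori distribution of $\Theta_i$ depends only on the observations of accident year $i$ (due to Assumption (A2)).
- We have assumed that the scaled observations $X_{i,j} / (\gamma_j \cdot \mu_0^{(i)})$ have (a priori) identical distributions. However, the a posteriori distributions of $\Theta_i$ given $\mathcal{D}_I$ are different, which is reflected by $\mu_{\mathrm{post}(i)}$ and $\tau^2_{\mathrm{post}(i)}$.
- Lemma 2.2 allows for an explicit calculation of the a posteriori (predictive) distributions of $X_{i,j}$ ($i + j > I$), given the observations $\mathcal{D}_I$ (which are independent for different accident years $i$), namely
$$P\left[ X_{i,I-i+1} \le x_{I-i+1}, \dots, X_{i,J} \le x_J \mid \mathcal{D}_I \right] = \int \prod_{j=I-i+1}^{J} F_{i,j}^{(\theta)}\left( \frac{x_j}{\gamma_j \cdot \mu_0^{(i)}} \right) \cdot u_{\mu_{\mathrm{post}(i)}, \tau^2_{\mathrm{post}(i)}}(\theta) \, d\theta. \tag{2.7}$$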
Hence, with (2.7) we can explicitly calculate the a posteriori distributions and their moments. Moreover, this allows for simulations of the random variables. The next lemma then provides a straightforward estimate for the expected total claim amounts.
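Lemma 2.3 Under the Model Assumptions 2.1 we have
$$\mu(\Theta_i) \overset{\text{def.}}{=} E\left[ \left. \frac{X_{i,j}}{\gamma_j \cdot \mu_0^{(i)}} \right| \Theta_i \right] = b'(\Theta_i). \tag{2.8}$$
If $\exp\{(\mu \cdot \theta - b(\theta)) / \tau^2\}$ disappears on the boundary of the domain of $\theta$ for all $\mu$ and $\tau^2$, then
$$E[X_{i,j}] = \gamma_j \cdot \mu_0^{(i)} \cdot E[\mu(\Theta_i)] = \gamma_j \cdot \mu_0^{(i)}, \tag{2.9}$$
$$\widetilde{\mu(\Theta_i)} \overset{\text{def.}}{=} E[\mu(\Theta_i) \mid \mathcal{D}_I] = \alpha_i^{((I-i) \wedge J)} \cdot \bar{Z}_i + \left( 1 - \alpha_i^{((I-i) \wedge J)} \right) \cdot 1, \tag{2.10}$$
where
$$\alpha_i^{(j)} = \frac{\beta_j}{\beta_j + \kappa_i} \quad \text{and} \quad \kappa_i = \frac{\sigma^2}{(\mu_0^{(i)})^2 \cdot \tau^2}. \tag{2.11}$$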
Remarks $\widetilde{\mu(\Theta_i)}$ is a Bayesian estimator (a posteriori mean of $\mu(\Theta_i)$ given the observations $\mathcal{D}_I$). It is a credibility-weighted average between the a priori mean $E[\mu(\Theta_i)] = 1$ and the observations $\bar{Z}_i$ (defined in (2.6)). The larger the individual variation $\sigma^2$, the smaller the credibility weight $\alpha_i^{(j)}$; the larger the collective variability $\tau^2$, the larger the credibility weight (for a detailed discussion on the credibility coefficient $\kappa_i$ we refer to Bühlmann and Gisler 2005).
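The effect of the two variance parameters on the credibility weight can be checked numerically. The following Python sketch uses hypothetical function names and made-up parameter values; it only evaluates the formulas for the credibility weight and the credibility coefficient stated in Lemma 2.3.

```python
def credibility_weight(beta_j, mu0, sigma2, tau2):
    """alpha = beta_j / (beta_j + kappa) with kappa = sigma2 / (mu0^2 * tau2)."""
    kappa = sigma2 / (mu0 ** 2 * tau2)
    return beta_j / (beta_j + kappa)

def posterior_mean(z_bar, alpha):
    """Credibility-weighted average of the observation z_bar and the prior mean 1."""
    return alpha * z_bar + (1.0 - alpha) * 1.0

# Made-up parameters; only the ordering of the three weights matters here.
base = credibility_weight(beta_j=0.5, mu0=200.0, sigma2=400.0, tau2=0.01)
noisier = credibility_weight(beta_j=0.5, mu0=200.0, sigma2=1600.0, tau2=0.01)
more_diverse = credibility_weight(beta_j=0.5, mu0=200.0, sigma2=400.0, tau2=0.04)
```

Raising the individual variation lowers the weight given to the observations, while raising the collective variability increases it, in line with the remark above.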
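Lemma 2.4 (Bayesian estimator for claims reserves) Choose $j = I - i < J$ and $k \in \{1, \dots, J - j\}$. Then the Bayesian estimators for $X_{i,j+k}$ and $C_{i,j+k}$ in Model 2.1 are as follows
$$\widetilde{X}_{i,j+k} = \widehat{E}[X_{i,j+k} \mid C_{i,0}, \dots, C_{i,j}] = \gamma_{j+k} \cdot \mu_0^{(i)} \cdot \widetilde{\mu(\Theta_i)}, \tag{2.12}$$
$$\widetilde{C}_{i,j+k} = \widehat{E}[C_{i,j+k} \mid C_{i,0}, \dots, C_{i,j}] = C_{i,j} + (\beta_{j+k} - \beta_j) \cdot \mu_0^{(i)} \cdot \widetilde{\mu(\Theta_i)}. \tag{2.13}$$
Remark The estimators $\widetilde{X}_{i,j+k}$ and $\widetilde{C}_{i,j+k}$ are unbiased, $\mathcal{D}_I$-measurable and minimize the quadratic loss function (see Theorem 2.5 in Bühlmann and Gisler 2005).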
Consequence. We obtain for I − i < J (see Lemma 2.3)
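$$\begin{aligned}
E[C_{i,J} \mid \mathcal{D}_I] &= C_{i,I-i} + \sum_{j=I-i+1}^{J} E[X_{i,j} \mid \mathcal{D}_I] \\
&= C_{i,I-i} + \sum_{j=I-i+1}^{J} \gamma_j \cdot \mu_0^{(i)} \cdot E[\mu(\Theta_i) \mid \mathcal{D}_I] \\
&= C_{i,I-i} + (1 - \beta_{I-i}) \cdot \mu_0^{(i)} \cdot \widetilde{\mu(\Theta_i)} = \widetilde{C}_{i,J} \\
&= C_{i,I-i} + (1 - \beta_{I-i}) \cdot \left[ \alpha_i^{(I-i)} \cdot \frac{C_{i,I-i}}{\beta_{I-i}} + \left( 1 - \alpha_i^{(I-i)} \right) \cdot \mu_0^{(i)} \right].
\end{aligned} \tag{2.14}$$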
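The estimators of Lemma 2.4 and formula (2.14) can be evaluated for a single accident year as in the following Python sketch. The function name `bayes_forecast`, the development pattern and all amounts are hypothetical, and the credibility coefficient is passed in as a known number.

```python
def bayes_forecast(c_obs, gamma, n_obs, mu0, kappa):
    """Forecast the future increments and the ultimate claim of one accident year.

    c_obs -- latest observed cumulative payment C_{i,I-i}
    gamma -- development pattern gamma_0, ..., gamma_J (sums to 1)
    n_obs -- last observed development period, n_obs = I - i < J
    """
    beta = sum(gamma[: n_obs + 1])                 # beta_{I-i}
    alpha = beta / (beta + kappa)                  # credibility weight
    mu_tilde = alpha * c_obs / (beta * mu0) + (1.0 - alpha)
    future_x = [g * mu0 * mu_tilde for g in gamma[n_obs + 1:]]
    ultimate = c_obs + (1.0 - beta) * mu0 * mu_tilde
    return future_x, ultimate

# Made-up inputs: three development periods, only the first one observed.
future_x, ultimate = bayes_forecast(
    c_obs=90.0, gamma=[0.5, 0.3, 0.2], n_obs=0, mu0=200.0, kappa=1.0
)
```

The forecast increments add up to the estimated reserve, so the observed cumulative payment plus their sum reproduces the ultimate estimate.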
3. Interpretation and conclusions
In the exponential dispersion family with associate conjugates (Model Assumptions 2.1) the Bayesian estimator for the expected ultimate claim $E[C_{i,J}]$ at time $I$ is given by (2.14).
Before giving an interpretation of that formula we briefly review the two (probably) most popular methods, namely the chain-ladder (CL) method (see Mack 1993) and the Bornhuetter-Ferguson (BF) method (see Bornhuetter and Ferguson 1972).
3.1. CL method
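The CL method is based on the assumption that there exist development factors $f_0, \dots, f_{J-1}$ such that for all $i$ and $j \in \{1, \dots, J\}$
$$E[C_{i,j} \mid C_{i,0}, \dots, C_{i,j-1}] = f_{j-1} \cdot C_{i,j-1}. \tag{3.1}$$
The CL estimator of the ultimate claim $C_{i,J}$ given the observations $C_{i,0}, \dots, C_{i,j}$ is then given by
$$\widehat{C}_{i,J}^{CL} = \widehat{E}[C_{i,J} \mid C_{i,0}, \dots, C_{i,j}] = C_{i,j} \cdot f_j \cdots f_{J-1}. \tag{3.2}$$
Define $\beta_j = \prod_{k=j}^{J-1} 1 / f_k$. Estimate (3.2) implies
$$\widehat{C}_{i,J}^{CL} = C_{i,j} + (1 - \beta_j) \cdot \widehat{C}_{i,J}^{CL}. \tag{3.3}$$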
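The CL projection and the implied development proportion can be sketched in Python as follows. The development factors and the observed amount are invented for illustration, and the function names are not taken from any library.

```python
from math import prod

def cl_ultimate(c_ij, factors, j):
    """CL estimate C_{i,j} * f_j * ... * f_{J-1}; factors[k] holds f_k."""
    return c_ij * prod(factors[j:])

def cl_beta(factors, j):
    """Proportion developed after period j: the product of 1/f_k for k = j, ..., J-1."""
    return 1.0 / prod(factors[j:])

factors = [1.5, 1.2, 1.05]   # hypothetical f_0, f_1, f_2, so J = 3
c_ij, j = 150.0, 1           # hypothetical latest cumulative payment C_{i,1}

ultimate = cl_ultimate(c_ij, factors, j)
beta_j = cl_beta(factors, j)
# Decomposition into paid amount and outstanding share of the CL ultimate.
decomposed = c_ij + (1.0 - beta_j) * ultimate
```

The decomposition holds because the observed cumulative payment equals the developed proportion times the CL ultimate.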
3.2. BF method
The BF method estimates the ultimate claim by (see Mack (1990))
$$\widehat{C}_{i,J}^{BF} = C_{i,j} + (1 - \beta_j) \cdot \mu_0^{(i)}, \tag{3.4}$$
where $\mu_0^{(i)}$ is an a priori estimate ignoring the data $\mathcal{D}_I$.
3.3. Combination of CL and BF method
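We now have two extreme positions: the BF method relies only on the a priori estimate $\mu_0^{(i)}$ (ignoring the observations), whereas the CL method gives full credibility to the indication based solely on the observation $C_{i,j}$.
Benktander (1976) and Hovinen (1981) have made a first attempt to combine these two extreme cases. Choose $\alpha \in [0, 1]$ and define
$$\mu_0^{(i)}(\alpha) = \alpha \cdot \widehat{C}_{i,J}^{CL} + (1 - \alpha) \cdot \mu_0^{(i)}. \tag{3.5}$$
Benktander-Hovinen (BH) have chosen $\alpha = \beta_j$, which gives the BH estimate
$$\widehat{C}_{i,J}^{BH} = C_{i,j} + (1 - \beta_j) \cdot \left[ \beta_j \cdot \widehat{C}_{i,J}^{CL} + (1 - \beta_j) \cdot \mu_0^{(i)} \right]. \tag{3.6}$$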
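The three estimates differ only in the weight given to the CL indication, which the following Python sketch makes explicit. The function name `mixed_ultimate` and the numbers are assumptions chosen for illustration.

```python
def mixed_ultimate(c_ij, beta_j, mu0, alpha):
    """Ultimate estimate using the mixed prior alpha * CL + (1 - alpha) * mu0."""
    cl = c_ij / beta_j                        # CL ultimate, since C_{i,j} = beta_j * CL
    mixed_prior = alpha * cl + (1.0 - alpha) * mu0
    return c_ij + (1.0 - beta_j) * mixed_prior

# Made-up inputs: 90 paid so far, half developed, a priori ultimate of 200.
c_ij, beta_j, mu0 = 90.0, 0.5, 200.0

bf = mixed_ultimate(c_ij, beta_j, mu0, alpha=0.0)      # Bornhuetter-Ferguson
cl = mixed_ultimate(c_ij, beta_j, mu0, alpha=1.0)      # chain ladder
bh = mixed_ultimate(c_ij, beta_j, mu0, alpha=beta_j)   # Benktander-Hovinen
```

For any weight between 0 and 1 the result lies between the CL and the BF estimates.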
Question. What is the optimal α? Optimality is defined here as “minimizing mean square error” (see Mack 2000, Section 3).
Mack (2000) gives a different stochastic model (see Mack 2000, (2)–(3)) under which he calculates the optimal α (see Mack 2000, Theorems 2 and 3). It is of the form
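$$\alpha^{*} = \frac{\beta_j}{\beta_j + \kappa}. \tag{3.7}$$
Hence, the estimator in the model considered by Mack (2000) has exactly the same form as the Bayesian estimator (2.14) in our exponential dispersion model. Observe that for $I - i < J$ we have (using (2.14))
$$\widetilde{C}_{i,J} = E[C_{i,J} \mid \mathcal{D}_I] = C_{i,I-i} + (1 - \beta_{I-i}) \cdot \left[ \alpha_i^{(I-i)} \cdot \widehat{C}_{i,J}^{CL} + \left( 1 - \alpha_i^{(I-i)} \right) \cdot \mu_0^{(i)} \right]. \tag{3.8}$$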
Hence we obtain in a natural way a “linear mixture” of the CL estimate and the BF estimate. It has two extreme cases:
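a) Choose $\kappa_i = 0$, i.e., $\alpha_i^{(I-i)} = 1$. This leads to $\widetilde{C}_{i,J} = \widehat{C}_{i,J}^{CL}$, which is the CL estimate.
b) $\kappa_i = \infty$, i.e., $\alpha_i^{(I-i)} = 0$, leads to $\widetilde{C}_{i,J} = \widehat{C}_{i,J}^{BF}$, which is the BF estimate.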
3.4. Linear credibility methods
Under our Model Assumptions 2.1 we can explicitly calculate the a posteriori distribution of loss for a given accident year. Moreover, the a posteriori expectation of $\mu(\Theta_i)$ is linear in the observations. In general this is not the case, and one cannot explicitly calculate the a posteriori distribution. In such situations one uses a linear credibility approach, which minimizes quadratic loss functions among linear estimators.
Probably the most famous model in linear credibility theory is the Bühlmann-Straub (BS) model (see Bühlmann and Gisler (2005), Chapter 4). The BS model has been used in the claims reserving context by Mack (1990), Neuhaus (1992) (see Section 3.4) and de Vylder (1982).
In the BS model one obtains exactly the same estimate for the reserves as in our exponential dispersion model, i.e., the credibility estimator for $\mu(\Theta_i)$ in the BS model is given by (choosing an appropriate scaling, see also Mack 1990)
$$\widehat{\mu(\Theta_i)}^{\mathrm{cred}} = \alpha_i^{((I-i) \wedge J)} \cdot \bar{Z}_i + \left( 1 - \alpha_i^{((I-i) \wedge J)} \right). \tag{3.9}$$
However, in general the credibility estimator is only the best linear approximation to the Bayes estimator $\widetilde{\mu(\Theta_i)}$ and hence has a larger quadratic loss than the Bayes estimate. Moreover, it does not satisfy (2.14) and hence (3.8) (this is exactly the Bayes estimate), and it does not allow for simulation, because only the first two moments are determined by the BS model.
However, for the exponential dispersion family with associate conjugates the Bayes estimate and the credibility estimate coincide.
4. Application in practice and an example
So far we have always assumed that the following parameters are known:
- the a priori mean $\mu_0^{(i)}$,
- the claims development pattern $(\gamma_j)_{0 \le j \le J}$,
- the credibility coefficient $\kappa_i$ and the variance parameters $\sigma^2$ and $\tau^2$.
Then Lemmas 2.2 and 2.3 give the a posteriori distributions and the optimal estimators (this is a similar situation as considered in England and Verrall (2002), Section 6.3). However, in practice these are often not known and need to be estimated from the data. If we replace the parameters by their estimates, then of course we lose the optimality conditions (since we have an additional error term coming from the parameter estimation). Hence we could now build a whole new theory also trying to minimize the parameter estimation error. Since this would go beyond the scope of this paper we restrict ourselves to the replacement of the parameters by appropriate estimators.
In other words, in practice a full analytical Bayesian formula is often not a realistic method. One way out of this dilemma is the credibility technique. Here the credibility solution is understood as replacing the unknown parameters by appropriate estimators. In Verrall (2004) such estimators are called “plug-in” estimates. In a full Bayesian approach one would estimate both the exposures and the claims development pattern simultaneously. Such a full Bayesian approach often requires numerical methods such as the Markov Chain Monte Carlo method (see Verrall 2004, de Alba 2002, 2006, Ntzoufras and Dellaportas 2002, or Scollnik 2002).
A priori mean. As a priori mean one usually takes a plan value or the estimate from the premium calculation (as in the BF method). In our example below, the a priori mean is known (from budget loss ratios).
Claims development pattern. The estimation of $\gamma_j$ is the crucial part in which we link the different accident years. In Assumption (A2) we have assumed that the different accident years are independent, and therefore in Lemmas 2.2 and 2.3 one could not learn anything about accident year $i$ from accident year $k \neq i$, and vice versa.
Since we have assumed that all accident years have the same claims development pattern, we now combine the observations of the different accident years to estimate $\gamma_j$.
There is no canonical way in our model to get $\gamma_j$ from the data (as there is in the CL method). In practice (and in our example below) one usually uses the CL estimate for the development factors $f_j$: given $\mathcal{D}_I$, we estimate $f_{j-1}$ and $\beta_j$ as follows (see Mack 1993)
$$\widehat{f}_{j-1}^{CL} = \frac{\sum_{i=0}^{I-j} C_{i,j}}{\sum_{i=0}^{I-j} C_{i,j-1}} \quad \text{and} \quad \widehat{\beta}_j^{CL} = \prod_{k=j}^{J-1} \frac{1}{\widehat{f}_k^{CL}}. \tag{4.1}$$
It is well known that these estimates lead to an unbiased estimate in the CL model and one can estimate the mean square error of prediction for this model (see Mack 1993 and Buchwalder et al. 2006). However, in our model (as in the BF model) we can neither show that $\widehat{\beta}_j^{CL}$ is an appropriate estimate for the claims development pattern nor calculate the mean square error of prediction. One can easily give an estimate for the process variance (with (2.7)), but one cannot give an estimate for the estimation error since we do not even know whether $\beta_j$ is estimated in an appropriate way.
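The CL estimates of the development factors and of the development pattern described in (4.1) can be computed as in this Python sketch. It assumes a triangle with at least as many accident years as development periods; the data and the function names are made up.

```python
def cl_factors(C, J):
    """Estimate f_0, ..., f_{J-1} from a cumulative triangle.

    C[i] holds the observed values C_{i,0}, ..., C_{i,I-i} with I = len(C) - 1 >= J.
    """
    I = len(C) - 1
    factors = []
    for j in range(1, J + 1):
        rows = range(0, I - j + 1)            # accident years with column j observed
        numerator = sum(C[i][j] for i in rows)
        denominator = sum(C[i][j - 1] for i in rows)
        factors.append(numerator / denominator)
    return factors

def cl_pattern(factors):
    """Cumulative pattern beta_0, ..., beta_J implied by the factors (beta_J = 1)."""
    beta = [1.0] * (len(factors) + 1)
    for j in range(len(factors) - 1, -1, -1):
        beta[j] = beta[j + 1] / factors[j]
    return beta

C = [[100.0, 150.0, 160.0], [120.0, 180.0], [90.0]]   # hypothetical, I = J = 2
f_hat = cl_factors(C, J=2)
beta_hat = cl_pattern(f_hat)
# Incremental pattern gamma_j = beta_j - beta_{j-1}, with gamma_0 = beta_0.
gamma_hat = [beta_hat[0]] + [b - a for a, b in zip(beta_hat, beta_hat[1:])]
```

The incremental pattern obtained this way sums to 1 by construction, which is what Model Assumption 2.1 requires of the development pattern.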
Credibility coefficient. The credibility coefficient $\kappa_i$ is calculated by estimating $\sigma^2$ and $\tau^2$ ($\mu_0^{(i)}$ was already given above).
Observe that (see Theorem 2.20 in Bühlmann and Gisler 2005)
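$$\operatorname{Var}\left( \left. \frac{X_{i,j}}{\gamma_j \cdot \mu_0^{(i)}} \right| \Theta_i \right) = \frac{\sigma^2 \cdot b''(\Theta_i)}{(\mu_0^{(i)})^2 \cdot \gamma_j}. \tag{4.2}$$
Without loss of generality we may assume that
$$m_b \overset{\text{def.}}{=} E[b''(\Theta_i)] = 1. \tag{4.3}$$
Otherwise we simply multiply $\sigma^2$ and $\tau^2$ by $m_b$, which in our context of an exponential dispersion family with associate conjugates leads to the same model with $b(\cdot)$ replaced by $b^{(1)}(\cdot)$. This rescaled model then has
$$\operatorname{Var}\left( \left. \frac{X_{i,j}}{\gamma_j \cdot \mu_0^{(i)}} \right| \Theta_i \right) = \frac{m_b \cdot \sigma^2 \cdot b^{(1)\prime\prime}(\Theta_i)}{(\mu_0^{(i)})^2 \cdot \gamma_j}, \quad \text{with } E\left[ b^{(1)\prime\prime}(\Theta_i) \right] = 1, \tag{4.4}$$
$$\operatorname{Var}\left( b^{(1)\prime}(\Theta_1) \right) = m_b \cdot \tau^2. \tag{4.5}$$
The credibility weights do not change under this transformation since both $\sigma^2$ and $\tau^2$ are multiplied by $m_b$. Hence we assume (4.3) for the rest of this work. It then follows that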
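$$\widehat{\sigma^2} = \frac{1}{I} \sum_{i=0}^{I-1} \frac{1}{(I-i) \wedge J} \sum_{j=0}^{(I-i) \wedge J} (\mu_0^{(i)})^2 \cdot \gamma_j \cdot \left( \frac{X_{i,j}}{\gamma_j \cdot \mu_0^{(i)}} - \bar{Z}_i \right)^2 \tag{4.6}$$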
is an unbiased estimator for σ2.
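A direct transcription of this variance estimator into Python might look as follows. It is a sketch under the assumption that the development pattern and the a priori means are given; the function name `sigma2_hat` and the toy inputs are hypothetical.

```python
def sigma2_hat(X, gamma, mu0):
    """Estimate sigma^2 from incremental payments.

    X[i]   -- observed increments X_{i,0}, ..., X_{i,(I-i)^J}, with I = len(X) - 1
    gamma  -- development pattern gamma_0, ..., gamma_J
    mu0[i] -- a priori expected ultimate claim of accident year i
    """
    I, J = len(X) - 1, len(gamma) - 1
    total = 0.0
    for i in range(I):                        # accident years 0, ..., I-1
        n = min(I - i, J)                     # last observed development period
        beta_n = sum(gamma[: n + 1])
        z_bar = sum(X[i][: n + 1]) / (beta_n * mu0[i])
        squares = sum(
            mu0[i] ** 2 * gamma[j] * (X[i][j] / (gamma[j] * mu0[i]) - z_bar) ** 2
            for j in range(n + 1)
        )
        total += squares / n
    return total / I

# Made-up data with I = J = 1: the first year pays 60 and 40 against a plan of 50 and 50.
estimate = sigma2_hat(X=[[60.0, 40.0], [30.0]], gamma=[0.5, 0.5], mu0=[100.0, 100.0])
```

The youngest accident year contributes nothing because a single observation carries no information about the variation within a year.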
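Define $w_i = (\mu_0^{(i)})^2 \cdot \beta_{(I-i) \wedge J}$, $w_\bullet = \sum_{i=0}^{I-1} w_i$, and
$$c = \frac{I-1}{I} \left[ \sum_{i=0}^{I-1} \frac{w_i}{w_\bullet} \cdot \left( 1 - \frac{w_i}{w_\bullet} \right) \right]^{-1}, \tag{4.7}$$
$$\bar{Z} = \sum_{i=0}^{I-1} \frac{w_i}{w_\bullet} \cdot \bar{Z}_i, \tag{4.8}$$
$$T = \frac{I}{I-1} \sum_{i=0}^{I-1} \frac{w_i}{w_\bullet} \cdot (\bar{Z}_i - \bar{Z})^2. \tag{4.9}$$
Then
$$\widetilde{\tau^2} = c \cdot \left( T - \frac{I \cdot \widehat{\sigma^2}}{w_\bullet} \right) \tag{4.10}$$
is an unbiased estimator for $\tau^2$ (see Bühlmann and Gisler 2005 (4.26)). Since $\widetilde{\tau^2}$ could be negative, we set
$$\widehat{\tau^2} = \max\{ \widetilde{\tau^2}, 0 \} \quad \text{and} \quad \widehat{\kappa}_i = \widehat{\sigma^2} \cdot (\mu_0^{(i)})^{-2} / \widehat{\tau^2}. \tag{4.11}$$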
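The truncated estimator of the collective variance can be sketched in Python as below. The weights are passed in as plain numbers, the inputs are made up, and returning an infinite credibility coefficient when the variance estimate is zero is a convention chosen for this sketch, not something stated in the text.

```python
import math

def tau2_hat(z_bars, weights, sigma2):
    """Truncated estimate of tau^2 from scaled observations and their weights."""
    n = len(z_bars)                           # number of accident years used
    w_total = sum(weights)
    shares = [w / w_total for w in weights]
    c = (n - 1) / n / sum(s * (1.0 - s) for s in shares)
    z_mean = sum(s * z for s, z in zip(shares, z_bars))
    t = n / (n - 1) * sum(s * (z - z_mean) ** 2 for s, z in zip(shares, z_bars))
    return max(c * (t - n * sigma2 / w_total), 0.0)

def kappa_hat(sigma2, mu0, tau2):
    """Credibility coefficient; infinite (zero credibility) if tau2 is zero."""
    return math.inf if tau2 == 0.0 else sigma2 / (mu0 ** 2 * tau2)

# Made-up inputs: two accident years with equal weights.
spread = tau2_hat(z_bars=[0.9, 1.1], weights=[1.0, 1.0], sigma2=0.0)
truncated = tau2_hat(z_bars=[0.9, 1.1], weights=[1.0, 1.0], sigma2=1.0)
```

When the truncation applies, the credibility weight becomes zero and the resulting reserve estimate relies entirely on the a priori mean.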
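Remark One may view it as a major deficiency of the present model that we lose optimality when replacing the unknown parameters by their estimates. On the other hand, the following formula shows that the model can be very useful in practice: define $\alpha_i^{(I-i),*} = \widehat{\beta}_{I-i}^{CL} / (\widehat{\beta}_{I-i}^{CL} + \widehat{\kappa}_i)$. For $I - i < J$, equation (2.14) leads to the following estimate of the ultimate claim payments:
$$\begin{aligned}
\widetilde{C}_{i,J}^{*} &= C_{i,I-i} + \left( 1 - \widehat{\beta}_{I-i}^{CL} \right) \cdot \left( \alpha_i^{(I-i),*} \cdot \frac{C_{i,I-i}}{\widehat{\beta}_{I-i}^{CL}} + \left( 1 - \alpha_i^{(I-i),*} \right) \cdot \mu_0^{(i)} \right) \\
&= \alpha_i^{(I-i),*} \cdot \widehat{C}_{i,J}^{CL} + \left( 1 - \alpha_i^{(I-i),*} \right) \cdot \widehat{C}_{i,J}^{BF}.
\end{aligned} \tag{4.12}$$
In the last step above we have assumed that both $f_j$ and $\beta_j$ are estimated by (4.1) (this is the usual choice in practice for the CL and the BF methods).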
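The equality of the two forms of the plug-in estimate can be verified numerically. In the following Python sketch the function name `plug_in_ultimate` and all inputs are hypothetical; the estimated development proportion and credibility coefficient are simply passed in as numbers.

```python
def plug_in_ultimate(c_latest, beta_hat, mu0, kappa_hat):
    """Credibility mixture of the CL and BF ultimates with estimated parameters."""
    alpha = beta_hat / (beta_hat + kappa_hat)
    cl = c_latest / beta_hat                       # CL ultimate
    bf = c_latest + (1.0 - beta_hat) * mu0         # BF ultimate
    return alpha * cl + (1.0 - alpha) * bf, alpha, cl, bf

# Made-up inputs: 90 paid, half developed, a priori ultimate 200, kappa estimate 1.
ultimate, alpha, cl, bf = plug_in_ultimate(
    c_latest=90.0, beta_hat=0.5, mu0=200.0, kappa_hat=1.0
)

# Same estimate written as latest payment plus outstanding share of a mixed prior.
reserve_form = 90.0 + (1.0 - 0.5) * (alpha * 90.0 / 0.5 + (1.0 - alpha) * 200.0)
```

Both forms agree, and the estimate always lies between the CL and the BF ultimates because the weight is between 0 and 1.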
Remarks
- Our estimate for the ultimate claim payments is a credibility-weighted average of the CL estimate and the BF estimate. The credibility weight $\alpha_i^{(I-i),*}$ is determined by the development pattern $\widehat{\beta}_{I-i}^{CL}$, the a priori estimate $\mu_0^{(i)}$, and the variances $\sigma^2$ and $\tau^2$ of the processes. Since it is increasing in $\widehat{\beta}_{I-i}^{CL}$, we give higher credibility to the CL estimate for older accident years.
- Combining the CL estimate and the BF estimate is a very old problem in claims reserving. In some insurance companies there are rules of thumb for when to choose which estimate (see also Mack 2000). Equation (4.12) gives a natural way to combine the CL and the BF estimates.
4.1. Example
The observed incremental payments are given in Table 1.
This is a rather homogeneous dataset, with fast development. After two years, almost 90% of the total claim amount is paid. On the other hand it also looks long-tailed since we still observe some payments after seven years (in the present work we do not bother about choosing tail factors for the CL method).
Using our parameter estimates from above we obtain the following estimates for $\sigma^2$ and $\tau^2$:
$$\widehat{\sigma^2} = (10{,}119)^2 \quad \text{and} \quad \widehat{\tau^2} = (60)^2.$$
The credibility coefficients, the credibility weights and the estimates for the ultimate claim payments are now determined with the help of (4.12). We obtain the results given in Table 2.
Conclusions of the example.
- We see that the estimated is around in our example. This means that a claims development factor of gives already a credibility weight of to the observation. Observe also that the credibility weight is always smaller than 1 .
- The a priori estimate is rather conservative since the BF estimate is always larger than the CL estimate. Of course, this can have various reasons, which are not further discussed here. Our estimate then lies between the CL and the BF estimates. Since the credibility weights are larger than $1/2$, our estimate is closer to the CL estimate.