1. Introduction
In an experience-rating plan with a primary-excess split, such as the one promulgated by the National Council on Compensation Insurance (NCCI) (2002) for rating workers compensation risks, individual risk losses are divided into primary and excess components. A credibility-weighted estimate of each component is obtained and the two estimates are added together to produce the final experience-adjusted estimate of total loss.^{[1]}
1.1. Conceptual foundation
But does this splitting procedure lead to a better estimate than credibility weighting without a split? Gillam (1989, 1992) and others have presented strong empirical evidence that a split actually does work better. However, we have found no paper that correctly and completely explains why it should work better. The first purpose of this paper is to provide a rigorous conceptual foundation for split credibility. Then we will use that as a base to arrive at a clear understanding of the conditions needed for splitting to produce materially superior estimates.
1.2. Incorrect intuitive justification
We start by explaining what is wrong with an intuitive justification for split credibility that is given in the literature. The incorrect justification is that use of a split breaks total loss into two components, each separately less volatile and, thus the argument goes, more credible than the total.^{[2]} However, as is generally accepted, excess layer loss is inherently more volatile than total loss.^{[3]} To be more precise, as shown in Appendix A, excess layer loss has a process risk coefficient of variation (CV) at least as large as the corresponding CV for total loss.
1.3. Volatility alone does not determine credibility
Because the excess layer is more volatile, can we therefore conclude it is less credible than the primary layer? Many readers may be thinking the answer is an obvious “yes.” Recalling further that such a relation between primary and excess layer credibility is designed into the NCCI plan,^{[4]} they might feel even more certain that excess layer credibility must be less than primary layer credibility. However, as we will later see, there is nothing in the general mathematics that forces such a relation. The reason is simply that “high volatility” is not synonymous with “low credibility.” Rather, credibility is conceptually the weight given to observed data, as opposed to the weight given to prior belief. It depends not only on volatility (process risk), but also on the uncertainty in our initial belief (parameter risk).
An often-underappreciated aspect of credibility is that credibility is positively correlated with initial ignorance (parameter risk): the less we think we know in advance, the more willing we are to be swayed by the observed data, even if it is noisy. So, when we try to assess credibility in a split plan, we need to examine not only the process variances, but also the parameter variances of the split components. In general, with an arbitrary loss model, there is nothing to prevent a split from allocating a relatively larger portion of parameter risk than process risk to the excess layer. If that happens, the excess layer may end up having more credibility than the primary layer.
In addition, when analyzing these components and their credibility, it is insufficient to consider them in isolation: their process covariance and parameter covariance^{[5]} both need to be considered.
1.4. Risk allocation and effective splitting
When we split losses, we induce allocations of the process risk and the parameter risk to the separate components and to their process covariance and parameter covariance. Based on the realization that splitting leads to such allocations, we can then see that an effective split plan necessarily entails a tradeoff in which one component gets a higher credibility and the other, a lower credibility. The key to achieving an effective plan is thus to define components so as to separate a less predictable portion of losses from a more predictable one. To put it another way, a split will work if it helps us concentrate on the signal and ignore the noise.
1.5. Reduction of mean square parameter error with optimal credibility
To show that these understandings are, in fact, correct, we will start first by examining experience rating when there is no split. Using minimal mean-squared error as the criterion for optimality, we will show that use of the optimal credibility value reduces the expected square error of the estimate of the mean (the parameter risk) by a ratio equal to that optimal credibility value.
For example, if the optimal credibility is 40% and the original expected parameter variance is 100, the optimally credibility-weighted estimate of the mean will have an expected square error of 60.
Thus optimal credibility has a dual role:

It is the best weight to assign to observed data as opposed to prior belief in arriving at an estimate of the mean, and

It is the percentage by which parameter variance is reduced by using the optimal weight on experience in computing the experience-adjusted estimate.
1.6. Reduction of mean square error with optimal split credibilities
We will then turn to an arbitrary split plan, where the split is any manner of dividing losses. We will derive optimal credibility formulas, where optimality here again denotes maximal reduction in the expected square error of the estimate of the mean. Our formulas are equivalent to formulas previously presented by Mahler (1987) with notation modified to facilitate interpretation. We will then study the reduction in mean square error when optimal credibility values are used, leading to a split credibility version of the error reduction formula. In the split model formula, we first allocate the original parameter variance to the components. Under this particular allocation,^{[6]} each component gets its own variance plus the covariance. When optimal credibilities are used, the square error for each allocated component is reduced by its credibility.
Returning to the example in Section 1.5, suppose we split the losses in two, and suppose further the components have parameter variances of 60 and 20, respectively, and a parameter covariance of 10. Note this reconciles with a parameter variance of 100 for the unsplit total, since 60 + 20 + 2 * 10 = 100. The allocations of the total parameter variance are therefore 70 (70 = 60 + 10) and 30 (30 = 20 + 10). If the optimal credibilities are 50% and 20%, then the mean square error is 70 * (100% – 50%) + 30 * (100% – 20%) = 35 + 24 = 59. Recall that the optimal unsplit credibility was 40% and that use of unsplit credibility thus reduced square error from 100 to 60. Introduction of the split has reduced parameter error in this example, but only by a modest amount, from 60 to 59.
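The arithmetic of this example can be retraced in a few lines of Python. This is only a sketch: the variable names are ours, and the credibilities of 50%, 20%, and 40% are taken as given from the example rather than derived from a loss model.

```python
# Parameter variances and parameter covariance of the two components (from the example).
tau1_sq, tau2_sq, pi = 60.0, 20.0, 10.0

# Credibilities quoted in the example; they are inputs here, not computed.
z1, z2 = 0.50, 0.20
z_unsplit = 0.40

# Total parameter variance and its allocation (own variance plus the covariance).
total_param_var = tau1_sq + tau2_sq + 2 * pi
alloc1 = tau1_sq + pi
alloc2 = tau2_sq + pi

# Each allocated piece is reduced by its credibility.
mse_unsplit = total_param_var * (1 - z_unsplit)
mse_split = alloc1 * (1 - z1) + alloc2 * (1 - z2)

print(total_param_var, alloc1, alloc2, mse_unsplit, mse_split)
```

The two allocations always add back to the total parameter variance, which is why the split and unsplit errors can be compared on the same footing.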
1.7. Differential risk allocation determines split effectiveness
To study, in general, what might be gained by adopting a split plan, we will take the difference in the mean square estimation errors between the optimal non-split and split plans. Based on the resulting formula, we will show split credibility is most effective at reducing mean square estimation error when the two components have relatively different amounts of process and parameter risk. If there is such a differential allocation, the component with the lion’s share of parameter risk ends up with optimal credibility larger than the optimal credibility for the unsplit losses while the component with the lion’s share of process risk has optimal credibility smaller than the optimal credibility for the unsplit losses. If the split does not produce such a differential allocation of process and parameter risk, it need not be appreciably more effective than a no-split plan.
1.8. Primary-excess splits
While an arbitrary split might not produce much of an improvement, one might hope a reasonable primary-excess split would do better. Such a split will allocate the volatile tail of severity to the excess layer so the excess layer will receive a disproportionate share of the overall process risk. However, as argued previously, we can say nothing about whether the split is effective unless we also know how the parameter risk gets allocated. That, in turn, depends on the structure of the loss model and its priors. With an arbitrary loss model and arbitrary priors, there is no reason the split could not allocate a proportion of the parameter risk that is smaller to, equal to, or greater than the proportion of the process risk allocated to the excess layer. As a result, we arrive at the possibly disappointing conclusion that a primary-excess split does not, in general, significantly improve accuracy.
The key to whether the split is effective depends critically on how much parameter uncertainty there is with respect to the severity of losses. In the extreme case where mean severity is fixed and only the mean claim counts are uncertain, then actual excess losses are just a noisy distraction from the true signal emanating from the primary loss. When that is the case, a split is very effective, and the smaller the split point the better. However, when severity is subject to significant parameter risk, splitting may not accomplish much at all.
1.9. Misbehavior of optimal split credibility values
Under the NCCI plan, credibility values are well-behaved in two respects:

Both primary and excess credibilities are between zero and unity and there are no negatives or values over 100%, and

There are no inversions: primary credibilities are always greater than or equal to excess credibilities.
However, optimal split plan credibilities under the minimal MSE criterion do not necessarily obey these guidelines. If mean frequency is known to a fair degree of accuracy, while mean severity is quite uncertain, optimal credibility values may become inverted. Intuitively, in such a scenario, the primary layer results carry little information about severity, but it is information about severity that is needed to arrive at a better estimate of mean loss.
Mathematically, a negative credibility value for either the primary or the excess layer can emerge as a solution to the optimal mean square error equations. Intuitively, this could occur when a split allocates most process risk to one component and it also induces a sizeable parameter covariance. In such a situation, results from the nonvolatile component may provide better information about the other component than its own results.
1.10. Split credibility when losses follow the collective risk model
We will examine split credibility under the Heckman and Meyers (1983) collective risk model (CRM). In that model, claim counts are assumed to be conditionally Poisson with a Gamma prior. Parameter risk for the claim counts is driven by the “contagion” parameter. Claim severities are conditionally exponential and also have a Gamma prior. Parameter risk for severity is captured in the “mixing” parameter, which quantifies uncertainty about the scale. We will derive equations for split credibility under CRM. Our equations are equivalent to Mahler’s (1987), though we use a different notation to facilitate interpretation.
As might be expected based on prior discussion, a split does not automatically confer any great advantage when the underlying losses follow the CRM. With some sets of parameters it works fairly well; with others it confers modest or even no improvement at all over unsplit credibility. In some cases, a CRM can produce primary-excess credibility inversions in which the optimal excess layer credibility is larger than the optimal primary layer credibility. For example, a primary-excess credibility inversion would be present if the optimal primary layer credibility was 25% while the optimal excess layer credibility was 40%. In still other cases, one can have primary credibilities over 100% and excess credibilities that are negative. Stranger still, there are scenarios in which the primary credibility is negative. The interplay of contagion, mixing and split point governs which scenario will prevail.
1.11. Loss capping and mod extension
Under the NCCI experience rating plan individual accidents are subject to an accident limit, the State Accident Limit (SAL^{[7]}), before being split into primary and excess components by the split point. The sum of credibility weighted primary and excess losses is compared to a calculated value of expected loss^{[8]} that reflects the accident limit. This produces the experience modification factor (Mod) for a risk. While the Mod is obtained from losses that are capped at the accident limit, it is then applied to initial expected losses that are uncapped to arrive at the final estimate of experience adjusted expected losses.
There are theoretical and practical justifications^{[9]} for this capping and mod extension procedure. Conceptually, it tames the severity tail that is often the key driver of overall process risk. This stabilizing effect tends to increase the credibility of the excess layer. It comes at a price, however; there is uncertainty in extrapolating from capped losses to uncapped losses. One course for future research is to use the methods developed in this paper to analyze optimal mean square error for Mod extension estimates.^{[10]}
1.12. MSE derivation and CRM support for primary-excess split
Our conclusion is that the minimal mean square error credibility derivation does not provide strong conceptual support for a primary-excess split. Further, with CRM losses, split credibility may or may not do appreciably better than unsplit credibility. In addition, optimal credibilities may not be well-behaved: the model may produce negative credibilities or primary-excess credibility inversions. Later we will end with very brief speculation on what could be done to provide more support for primary-excess split credibility.
2. No-split credibility
We start with a general no-split plan. Let A be the random variable representing actual historical loss. We suppose A is dependent on a possibly multidimensional parameter, θ, and define µ(θ) = E[A|θ] and σ^{2}(θ) = Var(A|θ). Let h be the prior distribution of θ and use h to define E = E[µ(θ)], σ^{2} = E[σ^{2}(θ)], and τ^{2} = Var(µ(θ)). Under this notation, σ^{2} is a measure of the process risk and τ^{2} is a measure of parameter risk. We also set λ^{2} = σ^{2} + τ^{2} so that λ^{2} is the total variance of A.
In this construction each risk has a particular θ value that we have no way of knowing in advance. Our initial knowledge is only about the distribution of the parameter, θ.
Given an observation of A for a particular risk, we could use Bayes' Theorem to obtain the posterior distribution, h(θ|A).^{[11]} From this, we could in principle compute the conditional expected value, E[µ(θ)|A]. However, the conditional expected value may be difficult to compute and so a linear mod formula is often used. Regarding z as a variable, the resulting linear estimate of the expected value of A is given as
$$\hat{A} = z \cdot A + (1 - z) \cdot E. \tag{2.1}$$
Here credibility, z, is the weight given to the actual experience. We use the notation, z*, to denote the optimal credibility value under the least mean square error criterion. To find this optimal credibility, we first write the mean square error as a function of the credibility:
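$$\begin{aligned}
\varepsilon^2 &= E\left[\left(zA + (1-z)E - \mu(\theta)\right)^2\right] \\
&= z^2 \cdot E\left[(A - \mu(\theta))^2\right] + (1-z)^2 \cdot E\left[(E - \mu(\theta))^2\right] \\
&= z^2\sigma^2 + (1-z)^2\tau^2.
\end{aligned} \tag{2.2}$$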
The expectation is with respect to θ and then with respect to A given θ. In simplifying Equation (2.2), various cross terms vanish under the assumption the sampling deviation of actual results from the mean for a risk is independent of the deviation of the risk mean from the population mean. Specifically, we have assumed:
$$E\left[(A - \mu(\theta))(E - \mu(\theta))\right] = 0. \tag{2.3}$$
This assumption is plausible because in the CRM and most other loss model constructions the parameters are first randomly selected from the priors and then the values of the losses are sampled from the loss distributions with those selected parameters. Such a sequential procedure guarantees theoretical independence between parameter error and conditional value error as expressed in Equation (2.3).
We next use standard techniques of basic calculus to find the credibility value that minimizes the square error. Taking the derivative of the square error with respect to z, we find:
$$\frac{d\varepsilon^2}{dz} = 2z\sigma^2 - 2(1-z)\tau^2. \tag{2.4}$$
Next we set the derivative to zero and solve:
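$$\frac{d\varepsilon^2}{dz} = 0 \;\Rightarrow\; z(\sigma^2 + \tau^2) = \tau^2 \;\Rightarrow\; z = \frac{\tau^2}{\tau^2 + \sigma^2}. \tag{2.5}$$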
So the credibility, z*, that minimizes mean square error is given as:
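$$z^* = \frac{\tau^2}{\tau^2 + \sigma^2} = \frac{\tau^2}{\lambda^2}. \tag{2.6}$$
Using Equations (2.2) and (2.6), we see that the minimum mean squared error for the non-split linear estimator is given as:
$$\varepsilon_0^2(NS) = \left(\frac{\tau^2}{\tau^2+\sigma^2}\right)^2 \cdot \sigma^2 + \left(\frac{\sigma^2}{\tau^2+\sigma^2}\right)^2 \cdot \tau^2 = \left(\frac{\tau^2\sigma^2}{(\tau^2+\sigma^2)^2}\right) \cdot (\tau^2+\sigma^2) = \frac{\tau^2\sigma^2}{\tau^2+\sigma^2}. \tag{2.7}$$
The “NS” label stands for “No-Split.” Using Equation (2.6), this minimal square parameter error can be written as:
$$\varepsilon_0^2(NS) = \frac{\tau^2\sigma^2}{\tau^2+\sigma^2} = \tau^2\left(1 - \frac{\tau^2}{\lambda^2}\right) = \tau^2(1 - z^*). \tag{2.8}$$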
Since the initial square parameter error before any observations are made is τ^{2}, this equation says use of the optimal credibility value reduces mean square parameter error by a proportion that is equal to that optimal credibility value.
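The no-split result can be checked numerically. The sketch below codes Equations (2.2) and (2.6) directly; the function names are ours, and the process variance of 150 is a hypothetical value chosen only so that a parameter variance of 100 yields the 40% credibility used in the introductory example.

```python
def optimal_credibility(sigma_sq, tau_sq):
    """Equation (2.6): z* = tau^2 / (tau^2 + sigma^2)."""
    return tau_sq / (tau_sq + sigma_sq)


def mse(z, sigma_sq, tau_sq):
    """Equation (2.2): mean square error of the linear estimate at credibility z."""
    return z ** 2 * sigma_sq + (1 - z) ** 2 * tau_sq


# Hypothetical inputs: process variance 150, parameter variance 100.
sigma_sq, tau_sq = 150.0, 100.0

z_star = optimal_credibility(sigma_sq, tau_sq)
min_mse = mse(z_star, sigma_sq, tau_sq)

# Equation (2.8): the minimum equals tau^2 * (1 - z*).
reduced_form = tau_sq * (1 - z_star)

print(z_star, min_mse, reduced_form)
```

Evaluating `mse` slightly on either side of `z_star` gives larger values, consistent with `z_star` being the minimizer of the quadratic in Equation (2.2).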
3. General split plan credibilities
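Assume A can be written as the sum of two loss random variables: A = A_{1} + A_{2}. In this generality, the split is not necessarily between primary and excess losses: it could be any way of splitting losses. We suppose each A_{i} is dependent on a possibly multidimensional parameter, θ, and define µ_{i}(θ) = E[A_{i}|θ] and σ_{i}^{2}(θ) = Var(A_{i}|θ). Also let C(θ) = Cov(A_{1}, A_{2}|θ). Assume h is the prior distribution of θ and use h to define E_{i} = E[µ_{i}(θ)], σ_{i}^{2} = E[σ_{i}^{2}(θ)], τ_{i}^{2} = Var(µ_{i}(θ)), ρ = E[C(θ)], and π = Cov(µ_{1}(θ), µ_{2}(θ)). Note that, in addition to the process and parameter risk terms for each loss component, we have also defined expected process covariance and parameter covariance terms. Set κ = ρ + π so that κ is the total covariance. Define σ^{2} = σ_{1}^{2} + σ_{2}^{2} + 2ρ as the total process variance, τ^{2} = τ_{1}^{2} + τ_{2}^{2} + 2π as the total parameter variance, and λ^{2} = σ^{2} + τ^{2} as the total variance. We observe that λ_{i}^{2} = σ_{i}^{2} + τ_{i}^{2} and λ^{2} = λ_{1}^{2} + λ_{2}^{2} + 2κ. The notation is summarized in Table 1.
The split credibility Mod formula^{[12]} is:
$$\mathrm{MOD} = \frac{z_1 A_1 + (1 - z_1)E_1 + z_2 A_2 + (1 - z_2)E_2}{E}. \tag{3.1}$$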
As before, we derive a formula for the mean square parameter error:
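$$\begin{aligned}
\varepsilon^2 &= E\left[\left(z_1 A_1 + (1-z_1)E_1 - \mu_1(\theta) + z_2 A_2 + (1-z_2)E_2 - \mu_2(\theta)\right)^2\right] \\
&= z_1^2 \cdot E\left[(A_1 - \mu_1(\theta))^2\right] + (1-z_1)^2 \cdot E\left[(E_1 - \mu_1(\theta))^2\right] \\
&\quad + z_2^2 \cdot E\left[(A_2 - \mu_2(\theta))^2\right] + (1-z_2)^2 \cdot E\left[(E_2 - \mu_2(\theta))^2\right] \\
&\quad + 2 z_1 z_2 E[C(\theta)] + 2(1-z_1)(1-z_2)\,\mathrm{Cov}(\mu_1(\theta), \mu_2(\theta)).
\end{aligned} \tag{3.2}$$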
In obtaining this expression, we have assumed that the sampling deviation of actual results from the mean for each variable is independent of the deviation of the risk mean from the population mean for both variables. In mathematical notation, these assumptions can be written as:
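$$\begin{aligned}
0 &= E\left[(A_1 - \mu_1(\theta)) \cdot (\mu_1(\theta) - E_1)\right] = E\left[(A_1 - \mu_1(\theta)) \cdot (\mu_2(\theta) - E_2)\right] \\
&= E\left[(A_2 - \mu_2(\theta)) \cdot (\mu_1(\theta) - E_1)\right] = E\left[(A_2 - \mu_2(\theta)) \cdot (\mu_2(\theta) - E_2)\right].
\end{aligned} \tag{3.3}$$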
In the derivation of Equation (3.2), the assumptions in (3.3) are used to eliminate various cross terms. Note that the square error formula has other terms which do not vanish but which depend on the process and parameter covariance. These terms are present in Mahler’s formula, though in different notation. We express (3.2) using our notation, next expand expressions to arrive at terms that are polynomials of the credibilities, and then group them as follows:
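$$\begin{aligned}
\varepsilon^2 &= z_1^2\sigma_1^2 + (1-z_1)^2\tau_1^2 + z_2^2\sigma_2^2 + (1-z_2)^2\tau_2^2 + 2z_1 z_2\rho + 2(1-z_1)(1-z_2)\pi \\
&= z_1^2\sigma_1^2 + \tau_1^2 - 2z_1\tau_1^2 + z_1^2\tau_1^2 + z_2^2\sigma_2^2 + \tau_2^2 - 2z_2\tau_2^2 + z_2^2\tau_2^2 \\
&\quad + 2z_1 z_2\rho + 2\pi - 2z_1\pi - 2z_2\pi + 2z_1 z_2\pi \\
&= \tau_1^2 + \tau_2^2 + 2\pi + z_1^2\sigma_1^2 + z_1^2\tau_1^2 - 2z_1\tau_1^2 - 2z_1\pi \\
&\quad + z_2^2\sigma_2^2 + z_2^2\tau_2^2 - 2z_2\tau_2^2 - 2z_2\pi + 2z_1 z_2\rho + 2z_1 z_2\pi.
\end{aligned} \tag{3.4}$$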
Using our notation to simplify further, we have:
$$\varepsilon^2 = \tau^2 + z_1^2\lambda_1^2 - 2z_1(\tau_1^2 + \pi) + z_2^2\lambda_2^2 - 2z_2(\tau_2^2 + \pi) + 2z_1 z_2\kappa. \tag{3.5}$$
We take partials with respect to the credibility parameters:
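$$\begin{aligned}
\frac{\partial\varepsilon^2}{\partial z_1} &= 2z_1\lambda_1^2 - 2(\tau_1^2 + \pi) + 2z_2\kappa \\
\frac{\partial\varepsilon^2}{\partial z_2} &= 2z_2\lambda_2^2 - 2(\tau_2^2 + \pi) + 2z_1\kappa.
\end{aligned} \tag{3.6}$$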
Setting the partials equal to zero, we obtain the system of equations:
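$$\begin{aligned}
z_1\lambda_1^2 + z_2\kappa &= \tau_1^2 + \pi \\
z_2\lambda_2^2 + z_1\kappa &= \tau_2^2 + \pi.
\end{aligned} \tag{3.7}$$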
Solving we find:
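$$\begin{aligned}
z_1 &= \frac{\lambda_2^2(\tau_1^2 + \pi) - \kappa(\tau_2^2 + \pi)}{D} \\
z_2 &= \frac{\lambda_1^2(\tau_2^2 + \pi) - \kappa(\tau_1^2 + \pi)}{D}
\end{aligned} \tag{3.8}$$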
where D = λ^{2}_{1}λ^{2}_{2} − κ^{2}.
Example 1 demonstrates the credibility formulas in (3.8). Example 1 is shown with additional information in Exhibit 1 Sheet 1.
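As a rough companion to the formulas in (3.8), the following sketch solves for the two credibilities and confirms that they satisfy the system (3.7). The function name and the input values are hypothetical; they are not the values of Example 1 or Exhibit 1, which are not reproduced here.

```python
def split_credibilities(s1, s2, t1, t2, rho, pi):
    """Equation (3.8). s1, s2 are process variances; t1, t2 are parameter
    variances; rho and pi are the process and parameter covariances."""
    lam1 = s1 + t1            # lambda_1^2
    lam2 = s2 + t2            # lambda_2^2
    kappa = rho + pi          # total covariance
    d = lam1 * lam2 - kappa ** 2
    z1 = (lam2 * (t1 + pi) - kappa * (t2 + pi)) / d
    z2 = (lam1 * (t2 + pi) - kappa * (t1 + pi)) / d
    return z1, z2


# Hypothetical inputs chosen for illustration only.
s1, s2, t1, t2, rho, pi = 40.0, 90.0, 60.0, 20.0, 10.0, 10.0
z1, z2 = split_credibilities(s1, s2, t1, t2, rho, pi)

# Residuals of the normal equations (3.7); both should be zero up to rounding.
lam1, lam2, kappa = s1 + t1, s2 + t2, rho + pi
resid1 = z1 * lam1 + z2 * kappa - (t1 + pi)
resid2 = z2 * lam2 + z1 * kappa - (t2 + pi)

print(z1, z2, resid1, resid2)
```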
As Mahler (1987) noted, the solutions of (3.8) are not really credibilities in the traditional sense, since, in this generality, one of them could be negative or have a value above unity. As proved in Proposition 2 in Appendix B, the minimal mean square error for the split plan is given as:
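$$\varepsilon_0^2(SP) = \tau^2 - \frac{1}{D}\left(\lambda_2^2(\tau_1^2 + \pi)^2 + \lambda_1^2(\tau_2^2 + \pi)^2 - 2\kappa(\tau_1^2 + \pi)(\tau_2^2 + \pi)\right). \tag{3.9}$$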
In Appendix B, it is also shown that Equation (3.9) can be reduced to:
$$\varepsilon_0^2(SP) = (\tau_1^2 + \pi)(1 - z_1^*) + (\tau_2^2 + \pi)(1 - z_2^*). \tag{3.10}$$
Here we have reintroduced the “*” denoting optimal credibility to emphasize that the formula is only valid when optimal credibility values are used. The “SP” indicates the formula is for a split plan. This formula extends the parameter error reduction formula from the no-split case. The initial mean square parameter error is the parameter risk. Under Formula (3.10), it is split, with each component taking its own parameter variance and the parameter covariance. Since the total covariance portion of the parameter variance is two times the parameter covariance, each component is allocated half of the parameter covariance contribution to the total parameter variance. These allocations are then reduced in proportion to the respective optimal credibility values. There are other ways to allocate the covariances, but under this particular allocation, one arrives at a generalization of Formula (2.8) in which optimal credibility is not only the best weight to use in a linear estimate of the mean, but also it is equal to the percentage reduction in the variance of the estimated mean achieved by using that optimal weight.
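A numerical check can make the equivalence of Equations (3.5), (3.9), and (3.10) at the optimum more concrete. The sketch below uses hypothetical inputs and helper names of our own choosing; it illustrates the algebra and is not a substitute for the proof in Appendix B.

```python
def optimal_split(s1, s2, t1, t2, rho, pi):
    """Optimal credibilities from Equation (3.8)."""
    lam1, lam2, kappa = s1 + t1, s2 + t2, rho + pi
    d = lam1 * lam2 - kappa ** 2
    z1 = (lam2 * (t1 + pi) - kappa * (t2 + pi)) / d
    z2 = (lam1 * (t2 + pi) - kappa * (t1 + pi)) / d
    return z1, z2


def split_mse(z1, z2, s1, s2, t1, t2, rho, pi):
    """Equation (3.5): mean square error at arbitrary credibilities."""
    lam1, lam2, kappa = s1 + t1, s2 + t2, rho + pi
    tau_sq = t1 + t2 + 2 * pi
    return (tau_sq + z1 ** 2 * lam1 - 2 * z1 * (t1 + pi)
            + z2 ** 2 * lam2 - 2 * z2 * (t2 + pi) + 2 * z1 * z2 * kappa)


# Hypothetical inputs chosen for illustration only.
args = (40.0, 90.0, 60.0, 20.0, 10.0, 10.0)
s1, s2, t1, t2, rho, pi = args
z1, z2 = optimal_split(*args)

mse_35 = split_mse(z1, z2, *args)                            # Equation (3.5)
lam1, lam2, kappa = s1 + t1, s2 + t2, rho + pi
d = lam1 * lam2 - kappa ** 2
b1, b2 = t1 + pi, t2 + pi
mse_39 = (t1 + t2 + 2 * pi) - (lam2 * b1 ** 2 + lam1 * b2 ** 2
                               - 2 * kappa * b1 * b2) / d    # Equation (3.9)
mse_310 = b1 * (1 - z1) + b2 * (1 - z2)                      # Equation (3.10)

print(mse_35, mse_39, mse_310)
```

Moving either credibility away from its optimal value and re-evaluating `split_mse` gives a larger error, as expected for a minimum.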
4. When does splitting reduce mean square parameter error?
To study whether a split plan reduces minimum mean square parameter error, we first define the reduction in minimal mean square parameter error: Δ(ε_{0}^{2}) = ε_{0}^{2}(NS) − ε_{0}^{2}(SP). In Corollary 1 of Appendix B it is proved that this can be expressed in terms involving the optimal no-split and split credibilities:
$$\begin{aligned}
\Delta(\varepsilon_0^2) &= \tau^2(1 - z^*) - (\tau_1^2 + \pi)(1 - z_1^*) - (\tau_2^2 + \pi)(1 - z_2^*) \\
&= (\tau_1^2 + \pi)(z_1^* - z^*) + (\tau_2^2 + \pi)(z_2^* - z^*).
\end{aligned} \tag{4.1}$$
In words, the difference in mean square parameter error is the parameter risk allocated to the first component including its share of the parameter covariance times the difference between the optimal credibility of the first component and the optimal credibility of unsplit losses plus the corresponding term for the second component.
4.1. Comparison of credibilities
We immediately see from Equation (4.1) that error improvement requires at least one of the split plan credibility values to be larger than the credibility from the original no-split plan. In this generality, where the split is arbitrary and not necessarily between primary and excess losses, there is no reason why the split plan should reduce mean square parameter error.
We will use Equation (4.1) to derive intuitively accessible formulas that summarize what is required for a split to reduce the minimal mean square parameter error. But, before presenting our main results, it is useful to consider a simple example to hone our intuition.
4.1.1. The even split example
Consider an “even split” where the two components have the same process variance and the same parameter variance as seen, for instance, in Exhibit 1 Sheet 2. In the general case of an even split, we have σ_{1}^{2} = σ_{2}^{2}, τ_{1}^{2} = τ_{2}^{2} and π = τ_{1}τ_{2}. It follows that λ_{1}^{2} = λ_{2}^{2} and that z_{1}* = z_{2}*. We derive:
$$z_1^* = \frac{\lambda_2^2(\tau_1^2 + \pi) - \kappa(\tau_2^2 + \pi)}{\lambda_1^2\lambda_2^2 - \kappa^2} = \frac{\lambda_1^2(\tau_1^2 + \pi) - \kappa(\tau_1^2 + \pi)}{(\lambda_1^2 - \kappa)(\lambda_1^2 + \kappa)} = \frac{\tau_1^2 + \pi}{\lambda_1^2 + \kappa}. \tag{4.2}$$
$$z^* = \frac{\tau^2}{\lambda^2} = \frac{\tau_1^2 + \tau_2^2 + 2\pi}{\lambda_1^2 + \lambda_2^2 + 2\kappa} = \frac{2\tau_1^2 + 2\pi}{2\lambda_1^2 + 2\kappa} = \frac{\tau_1^2 + \pi}{\lambda_1^2 + \kappa}. \tag{4.3}$$
Since all the optimal credibilities are equal, it follows from Equation (4.1) that the split plan does not reduce mean square error. Note this result holds no matter what the process covariance is between the components. So, for example, a split where each component is equal to half the loss gains us nothing. Neither does a plan where we toss a fair coin to decide if a claim belongs to one component or the other. Of course, there is no intuitive reason to expect either of these split plans could improve the accuracy of our credibility-weighted estimate of the mean. More generally, our intuition is that a split cannot improve the accuracy of the final estimate if it does not meaningfully use additional information beyond that which was used for the no-split plan.
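The even split result can be illustrated numerically. In the sketch below the variances are hypothetical (they are not the values of Exhibit 1 Sheet 2), and the process covariance is varied to show that it does not affect the conclusion.

```python
def optimal_split(s1, s2, t1, t2, rho, pi):
    """Optimal split credibilities, Equation (3.8)."""
    lam1, lam2, kappa = s1 + t1, s2 + t2, rho + pi
    d = lam1 * lam2 - kappa ** 2
    z1 = (lam2 * (t1 + pi) - kappa * (t2 + pi)) / d
    z2 = (lam1 * (t2 + pi) - kappa * (t1 + pi)) / d
    return z1, z2


def optimal_unsplit(s1, s2, t1, t2, rho, pi):
    """Optimal no-split credibility, Equation (2.6), on the combined loss."""
    tau_sq = t1 + t2 + 2 * pi
    sigma_sq = s1 + s2 + 2 * rho
    return tau_sq / (tau_sq + sigma_sq)


# Hypothetical even split: equal process variances and equal parameter variances.
s, t = 50.0, 30.0
pi = 30.0  # pi = tau_1 * tau_2, as in the even split described in the text

results = []
for rho in (-20.0, 0.0, 5.0, 40.0):  # several process covariances
    z1, z2 = optimal_split(s, s, t, t, rho, pi)
    z = optimal_unsplit(s, s, t, t, rho, pi)
    results.append((rho, z1, z2, z))
    print(rho, z1, z2, z)
```

For every process covariance tried, the three credibilities coincide, so Equation (4.1) gives no reduction in mean square error.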
4.2. Formula for the difference in minimal mean square error
We are now ready to state several key formulas for the difference in minimal mean square error. The first expresses that difference in terms of the process and parameter variances and covariances:
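$$\Delta(\varepsilon_0^2) = \frac{1}{D\lambda^2}\left((\tau_1^2 + \pi)(\sigma_2^2 + \rho) - (\sigma_1^2 + \rho)(\tau_2^2 + \pi)\right)^2. \tag{4.4}$$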
The proof of this formula is shown in Corollary 2 of Appendix B. The proof also yields the following important formula that expresses the mean square error reduction in terms of the square of difference of the resulting split credibility values:
$$\Delta(\varepsilon_0^2) = \frac{D}{\lambda^2}\left(z_1^* - z_2^*\right)^2. \tag{4.5}$$
This result is Theorem 2 in Appendix B.
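The three expressions for the error reduction, Equations (4.1), (4.4), and (4.5), can be compared numerically. The following is a sketch with hypothetical inputs and our own variable names; agreement on one set of numbers is of course not a proof.

```python
# Hypothetical process variances, parameter variances, and covariances.
s1, s2, t1, t2, rho, pi = 40.0, 90.0, 60.0, 20.0, 10.0, 10.0

lam1, lam2, kappa = s1 + t1, s2 + t2, rho + pi
d = lam1 * lam2 - kappa ** 2
lam_sq = lam1 + lam2 + 2 * kappa          # total variance
tau_sq = t1 + t2 + 2 * pi                 # total parameter variance

# Allocated parameter and process variances.
b1, b2 = t1 + pi, t2 + pi
a1, a2 = s1 + rho, s2 + rho

# Optimal credibilities: split (3.8) and no-split (2.6).
z1 = (lam2 * b1 - kappa * b2) / d
z2 = (lam1 * b2 - kappa * b1) / d
z = tau_sq / lam_sq

delta_41 = b1 * (z1 - z) + b2 * (z2 - z)              # Equation (4.1)
delta_44 = (b1 * a2 - a1 * b2) ** 2 / (d * lam_sq)    # Equation (4.4)
delta_45 = d / lam_sq * (z1 - z2) ** 2                # Equation (4.5)

print(delta_41, delta_44, delta_45)
```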
4.3. What makes a split effective?
By definition, we regard a split as effective when it improves our estimate of the mean. More precisely, when the mean square error criterion is used to judge the quality of an estimate, an effective split is one that leads to a significant reduction in the least mean square error of our estimate in comparison with that obtained using the unsplit plan. So a split is effective if it produces a relatively large value for the difference in optimal mean square errors, Δ(ε_{0}^{2}).
4.3.1. The most effective split possible
Examining Equation (4.4), we see that the most effective split possible would be one that puts all the process risk in one component and all the parameter risk in the other. With a split that extreme, both the process and parameter covariances will be zero. Thus we would have π = ρ = κ = 0 and it would follow from Equation (3.8) that one component would have credibility of 100% and the other would have credibility of 0%. Further, from Equation (3.10) it would follow that the mean square parameter error of the resulting split credibility estimate would be zero! In other words, our split credibility estimate would be exactly right because it is based on a perfect separation of noise from signal. Exhibit 1 Sheet 3 provides a numerical example.
In any realistic scenario it will be impossible to make such a clean split of the process and parameter risk. However, the intuition still holds. The key for a split to be effective is that it must lead to a proportionately different allocation of the total process and total parameter variances.
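The idealized perfect split can be sketched as follows. The numbers are hypothetical (they are not those of Exhibit 1 Sheet 3): the first component carries all of the parameter risk and the second carries all of the process risk.

```python
# Component 1: all parameter risk, no process risk.
# Component 2: all process risk, no parameter risk. Covariances are zero.
s1, t1 = 0.0, 100.0
s2, t2 = 150.0, 0.0
rho, pi = 0.0, 0.0

lam1, lam2, kappa = s1 + t1, s2 + t2, rho + pi
d = lam1 * lam2 - kappa ** 2

# Optimal split credibilities, Equation (3.8).
z1 = (lam2 * (t1 + pi) - kappa * (t2 + pi)) / d
z2 = (lam1 * (t2 + pi) - kappa * (t1 + pi)) / d

# Split error from Equation (3.10) and no-split error from Equation (2.7).
mse_split = (t1 + pi) * (1 - z1) + (t2 + pi) * (1 - z2)
tau_sq, sigma_sq = t1 + t2 + 2 * pi, s1 + s2 + 2 * rho
mse_unsplit = tau_sq * sigma_sq / (tau_sq + sigma_sq)

print(z1, z2, mse_split, mse_unsplit)
```

With these inputs the signal component receives full credibility, the noise component receives none, and the split error vanishes while the no-split error does not.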
4.3.2. Ineffective splits and proportional allocation
We have already seen that an even split is ineffective. This can be generalized using the process and parameter variance allocations from Equation (4.4) as displayed in Table 2.
Each component gets assigned a total process variance equal to the sum of its own process variance plus the process covariance. The parameter variance is allocated the same way. Equation (4.4) implies that if the splitting leads to an allocation where each component has the same ratio of allocated parameter variance to allocated process variance under this particular allocation, then there will be no improvement at all in optimal MSE due to the split. To state this mathematically,
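$$\text{If } \frac{\tau_1^2 + \pi}{\sigma_1^2 + \rho} = \frac{\tau_2^2 + \pi}{\sigma_2^2 + \rho}, \text{ then } \Delta(\varepsilon_0^2) = 0. \tag{4.6}$$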
This is demonstrated with a numerical example in Exhibit 1, Sheet 4.
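A small check of the proportional allocation condition is sketched below. The inputs are hypothetical and were picked so that both components have the same ratio of allocated parameter variance to allocated process variance; they are not the values of Exhibit 1, Sheet 4.

```python
# Hypothetical inputs with equal allocation ratios:
# (30 + 10) / (70 + 10) = 0.5 and (10 + 10) / (30 + 10) = 0.5.
s1, s2, t1, t2, rho, pi = 70.0, 30.0, 30.0, 10.0, 10.0, 10.0

b1, b2 = t1 + pi, t2 + pi      # allocated parameter variances
a1, a2 = s1 + rho, s2 + rho    # allocated process variances
ratio1, ratio2 = b1 / a1, b2 / a2

lam1, lam2, kappa = s1 + t1, s2 + t2, rho + pi
d = lam1 * lam2 - kappa ** 2
lam_sq = lam1 + lam2 + 2 * kappa

# Error reduction from Equation (4.4).
delta = (b1 * a2 - a1 * b2) ** 2 / (d * lam_sq)

# Optimal credibilities for comparison.
z1 = (lam2 * b1 - kappa * b2) / d
z2 = (lam1 * b2 - kappa * b1) / d
z = (b1 + b2) / lam_sq

print(ratio1, ratio2, delta, z1, z2, z)
```

Here the split and no-split credibilities all coincide, which is another way of seeing why the split gains nothing.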
4.3.3. Different optimal credibility values by component
From Equation (4.5) it follows that an effective split is one that produces optimal credibility values that differ substantially for the two components. One can also argue this makes sense intuitively by reasoning backwards. If a split were to lead to optimal credibility values that were the same for both parts of the split, then one might as well have left them unified and applied a single credibility value to the undivided whole.
4.3.4. The impact of covariance
When comparing two different ways of splitting, the split with the larger process covariance is not necessarily any more or less effective than the other one. Compare the base case, Exhibit 1 Sheet 1, with Exhibit 1 Sheet 5. The split shown in Sheet 5 has a larger process covariance, but it is more effective: it reduces MSE more than the split in Sheet 1. In contrast, the split example in Sheet 6 has the same covariances as the one in Sheet 5, but it is not as effective as the base case split. The conclusion is that the impact of covariance is complicated. The contrary examples in Sheet 5 and Sheet 6 were obtained by adjusting σ_{1}, σ_{2}, and ρ to exacerbate or diminish the differential allocation of process and parameter risk while obeying the overall constraint that “σ” stay fixed in the equation, σ^{2} = σ_{1}^{2} + σ_{2}^{2} + 2ρ. The same nondefinitive result is true for splits with different parameter covariances. Corresponding sets of counterexamples can be readily constructed along the same lines.
4.3.5. Effective split with negative credibility for one component
It is possible to have an effective plan that has a negative credibility value for one component. A particular instance is shown in Exhibit 1, Sheet 7. Such a situation may arise when one split component, the volatile one, is given the lion’s share of the process risk, a very modest share of the parameter risk, and the split produces a parameter covariance roughly as large as the parameter variance of the volatile component. Because the volatile component has so much process risk and so little parameter risk, one would not want to give it any weight. The high parameter covariance allows us to gain more accurate information about the mean of the volatile component from the results of its better-behaved sister component than we can gain from the results of the volatile component itself. This provides some intuitive justification for how a negative credibility can occur and how a split with such a negative credibility component may nonetheless be effective.
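One way to see that a negative credibility can coexist with an effective split is to try a set of numbers. The inputs below are hypothetical, chosen by us and not taken from Exhibit 1, Sheet 7; in particular, the positive process covariance is our own assumption. The second component plays the role of the volatile one.

```python
# Hypothetical inputs. Component 2 is volatile: large process variance,
# small parameter variance, parameter covariance equal to its parameter variance.
s1, t1 = 10.0, 100.0
s2, t2 = 1000.0, 9.0
rho, pi = 50.0, 9.0   # satisfy rho^2 <= s1*s2 and pi^2 <= t1*t2

lam1, lam2, kappa = s1 + t1, s2 + t2, rho + pi
d = lam1 * lam2 - kappa ** 2
b1, b2 = t1 + pi, t2 + pi

# Optimal split credibilities, Equation (3.8).
z1 = (lam2 * b1 - kappa * b2) / d
z2 = (lam1 * b2 - kappa * b1) / d

# No-split error (2.8) versus split error (3.10).
tau_sq = t1 + t2 + 2 * pi
lam_sq = lam1 + lam2 + 2 * kappa
mse_unsplit = tau_sq * (1 - tau_sq / lam_sq)
mse_split = b1 * (1 - z1) + b2 * (1 - z2)

print(z1, z2, mse_unsplit, mse_split)
```

With these inputs the volatile component's optimal credibility is slightly negative, the other component's credibility slightly exceeds unity, and the split error is far below the no-split error.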
5. Credibility with losses from a single severity type model
Now we will derive credibility formulas for losses that arise from a model in which claim counts are generated by a single random variable and each claim severity is conditionally an independent sample from the single severity distribution. We will further assume, to simplify the discussion, that our uncertainty about severity is confined to lack of precise knowledge of its scale. We refer to such a model as a Single Severity Type Model with Severity Scale Uncertainty. The CRM is an example of such a model.
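To begin the mathematical derivation, let N be the number of claims and write X_{i} for the loss from the ith claim. Assume each X_{i} is an independent random sample of the severity random variable, X. Further suppose each X_{i} is independent of the claim count. Define the actual loss, A, via: A = X_{1} + X_{2} + . . . + X_{N}. Now suppose N is parametrically dependent on a parameter, θ_{N}, and that X is parametrically dependent on a parameter, θ_{X}. Assume θ_{N} and θ_{X} have prior distributions that are independent. We will abuse notation and usually drop the subscripts, N and X, on θ. Define µ_{N}(θ) = E[N|θ], σ_{N}^{2}(θ) = Var(N|θ), µ_{X}(θ) = E[X|θ], and σ_{X}^{2}(θ) = Var(X|θ). Then take expectations and variances with respect to the priors to define µ_{N} = E[µ_{N}(θ)], σ_{N}^{2} = E[σ_{N}^{2}(θ)], τ_{N}^{2} = Var(µ_{N}(θ)), µ_{X} = E[µ_{X}(θ)], σ_{X}^{2} = E[σ_{X}^{2}(θ)], and τ_{X}^{2} = Var(µ_{X}(θ)).
We will now derive the process and parameter variance of loss using terms based on the claim count and claim severity. The conditional mean and variance are given by:
$$\mu_A(\theta) = \mu_N(\theta) \cdot \mu_X(\theta). \tag{5.1}$$
$$\sigma_A^2(\theta) = \mu_N(\theta) \cdot \sigma_X^2(\theta) + \sigma_N^2(\theta) \cdot (\mu_X(\theta))^2. \tag{5.2}$$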
Taking expectations with respect to the priors, we find the process and parameter variances:
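$$\sigma_A^2 = \mu_N \cdot \sigma_X^2 + \sigma_N^2 \cdot (\tau_X^2 + \mu_X^2). \tag{5.3}$$
$$\tau_A^2 = \tau_N^2 \cdot \tau_X^2 + \tau_N^2 \cdot \mu_X^2 + \mu_N^2 \cdot \tau_X^2. \tag{5.4}$$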
Note in Equation (5.3), the expected process variance contains a term that includes the severity parameter variance.
Plugging these into the basic nosplit credibility formula, Equation (2.6), we find the optimal credibility is given as:
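$$z^* = \frac{\tau_N^2 \cdot \tau_X^2 + \tau_N^2 \cdot \mu_X^2 + \mu_N^2 \cdot \tau_X^2}{\tau_N^2 \cdot \tau_X^2 + \tau_N^2 \cdot \mu_X^2 + \mu_N^2 \cdot \tau_X^2 + \mu_N \cdot \sigma_X^2 + \sigma_N^2 \cdot (\tau_X^2 + \mu_X^2)}. \tag{5.5}$$
If we assume N is conditionally Poisson so that σ_{N}^{2} = µ_{N}, the process variance is:
$$\sigma_A^2 = \mu_N \cdot (\sigma_X^2 + \tau_X^2 + \mu_X^2). \tag{5.6}$$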
So with conditionally Poisson claim counts, the formula for optimal credibility is given as:
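$$z^* = \frac{\tau_N^2 \cdot \tau_X^2 + \tau_N^2 \cdot \mu_X^2 + \mu_N^2 \cdot \tau_X^2}{\tau_N^2 \cdot \tau_X^2 + \tau_N^2 \cdot \mu_X^2 + \mu_N^2 \cdot \tau_X^2 + \mu_N \cdot (\sigma_X^2 + \tau_X^2 + \mu_X^2)}. \tag{5.7}$$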
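The variance and credibility formulas of this section translate directly into code. The sketch below uses function names of our own and hypothetical moment values; it simply checks that the general formula (5.5) collapses to the Poisson formula (5.7) when the expected conditional claim count variance equals the mean claim count.

```python
def process_variance(mu_n, var_n, mu_x, var_x, tau_x_sq):
    """Equation (5.3). var_n and var_x are expected conditional variances."""
    return mu_n * var_x + var_n * (tau_x_sq + mu_x ** 2)


def parameter_variance(mu_n, tau_n_sq, mu_x, tau_x_sq):
    """Equation (5.4)."""
    return tau_n_sq * tau_x_sq + tau_n_sq * mu_x ** 2 + mu_n ** 2 * tau_x_sq


def credibility(mu_n, var_n, tau_n_sq, mu_x, var_x, tau_x_sq):
    """Equation (5.5)."""
    t = parameter_variance(mu_n, tau_n_sq, mu_x, tau_x_sq)
    s = process_variance(mu_n, var_n, mu_x, var_x, tau_x_sq)
    return t / (t + s)


def credibility_poisson(mu_n, tau_n_sq, mu_x, var_x, tau_x_sq):
    """Equation (5.7): conditionally Poisson counts, so var_n equals mu_n."""
    t = parameter_variance(mu_n, tau_n_sq, mu_x, tau_x_sq)
    return t / (t + mu_n * (var_x + tau_x_sq + mu_x ** 2))


# Hypothetical moments chosen for illustration only.
mu_n, tau_n_sq = 5.0, 4.0
mu_x, var_x, tau_x_sq = 2.0, 3.0, 1.0

sigma_a_sq = process_variance(mu_n, mu_n, mu_x, var_x, tau_x_sq)
tau_a_sq = parameter_variance(mu_n, tau_n_sq, mu_x, tau_x_sq)
z_general = credibility(mu_n, mu_n, tau_n_sq, mu_x, var_x, tau_x_sq)
z_poisson = credibility_poisson(mu_n, tau_n_sq, mu_x, var_x, tau_x_sq)

print(sigma_a_sq, tau_a_sq, z_general, z_poisson)
```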
5.1. Credibility when losses follow the collective risk model
We will now examine the optimal credibility formula, Equation (5.7), when account loss distributions follow the usual collective risk model. Let N be conditionally Poisson with parameter nχ, where E[χ] = 1 and Var(χ) = c. Under these assumptions, we have µ_{N} = n and τ_{N}^{2} = cn^{2}. The parameter, c, is called the contagion. Let X be conditionally exponential with mean sβ, where E[β] = 1 and Var(β) = b. The parameter, b, is called the mixing parameter. With this notation we have µ_{X} = s and τ_{X}^{2} = bs^{2}. It follows that µ_{A} = ns and:
$$\sigma_A^2 = E[n\chi\cdot E[X^2\mid\beta]] = ns^2\cdot E[2\beta^2] = 2ns^2\cdot(1+b). \tag{5.8}$$
$$\tau_A^2 = \operatorname{Var}(n\chi s\beta) = n^2s^2\cdot((1+c)(1+b)-1). \tag{5.9}$$
Thus the optimal credibility is given as:
$$z^{*} = \frac{n^2s^2\cdot((1+c)(1+b)-1)}{n^2s^2\cdot((1+c)(1+b)-1) + 2ns^2(1+b)} = \frac{n^2\cdot((1+c)(1+b)-1)}{n^2\cdot((1+c)(1+b)-1) + 2n(1+b)}. \tag{5.10}$$
For a specific numerical example, suppose s = 10, n = 10, b = 0.25, and c = 0.20. Then using Equation (5.8) the process variance is 2 ⋅ 10 ⋅ 100 ⋅ 1.25 = 2,500 and applying Equation (5.9) the parameter variance is 100 ⋅ 100 ⋅ (1.25 ⋅ 1.20 − 1) = 5,000. Thus we find the credibility is 5,000 / (5,000 + 2,500) = 2/3, or about 67%.
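It may help to note that the CRM credibility formula can be rearranged into the familiar n / (n + K) form. The constant K below is introduced only for this illustration and is not part of the paper's notation.

$$
z^{*} = \frac{n}{n + K}, \qquad K = \frac{2(1+b)}{(1+c)(1+b) - 1}.
$$

With b = 0.25 and c = 0.20 this gives K = 2.5 / 0.5 = 5, so that n = 10 yields z* = 10 / 15 = 2/3, in agreement with the numerical example. Under this rearrangement the mean severity s cancels, and credibility approaches unity as the expected claim count n grows.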
6. Split credibility with losses from a single type model
Next we derive comparable split credibility formulas. Given a per occurrence split point, k, and an occurrence of size X, we define X_{p} = min (X, k) as the primary severity and X_{e} = X − min(X, k) as the excess severity. Observe, under this definition X_{e} will have a mass point at zero equal to the probability that X is less than or equal to the split point. In other words, X_{e} is not the conditional excess severity. We have adopted this approach so that the primary, excess, and total losses all have the same claim count distribution. This simplifies some derivations. We now define the actual primary loss, A_{p} = X_{p}(1) + X_{p}(2) + . . . + X_{p}(N) and the actual excess loss, A_{e} = X_{e}(1) + X_{e}(2) + . . . + X_{e}(N). The primary and excess process and parameter variances are given as:
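$$\sigma_{A_p}^2 = \mu_N\cdot\sigma_{X_p}^2 + \sigma_N^2\cdot(\tau_{X_p}^2 + \mu_{X_p}^2). \tag{6.1}$$
$$\sigma_{A_e}^2 = \mu_N\cdot\sigma_{X_e}^2 + \sigma_N^2\cdot(\tau_{X_e}^2 + \mu_{X_e}^2). \tag{6.2}$$
$$\tau_{A_p}^2 = \tau_N^2\cdot\tau_{X_p}^2 + \tau_N^2\cdot\mu_{X_p}^2 + \mu_N^2\cdot\tau_{X_p}^2. \tag{6.3}$$
$$\tau_{A_e}^2 = \tau_N^2\cdot\tau_{X_e}^2 + \tau_N^2\cdot\mu_{X_e}^2 + \mu_N^2\cdot\tau_{X_e}^2. \tag{6.4}$$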
We can derive the following formulas for the covariances:
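$$\rho = E[\operatorname{Cov}(A_p, A_e \mid \theta)] = (\sigma_N^2 - \mu_N)\cdot\mu_{X_p}\cdot\mu_{X_e} + k\cdot\mu_{X_e}\cdot\mu_N. \tag{6.5}$$
$$\pi = \operatorname{Cov}(E[A_p \mid \theta], E[A_e \mid \theta]) = (\tau_N^2 + \mu_N^2)\cdot\pi_X + \tau_N^2\cdot\mu_{X_p}\cdot\mu_{X_e}. \tag{6.6}$$
Here π_{X} = Cov(E[X_{p} | θ], E[X_{e} | θ]) denotes the parameter covariance of the primary and excess severities. The derivations are shown in Appendix C.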
Assuming claim counts are conditionally Poisson, the process variance terms simplify to:
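$$\sigma_{A_p}^2 = \mu_N\cdot(\sigma_{X_p}^2 + \tau_{X_p}^2 + \mu_{X_p}^2). \tag{6.7}$$
$$\sigma_{A_e}^2 = \mu_N\cdot(\sigma_{X_e}^2 + \tau_{X_e}^2 + \mu_{X_e}^2). \tag{6.8}$$
$$\rho = k\cdot\mu_N\cdot\mu_{X_e}. \tag{6.9}$$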
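The Poisson form of the process covariance can be motivated by a short argument. The sketch below is offered only as an illustration and is not a substitute for the derivation in Appendix C. It relies on the standard compound Poisson identity that the covariance of two sums over the same claims equals the expected claim count times the expected product of the summands, and on the independence of the count and severity priors.

$$
\begin{aligned}
X_p X_e &= k\,X_e \quad\Rightarrow\quad E[X_p X_e \mid \theta] = k\,\mu_{X_e}(\theta), \\
\operatorname{Cov}(A_p, A_e \mid \theta) &= \mu_N(\theta)\cdot E[X_p X_e \mid \theta] = k\,\mu_N(\theta)\,\mu_{X_e}(\theta), \\
\rho &= E[k\,\mu_N(\theta)\,\mu_{X_e}(\theta)] = k\cdot\mu_N\cdot\mu_{X_e}.
\end{aligned}
$$

The first line holds because the excess severity is nonzero only when the occurrence exceeds the split point, in which case the primary severity equals k.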
6.1. Split credibility formulas under the collective risk model
We will now apply the CRM structure of priors to evaluate the required terms in the split credibility formulas shown in (3.8). We already have the formulas for the variances of the claim counts. Using the CRM assumption that severity is conditionally exponential, we may write the following formulas for the conditional means of the primary and excess severities.
$$\mu_{X_p}(\theta) = s\beta\,(1-\exp(-k/(s\beta))), \qquad \mu_{X_e}(\theta) = s\beta\exp(-k/(s\beta)). \tag{6.10}$$
We can also derive the formulas for the conditional severity process variances:
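$$
\begin{aligned}
\sigma_{X_p}^2(\theta) &= \int_0^k dx\, x^2\cdot(s\beta)^{-1}\cdot\exp(-x/(s\beta)) + k^2\exp(-k/(s\beta)) - s^2\beta^2\left(1-\exp(-k/(s\beta))\right)^2 \\
&= s^2\beta^2 - 2s\beta k\cdot\exp(-k/(s\beta)) - s^2\beta^2\cdot\exp(-2k/(s\beta)).
\end{aligned}
\tag{6.11}
$$
$$
\begin{aligned}
\sigma_{X_e}^2(\theta) &= \int_k^\infty dx\,(x-k)^2\cdot(s\beta)^{-1}\cdot\exp(-x/(s\beta)) - s^2\beta^2\cdot\exp(-2k/(s\beta)) \\
&= 2s^2\beta^2\cdot\exp(-k/(s\beta)) - s^2\beta^2\cdot\exp(-2k/(s\beta)).
\end{aligned}
\tag{6.12}
$$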
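The conditional formulas above are easy to check numerically. The following sketch, in which the function name and the variable `m` (standing for the conditional mean sβ) are illustrative assumptions, evaluates the conditional layer means and variances for an exponential severity and confirms that the two layers add back to the unsplit exponential. The conditional covariance uses the fact that the product of the primary and excess severities equals k times the excess severity.

```python
import math


def exponential_layer_moments(m, k):
    """Conditional primary/excess moments for an exponential severity.

    m is the conditional mean (s * beta in the text) and k is the split
    point. Names are hypothetical, chosen for this illustration only.
    """
    e1 = math.exp(-k / m)
    e2 = math.exp(-2.0 * k / m)
    mean_p = m * (1.0 - e1)                        # conditional primary mean
    mean_e = m * e1                                # conditional excess mean
    var_p = m * m - 2.0 * m * k * e1 - m * m * e2  # primary process variance
    var_e = 2.0 * m * m * e1 - m * m * e2          # excess process variance
    # X_p * X_e = k * X_e, so E[X_p X_e] = k * mean_e.
    cov_pe = k * mean_e - mean_p * mean_e
    return mean_p, mean_e, var_p, var_e, cov_pe


mean_p, mean_e, var_p, var_e, cov_pe = exponential_layer_moments(m=10.0, k=10.0)
# The layers should recombine to mean m and variance m**2.
print(mean_p + mean_e, var_p + var_e + 2.0 * cov_pe)
```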
Now, in conformance with the CRM structure, assume that β is such that γ = 1/β is Gamma distributed. Let γ have shape parameter α and scale parameter λ, where λ enters the density as a rate (inverse scale), such that E[γ] = α/λ and Var(γ) = α/λ^{2}. It follows that E[β] = λ/(α − 1) and E[β^{2}] = λ^{2}/{(α − 1)(α − 2)} as shown in (6.13) and (6.14):
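$$
\begin{aligned}
E[\beta] = E[1/\gamma] &= \int_0^\infty d\gamma\,\frac{1}{\gamma}\,\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,\gamma^{(\alpha-1)}\cdot\exp(-\lambda\gamma) \\
&= \frac{\lambda}{\alpha-1}\int_0^\infty d\gamma\,\frac{\lambda^{\alpha-1}}{\Gamma(\alpha-1)}\,\gamma^{(\alpha-1)-1}\cdot\exp(-\lambda\gamma) = \frac{\lambda}{\alpha-1}.
\end{aligned}
\tag{6.13}
$$
$$
\begin{aligned}
E[\beta^2] = E[1/\gamma^2] &= \int_0^\infty d\gamma\,\frac{1}{\gamma^2}\,\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,\gamma^{(\alpha-1)}\cdot\exp(-\lambda\gamma) \\
&= \frac{\lambda^2}{(\alpha-1)(\alpha-2)}\cdot\int_0^\infty d\gamma\,\frac{\lambda^{\alpha-2}}{\Gamma(\alpha-2)}\,\gamma^{(\alpha-2)-1}\cdot\exp(-\lambda\gamma) = \frac{\lambda^2}{(\alpha-1)(\alpha-2)}.
\end{aligned}
\tag{6.14}
$$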
It also follows that the density of β is given as:
$$h(\beta) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,\beta^{-(\alpha+1)}\cdot\exp(-\lambda/\beta). \tag{6.15}$$
With this density, we can derive the unconditional severities and the process and parameter variances and covariances. To ensure the derivations are clearly understood, we will show the first one in some detail.
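$$
\begin{aligned}
\mu_{X_p} = E[\mu_{X_p}(\theta)] &= E[(s\beta)\cdot(1-\exp(-k/(s\beta)))] \\
&= s\int_0^\infty d\beta\,\beta\cdot\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,\beta^{-(\alpha+1)}\exp(-\lambda/\beta) - s\int_0^\infty d\beta\,\beta\cdot\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,\beta^{-(\alpha+1)}\exp(-(\lambda+(k/s))/\beta) \\
&= \frac{s\lambda}{\alpha-1}\left(1-\left(\frac{\lambda}{\lambda+k/s}\right)^{\alpha-1}\right).
\end{aligned}
\tag{6.16}
$$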
This expression is the formula for the limited expected value of a Pareto severity distribution with scale, sλ, and shape parameter, α.
Recall we have also assumed in Section 5.1 that E[β] = 1 and that Var(β) = b. It follows immediately from (6.13) that λ = α − 1 and we can then use (6.14) to show α = 2 + 1/b:
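$$
\begin{aligned}
\operatorname{Var}(\beta) = E[\beta^2] - E[\beta]^2 &= \frac{\lambda^2}{(\alpha-1)(\alpha-2)} - \frac{\lambda^2}{(\alpha-1)^2} = \frac{\lambda}{\alpha-2} - 1 = b \\
&\Rightarrow \frac{\lambda}{\alpha-2} = 1+b \Rightarrow \frac{\alpha-1}{\alpha-2} = 1+b \Rightarrow \frac{\alpha-2+1}{\alpha-2} = 1+b \Rightarrow \frac{1}{\alpha-2} = b \Rightarrow \alpha = 2+\frac{1}{b}.
\end{aligned}
\tag{6.17}
$$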
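Every unconditional term in this section reduces to an expectation of the form E[β^j exp(−a/β)] under the density of β. As a hedged sketch, the helper below evaluates that expectation in closed form by integrating against the density, using α = 2 + 1/b and λ = α − 1. The name `beta_moment` and its arguments are assumptions made for illustration. Logarithms are used so that large shape parameters (small b) do not overflow.

```python
import math


def beta_moment(j, a, b):
    """E[beta**j * exp(-a / beta)] when 1/beta is Gamma distributed with
    E[beta] = 1 and Var(beta) = b, that is alpha = 2 + 1/b, lam = alpha - 1.

    Requires alpha > j. Hypothetical helper, not notation from the paper.
    """
    alpha = 2.0 + 1.0 / b
    lam = alpha - 1.0
    # Integral of beta**j * h(beta) * exp(-a/beta):
    # lam**alpha * Gamma(alpha - j) / (Gamma(alpha) * (lam + a)**(alpha - j)).
    log_val = (alpha * math.log(lam) + math.lgamma(alpha - j)
               - math.lgamma(alpha) - (alpha - j) * math.log(lam + a))
    return math.exp(log_val)


b = 0.25
mean_beta = beta_moment(1, 0.0, b)               # should equal 1
var_beta = beta_moment(2, 0.0, b) - mean_beta ** 2   # should equal b
print(round(mean_beta, 10), round(var_beta, 10))
```

Setting a = 0 recovers the plain moments of β, while a = k/s and a = 2k/s give the exponentially weighted moments needed for the layer formulas.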
Using this to substitute into (6.16), we have:
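$$\mu_{X_p} = s\left(1-\left(\frac{\lambda}{\lambda+k/s}\right)^{\lambda}\right) = s\left(1-\left(1+\frac{kb}{s(b+1)}\right)^{-(1+1/b)}\right). \tag{6.18}$$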
Given the way we have defined X_{e}, it follows that:
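$$\mu_{X_e} = \mu_X - \mu_{X_p} = s\left(\frac{\lambda}{\lambda+k/s}\right)^{\lambda} = s\left(1+\frac{kb}{s(b+1)}\right)^{-(1+1/b)}. \tag{6.19}$$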
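One way to gain confidence in the closed form for the unconditional primary severity is to compare it with a direct numerical integration of the conditional mean over the prior. The sketch below does this with a simple midpoint rule over γ = 1/β. The function names, the integration limit, and the step count are arbitrary assumptions for illustration.

```python
import math


def primary_mean_closed_form(s, k, b):
    """Unconditional primary severity mean from the closed-form expression."""
    u = 1.0 + k * b / (s * (b + 1.0))
    return s * (1.0 - u ** (-(1.0 + 1.0 / b)))


def primary_mean_quadrature(s, k, b, upper=40.0, steps=100_000):
    """Midpoint-rule integral of the conditional primary mean against the
    Gamma density of gamma = 1/beta (shape alpha, rate lam).

    upper and steps are assumed tuning values, not from the paper.
    """
    alpha = 2.0 + 1.0 / b
    lam = alpha - 1.0
    log_norm = alpha * math.log(lam) - math.lgamma(alpha)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        g = (i + 0.5) * h
        density = math.exp(log_norm + (alpha - 1.0) * math.log(g) - lam * g)
        # Conditional primary mean with s * beta = s / g.
        cond_mean = (s / g) * (1.0 - math.exp(-k * g / s))
        total += cond_mean * density * h
    return total


closed = primary_mean_closed_form(s=10.0, k=10.0, b=0.25)
numeric = primary_mean_quadrature(s=10.0, k=10.0, b=0.25)
print(round(closed, 5), round(numeric, 5))
```

The excess mean follows as the total mean severity s less the primary mean, so checking one layer checks both.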
Using (6.11), (6.12), and (6.17) and applying similar logic, we can derive the following formulas for the severity process and parameter variances:
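$$
\begin{aligned}
\sigma_{X_p}^2 = E[\sigma_{X_p}^2(\theta)] &= E[s^2\beta^2 - 2s\beta k\cdot\exp(-k/(s\beta)) - s^2\beta^2\cdot\exp(-2k/(s\beta))] \\
&= s^2\cdot(1+b) - 2sk\left(1+\frac{kb}{s(b+1)}\right)^{-(1+1/b)} - s^2\cdot(1+b)\left(1+\frac{2kb}{s(b+1)}\right)^{-(1/b)}.
\end{aligned}
\tag{6.20}
$$
$$
\begin{aligned}
\sigma_{X_e}^2 = E[\sigma_{X_e}^2(\theta)] &= E[2s^2\beta^2\cdot\exp(-k/(s\beta)) - s^2\beta^2\cdot\exp(-2k/(s\beta))] \\
&= 2s^2\cdot(1+b)\left(1+\frac{kb}{s(b+1)}\right)^{-(1/b)} - s^2\cdot(1+b)\left(1+\frac{2kb}{s(b+1)}\right)^{-(1/b)}.
\end{aligned}
\tag{6.21}
$$
$$
\begin{aligned}
\tau_{X_p}^2 = \operatorname{Var}(\mu_{X_p}(\theta)) &= \operatorname{Var}(s\beta(1-\exp(-k/(s\beta)))) \\
&= s^2\cdot(1+b)\left(1 - 2\left(1+\frac{kb}{s(b+1)}\right)^{-1/b} + \left(1+\frac{2kb}{s(b+1)}\right)^{-1/b}\right) - s^2\left(1-\left(1+\frac{kb}{s(b+1)}\right)^{-(1+1/b)}\right)^2.
\end{aligned}
\tag{6.22}
$$
$$
\begin{aligned}
\tau_{X_e}^2 = \operatorname{Var}(\mu_{X_e}(\theta)) &= \operatorname{Var}(s\beta\exp(-k/(s\beta))) \\
&= s^2\cdot(1+b)\left(1+\frac{2kb}{s(b+1)}\right)^{-1/b} - s^2\left(1+\frac{kb}{s(b+1)}\right)^{-2(1+1/b)}.
\end{aligned}
\tag{6.23}
$$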
Finally we turn to the process and parameter covariances of the severity. We can derive:
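$$
\begin{aligned}
\rho_X &= E\left[\beta sk\exp(-k/(s\beta)) - \beta s\exp(-k/(s\beta))\cdot\beta s(1-\exp(-k/(s\beta)))\right] \\
&= sk\left(1+\frac{kb}{s(b+1)}\right)^{-(1+1/b)} - s^2(1+b)\left(1+\frac{kb}{s(b+1)}\right)^{-1/b} + s^2(1+b)\left(1+\frac{2kb}{s(b+1)}\right)^{-1/b}.
\end{aligned}
\tag{6.24}
$$
$$
\begin{aligned}
\pi_X &= \operatorname{Cov}\left(\beta s(1-\exp(-k/(s\beta))),\ \beta s\exp(-k/(s\beta))\right) \\
&= s^2(1+b)\left(\left(1+\frac{kb}{s(b+1)}\right)^{-1/b} - \left(1+\frac{2kb}{s(b+1)}\right)^{-1/b}\right) - s^2\left(1-\left(1+\frac{kb}{s(b+1)}\right)^{-(1+1/b)}\right)\left(1+\frac{kb}{s(b+1)}\right)^{-(1+1/b)}.
\end{aligned}
\tag{6.25}
$$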
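To illustrate that the severity formulas are straightforward to program, the sketch below evaluates the closed-form severity means, process and parameter variances, and covariances for given s, k, and b. The function name and dictionary keys are illustrative assumptions. Two consistency checks are included: the process terms should recombine to s²(1 + b), the expected conditional variance of the unsplit severity, and the parameter terms should recombine to s²b.

```python
def crm_severity_terms(s, k, b):
    """Closed-form primary/excess severity terms under the CRM priors.

    Assumes exponential severity with mean s * beta, E[beta] = 1 and
    Var(beta) = b. Names are hypothetical, for illustration only.
    """
    x = k * b / (s * (b + 1.0))
    u0 = (1.0 + x) ** (-1.0 / b)            # base 1 + kb/(s(b+1)), power -1/b
    u1 = (1.0 + x) ** (-(1.0 + 1.0 / b))    # same base, power -(1 + 1/b)
    v0 = (1.0 + 2.0 * x) ** (-1.0 / b)      # base 1 + 2kb/(s(b+1)), power -1/b
    m2 = s * s * (1.0 + b)                  # s**2 * E[beta**2]
    mu_p, mu_e = s * (1.0 - u1), s * u1
    return {
        "mu_p": mu_p,
        "mu_e": mu_e,
        "sigma2_p": m2 - 2.0 * s * k * u1 - m2 * v0,
        "sigma2_e": 2.0 * m2 * u0 - m2 * v0,
        "tau2_p": m2 * (1.0 - 2.0 * u0 + v0) - mu_p ** 2,
        "tau2_e": m2 * v0 - mu_e ** 2,
        "rho_X": s * k * u1 - m2 * u0 + m2 * v0,
        "pi_X": m2 * (u0 - v0) - mu_p * mu_e,
    }


t = crm_severity_terms(s=10.0, k=10.0, b=0.25)
# Recombination checks against the unsplit severity.
process_total = t["sigma2_p"] + t["sigma2_e"] + 2.0 * t["rho_X"]
parameter_total = t["tau2_p"] + t["tau2_e"] + 2.0 * t["pi_X"]
print(round(process_total, 6), round(parameter_total, 6))
```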
While these formulas look forbidding, they are actually not too difficult to program. In the next Section, we will use the formulas in Section 6 to generate unsplit and split credibilities under the CRM structure.
7. Split credibility results for CRM models
With the formulas derived in Section 6, we have enough to compute split credibilities and the error reduction due to a split under the CRM. We will look at examples to illustrate that some of the undesirable behaviors that can exist under an arbitrary split plan can also arise under a primary-excess split plan operating on an underlying CRM structure. First, recall the example at the end of Section 5.1 in which the mean claim count is 10, the mean severity is 10, the contagion, c, is 0.200 and the severity mixing parameter, b, is 0.250. In this example, total mean loss is 100 and, as we saw previously, the optimal unsplit credibility is 67%. If we now introduce a split point of 10, we find, as shown in Exhibit 2, Sheet 1, that the optimal primary and excess credibilities are both 67%. As we know from Equation (4.5), this implies the primary-excess split is no better than the unsplit plan. We can also see from Exhibit 2, Sheet 1 that this is an example of proportional allocation of process and parameter risk in which the primary layer gets 36% of the process variance and 36% of the parameter variance under the covariance allocation in Table 2. Thus, it also follows from Equation (4.6) that this split is ineffective when the loss model is the CRM with the parameters given.
With other parameters, this same split can be effective. In Exhibit 2, Sheet 2, the assumptions are the same as in Sheet 1 except the mixing parameter is reduced from 0.250 to 0.025. With uncertainty about severity dramatically reduced, the excess layer gets a modest allocation of parameter risk and a large allocation of process risk. The opposite is true for the primary layer: it has less process risk and more of the parameter risk allocated to it. This is the scenario under which a primary-excess split will be effective and behave in the usual way actuaries expect. The primary layer in the example has 92% credibility and the excess layer has only 11%. Introducing the split reduces the optimal MSE by roughly 12% relative to the optimal MSE of the unsplit credibility estimate.
However, under the CRM a choice of parameters that puts more parameter risk in the excess layer can produce an inversion of primary and excess credibilities in a plan which is still effective. In Exhibit 2, Sheet 3, the severity mixing parameter is reset back to 0.250 while the contagion is reduced to 0.020. This results in a primary credibility of only 3% and an excess credibility of 72%. Such an inversion of primary and excess credibilities can never happen in the NCCI split rating plan. The split in this example reduces MSE by 9% of the MSE of the unsplit estimate.
Finally, Exhibit 2, Sheet 4 gives an example of unusual behavior in which the primary credibility is negative and the excess credibility is positive. This is achieved by reducing mean claim counts and count parameter risk while boosting mean severity and severity risk. The primary layer ends up with a very modest parameter variance, one that is smaller than the parameter covariance. In the example, primary credibility is −33% and the excess credibility is 43%.
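The first three examples can be reproduced with a short program. The sketch below is not the paper's exhibit calculation. It assumes conditionally Poisson counts, and it assumes that the optimal split credibilities solve the usual least-squares normal equations, (σ²_Ap + τ²_Ap) z_p + (ρ + π) z_e = τ²_Ap + π together with the corresponding equation for the excess layer, as a stand-in for the split formulas in (3.8), which lie outside this section. Function and variable names are illustrative. The parameters for Sheet 4 are not stated in the text, so that case is omitted.

```python
import math


def beta_moment(j, a, b):
    """E[beta**j * exp(-a/beta)] for 1/beta Gamma, E[beta]=1, Var(beta)=b."""
    alpha = 2.0 + 1.0 / b
    lam = alpha - 1.0
    return math.exp(alpha * math.log(lam) + math.lgamma(alpha - j)
                    - math.lgamma(alpha) - (alpha - j) * math.log(lam + a))


def crm_split(n, s, c, b, k):
    """Unsplit and split credibilities under the CRM (assumed setup above)."""
    a = k / s
    m1 = beta_moment(1, a, b)           # E[beta * exp(-k/(s*beta))]
    m2 = beta_moment(2, a, b)           # E[beta**2 * exp(-k/(s*beta))]
    m2x = beta_moment(2, 2.0 * a, b)    # E[beta**2 * exp(-2k/(s*beta))]
    mu_p, mu_e = s * (1.0 - m1), s * m1

    # Process terms (Poisson): mean count times unconditional second moment.
    sig_p = n * (2.0 * s * s * (1.0 + b - m2) - 2.0 * s * k * m1)
    sig_e = n * (2.0 * s * s * m2)
    rho = k * n * mu_e

    # Parameter terms: Var/Cov of n*chi*mu_X(beta), using E[chi**2] = 1 + c.
    e_pp = s * s * (1.0 + b - 2.0 * m2 + m2x)
    e_ee = s * s * m2x
    e_pe = s * s * (m2 - m2x)
    tau_p = n * n * ((1.0 + c) * e_pp - mu_p ** 2)
    tau_e = n * n * ((1.0 + c) * e_ee - mu_e ** 2)
    pi = n * n * ((1.0 + c) * e_pe - mu_p * mu_e)

    # Unsplit credibility and its minimal mean square error.
    sig = sig_p + sig_e + 2.0 * rho
    tau = tau_p + tau_e + 2.0 * pi
    z = tau / (tau + sig)
    mse_unsplit = tau * (1.0 - z)

    # Solve the assumed 2x2 normal equations by Cramer's rule.
    a11, a12, a22 = sig_p + tau_p, rho + pi, sig_e + tau_e
    r1, r2 = tau_p + pi, tau_e + pi
    det = a11 * a22 - a12 * a12
    z_p = (r1 * a22 - a12 * r2) / det
    z_e = (a11 * r2 - a12 * r1) / det
    mse_split = tau - z_p * r1 - z_e * r2
    return {"z": z, "z_p": z_p, "z_e": z_e,
            "mse_reduction": 1.0 - mse_split / mse_unsplit}


sheet1 = crm_split(n=10, s=10, c=0.20, b=0.25, k=10)
sheet2 = crm_split(n=10, s=10, c=0.20, b=0.025, k=10)
sheet3 = crm_split(n=10, s=10, c=0.02, b=0.25, k=10)
for name, r in (("Sheet 1", sheet1), ("Sheet 2", sheet2), ("Sheet 3", sheet3)):
    print(name, {key: round(val, 3) for key, val in r.items()})
```

Under these assumptions the three parameter sets should reproduce, to rounding, the credibilities and error reductions quoted above: equal primary and excess credibilities with no error reduction for Sheet 1, a well-behaved and effective split for Sheet 2, and an inversion for Sheet 3.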
7.1. Effectiveness and well-behaved primary-excess splits
We say a primary-excess split plan is effective if it appreciably reduces mean square error and we say it is well-behaved if it has no primary-excess credibility inversions and all credibility values are between zero and unity.
The CRM examples show that, with the right set of parameters, there are MSE optimal primary-excess split credibility plans that are both effective and well-behaved. Based on the examples, we see this happens when:

There is substantial parameter risk due to claim count uncertainty,

Most process risk is due to volatility of severity,

The split allocates a disproportionate amount of parameter risk to the primary layer, and

The split allocates a disproportionate amount of process risk to the excess layer.
The CRM structure with its single severity subject to scale parameter uncertainty readily allows parameter selections that satisfy these conditions.
However, the CRM examples also show that a primary-excess split does not have to produce an effective or well-behaved plan. An effective plan with inversions can result when severity parameter risk drives the overall parameter risk and the split puts a relatively large amount of parameter risk in the excess layer. In such a scenario, the effect of a split intuitively is to deprive the primary layer of much of the information about severity and thus to diminish its allocation of parameter risk and therefore diminish the primary layer credibility. There is nothing in the structure of the model to prevent the primary layer credibility from falling below the excess layer credibility. From an a priori perspective, there is nothing anomalous or bizarre about such scenarios. Why can’t we have a fairly small uncertainty about mean claim counts and larger relative uncertainty about mean severity, as in Exhibit 2, Sheet 3? From this perspective, middle scenarios in which split credibility is well-behaved, but only modestly effective, also seem quite reasonable.
8. Conclusion
To summarize, we have shown that analysis of split experience rating requires analysis of the allocation of the process and parameter risk to the primary and excess layers and to the covariances between the layers. We stressed it is insufficient to focus on volatility alone: low volatility is not synonymous with high credibility. We have derived formulas for the mean square parameter errors in the unsplit and split plans. We showed that credibility is the ratio by which parameter risk is reduced in an optimal estimate as well as the weight given to experience versus prior belief in the linear estimation formula. We then extended that error reduction interpretation to apply in a split credibility context.
By taking differences and simplifying, we found a formula for the reduction in mean square parameter error attributable to splitting. Interpreting this formula, we found that mean square error of the estimate would be effectively reduced if the split produced a differential allocation of process and parameter risk. We also saw that credibility for one component had to be bigger and credibility for the other component smaller than the credibility for the unsplit total in order for there to be any error reduction from splitting.
We have shown with examples using the standard CRM that a primary-excess split does not always improve accuracy to any great degree, nor does it always produce nonnegative credibility values or a primary layer credibility that is bigger than the excess layer credibility. We saw that these disquieting results were not an artifact of making odd parameter choices but were inherent in the nature of the primary-excess splitting process operating on plausible models with reasonable parameter selections.
In sum, this work has established a solid mathematical foundation for split credibility. However, it has also highlighted a potential weakness: under a standard model such as the CRM, primary-excess splitting does not automatically confer a great advantage; nor does it necessarily produce well-behaved credibility values. It is possible a model having different types of claims with different severities or other more complex severity parameter risk structure might allow a more effective split of noise from signal while at the same time preventing inversions and negative credibility values. More support for splitting might also be found by investigating optimality criteria different from minimal mean square error, or perhaps by studying use of optimal credibilities subject to constraints that promote good behavior. Perhaps most promising is to carry the error analysis further to a plan that has a split on capped losses and also uses Mod extension to estimate total uncapped losses. This is the actual type of plan used by the NCCI. Work along these or similar lines might provide a stronger conceptual foundation for the use of split credibility.
Acknowledgments
The author gratefully acknowledges David Clark, Robin Gillam, and Gary Venter for their insights and observations. The members of the review committee also merit commendation for their detailed and thoughtful comments. These contributions helped to motivate and substantially improve the paper.
Disclaimers
The opinions expressed are solely those of the author and are not presented as a statement of the views or practices of any past or present employer or client of the author. The author assumes no liability whatsoever for any damages that may result directly or indirectly from use or reliance on any observation, opinion, idea, or method presented in this paper.
Abbreviations and Notations

b, severity mixing parameter

c, contagion

CRM, Collective Risk Model

CV, Coefficient of Variation

Experience mod, Experience modification factor

Δ(ε_{0}^{2}), reduction in minimal mean square estimation error

κ, total covariance

µ, the mean

NCCI, National Council on Compensation Insurance

NS, Non-Split

π, parameter covariance

ρ, process covariance

σ, process standard deviation

SP, Split

τ, parameter standard deviation

z*, optimal credibility