1. Introduction
It is well established that limited fluctuation or “square root” credibility has limitations. Since it is designed to produce stable estimates rather than best estimates, it does not provide the most accurate rates. Further, since any combination of an acceptable fluctuation size and a probability of a chance violation of that fluctuation is a priori no better than any other, it is challenging[1] to show that any particular full credibility standard is better than any other. Lastly, the square root rule relies on an assumption that the statistic receiving the complement of credibility is stable. When the complement of credibility is, say, three years of 15% trend, that assumption is clearly violated. So there is a strong need[2] for best-estimate credibility.
Some time ago (1967) Hans Bühlmann developed a formula[3] for the best estimate credibility of a single risk or a single class when the complement of credibility is assigned to the large group that the risk or class is part of. His P/(P + K) formula[4] is well known and represents a truly optimal (in the sense of making the best predictions) credibility formula. But a formula is also needed for the credibility of the overall rate change for a product or line of business. It is quite common in actuarial work to develop a rate indication for such a group, realize that supplemental data is needed, and credibility weight the overall indicated change with something such as the inflationary trend since the last rate change.[5] Considering that the overall rate change affects every rate for every class and every risk, this author believes that the credibility of the overall rate indication deserves as much attention as the credibility of the class data within it.
A solid theoretical background has been laid for the credibility of this overall rate indication. Credibility is by nature a process that is designed to update an estimate of loss costs. A paper by Jones and Gerber (1975) provides formulas for the weights in updating formulas (to be discussed later) in terms of the covariances of the historical data points.[6] This formula, in fact, provides the optimum linear estimate of future costs given all the prior data, not just the data used in the current rate update.
Nevertheless, knowing the mathematical form of the credibility is not the same thing as being able to compute the credibility. As will be shown, standard credibility formulas derived from the Gerber-Jones approach use values for the Brownian motion variance in year-to-year trend, plus values for the “observation error” variances between observed data points and the true expected costs that underlie them.[7] To compute the credibility, it is necessary to estimate those variance parameters. This paper provides techniques designed to do just that.
2. The theory—Key credibility formulas for the overall rate indication
In this section the key theoretical results from the Jones and Gerber (1975) paper are presented. This should provide the practitioner a summary of the key formulas that create best-estimate credibility. Likely none of the material is new.
2.1. The general Gerber-Jones formulas
The goal is to apply the Gerber-Jones formulas to a realistic model (ultimately, geometric Brownian motion for trend, and observation error with a constant coefficient of variation) of the relationship between historical data and the unknown future loss cost. So, to facilitate the reader’s understanding, the key Gerber-Jones formulas are shown below.
The first statement that must be made is that the Gerber-Jones formula, and, unless stated otherwise, all other formulas, assume that any necessary trend and current level adjustments have already been made to the data. For example, although the prior data used in a credibility formula involves trending and current level adjustments, those adjustments are assumed to have been done[8] in the background, so all that is involved is determining the optimum credibility weights for the previous years.
With that background, a credibility formula[9] and data pattern are of the updating type[10] through the n + 1st projection (i.e., the optimum[11] estimate of future loss costs is a credibility-weighted average Pn+1 = ZnSn + (1 − Zn)Pn of the previous estimate of loss costs Pn and the new data Sn) if there is a constant μ and sequences V1, V2, . . . , Vn and W1, W2, . . . , Wn such that
E[S_i] = \mu \text{ for each of the } S_i

\mathrm{Cov}[S_i, S_j] = V_i + W_i \text{ for each case where } i = j, \text{ and}

\mathrm{Cov}[S_i, S_j] = W_i \text{ if } i < j
Further, when the credibility formula and data pattern are of that updating type, then the optimum credibilities are
Z_i = \frac{W_i - W_{i-1} + Z_{i-1}V_{i-1}}{W_i - W_{i-1} + Z_{i-1}V_{i-1} + V_i} \tag{2.4}
and
Z_1 = \frac{W_1}{W_1 + V_1}
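To make the recursion concrete, the short sketch below (not from the paper; the function name and the sample δ2 and σ2 values are illustrative) computes Z1, . . . , Zn from given V and W sequences using formula (2.4) and the starting value above.

```python
def gerber_jones_credibilities(V, W):
    """Credibilities Z_1..Z_n for an updating-type formula, given the
    observation-error variances V_i and the covariance terms W_i.
    Illustrative sketch; V and W are plain Python lists (0-indexed)."""
    assert len(V) == len(W) and len(V) >= 1
    Z = [W[0] / (W[0] + V[0])]                    # Z_1 = W_1 / (W_1 + V_1)
    for i in range(1, len(V)):
        num = W[i] - W[i - 1] + Z[-1] * V[i - 1]  # numerator of formula (2.4)
        Z.append(num / (num + V[i]))
    return Z

# Example: the linear Brownian motion model of section 2.2,
# with hypothetical values delta^2 = 0.0009 and sigma^2 = 0.0049
delta2, sigma2, n = 0.0009, 0.0049, 10
V = [sigma2] * n                            # V_i = sigma^2
W = [(i + 1) * delta2 for i in range(n)]    # W_i = i * delta^2
print(gerber_jones_credibilities(V, W))
```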
2.2. The linear updating-type formulas
As a first step towards understanding the notation, it is helpful to introduce the credibility under a standard linear Brownian motion with a drift (T), variance parameter “δ2” for the Brownian motion, and a constant error variance “σ2” between each trended data point Si = Si* + (n + 1 − i)T and the trended underlying expected cost at period i, or Li = Li* + (n + 1 − i)T. Logically, the actual deviations from the expected loss (Si − Li = Ei per this linear model) could be expected to be independent from both each other and the Li’s. Of note, this treatment is not new, but is presented so that the reader may understand the process.
Then, if we take “μ” to be the true mean expected loss[12] at time[13] n + 1, so that μ = E[Ln+1] = E[Si] for each i, the trended underlying prior expected losses follow a Brownian motion. Further, since Cov[A + αB, C + βB] = αβVar[B] when A, B, and C are mutually independent,
\mathrm{Cov}[S_i, S_j] = \mathrm{Cov}[L_i + E_i,\ L_i + (L_j - L_i) + E_j] = \mathrm{Var}[L_i] = i\delta^2 \quad (i < j)
(noting that Lj is further along in the Brownian motion than Li, the random motion between Li and Lj is independent of Li). Further,
\mathrm{Cov}[S_i, S_i] = i\delta^2 + \sigma^2
So, in the Gerber-Jones formula
V_i = \sigma^2; \text{ and}

W_i = i\delta^2
Hence, per formula (2.4),
Z_i = \frac{\delta^2 + Z_{i-1}\sigma^2}{\delta^2 + Z_{i-1}\sigma^2 + \sigma^2} \tag{2.10}
where each Zi is the optimum credibility to use when combining the new data (Si) with the prior estimate (Pi) to produce the optimum estimate of the underlying loss costs, Pi+1. Further, the resulting combination of all the prior data points that Pn+1 represents is the optimum estimate of Ln+1 given the available data. Jones and Gerber (1975) also show that the successive Zi's converge to a limit (which could conceivably be used as a proxy for the credibility Zi when i is large). In this scenario, setting Zi = Zi−1 = Z in formula (2.10) and solving for Z gives
Z = \frac{\delta^2\left(\sqrt{1 + 4\sigma^2/\delta^2} - 1\right)}{2\sigma^2}
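For completeness, the intermediate algebra is short: setting Zi = Zi−1 = Z in formula (2.10) and clearing the denominator gives a quadratic in Z whose positive root is the expression above,

Z\left(\delta^2 + Z\sigma^2 + \sigma^2\right) = \delta^2 + Z\sigma^2 \;\Longrightarrow\; \sigma^2 Z^2 + \delta^2 Z - \delta^2 = 0 \;\Longrightarrow\; Z = \frac{-\delta^2 + \sqrt{\delta^4 + 4\sigma^2\delta^2}}{2\sigma^2} = \frac{\delta^2\left(\sqrt{1 + 4\sigma^2/\delta^2} - 1\right)}{2\sigma^2}.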
2.3. The geometric Brownian motion formulas
The linear model has a key weakness—it assumes that the growth in losses is linear. In fact, it is well-established that most insurance lines of business suffer inflation that causes loss costs to grow exponentially rather than linearly. That reality requires an adjustment to the Brownian motion model. Instead of having
E[Li+1 − Li] = 0 for each i, we should expect zero growth in the ratios (i.e., E[Li+1/Li] = 1). Instead of expecting the differences Li+1 − Li to have identical and independent normal distributions, one would expect the ratios Li+1/Li to have independent, identical lognormal distributions, with the aforementioned mean of unity and some common variance of δ2. So if one begins with unadjusted data points, each denoted as Si*, the points used to estimate μ = E[Ln+1] are the trended values Si = Si* × (1 + T)^(n+1−i).

Lastly, a model for the differences between the observed Si's and the true expected costs, the Li's, must be included. In this model, the ratios Si/Li are assumed to have independent, identical lognormal distributions with a mean of unity and a constant variance of σ2. These distributions are also expected to be independent from those of the year-to-year drifts (the Li+1/Li's). The common observation variance of the trended values is consistent with roughly equal numbers of claims from year to year, with severity inflation affecting the loss sizes. It would be less appropriate for a growing book of business that encompasses more and more expected claims from year to year, with consequent reductions in the coefficient of variation of the process variance.

In any event, the covariance structure, using the identity[14] Cov[AB, CB] = E[A] × E[C] × Var[B] (for mutually independent A, B, and C), is[15]
\mathrm{Cov}[S_i, S_j] = \mathrm{Cov}\left[L_i \times E_i,\ L_i \times (L_j/L_i) \times E_j\right] = E[E_i] \times E[E_j] \times E[L_j/L_{j-1}] \times \cdots \times E[L_{i+1}/L_i] \times \mathrm{Var}[L_i] = 1 \times 1 \times \cdots \times 1 \times \mathrm{Var}[L_i] = (\delta^2 + 1)^i - 1 \quad (i < j)
Further, by the identity Var[AB] = Var[A]Var[B] + E[A]2Var[B] + E[B]2Var[A] (for independent A and B),
\mathrm{Cov}[S_i, S_i] = \sigma^2(\delta^2 + 1)^i + (\delta^2 + 1)^i - 1
So, the key values for the Gerber-Jones formula in this case are
W_i = (\delta^2 + 1)^i - 1; \qquad V_i = \sigma^2(\delta^2 + 1)^i

and so, again per formula (2.4),

Z_i = \frac{\delta^2 + Z_{i-1}\sigma^2}{\delta^2 + \delta^2\sigma^2 + Z_{i-1}\sigma^2 + \sigma^2}
A comparison to equation (2.10) shows that this is identical to the formula for the linear case, except for the additional δ2σ2 term in the denominator. But, one should consider that when at least one of the values δ2 and σ2 is very small, the combination term δ2σ2 should be a small part of the denominator. Thus, one might say that, for the case of geometric Brownian motion,
Z_i \cong \frac{\delta^2 + Z_{i-1}\sigma^2}{\delta^2 + Z_{i-1}\sigma^2 + \sigma^2}
Further, the steady-state credibility may be approximated as
Z \cong \frac{\delta^2\left(\sqrt{1 + 4\sigma^2/\delta^2} - 1\right)}{2\sigma^2}
As a relevant side note, the summands involved in equations (2.13) and (2.14) would inflate uniformly as the losses are projected ahead more than one year, to some n + Δt instead of to time n + 1, and the credibility equation would remain unchanged.[16]
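As a quick numerical illustration (a sketch, not taken from the paper; the parameter values δ2 = 0.0009 and σ2 = 0.0049 and the starting value Z0 = 0 are assumptions), one can iterate the exact geometric recursion and the approximation side by side and confirm that the δ2σ2 term has little effect.

```python
def geometric_vs_linear_credibility(delta2, sigma2, n=15):
    """Iterate the exact geometric-model recursion (with the delta^2*sigma^2
    denominator term) and the simpler approximation side by side.
    Illustrative sketch; Z_0 is taken as 0 so that the first step
    reproduces Z_1 = W_1 / (W_1 + V_1)."""
    z_exact, z_approx = 0.0, 0.0
    for _ in range(n):
        z_exact = (delta2 + z_exact * sigma2) / (
            delta2 + delta2 * sigma2 + z_exact * sigma2 + sigma2)
        z_approx = (delta2 + z_approx * sigma2) / (
            delta2 + z_approx * sigma2 + sigma2)
    return z_exact, z_approx

# Both values converge to nearly the same steady-state credibility
print(geometric_vs_linear_credibility(delta2=0.0009, sigma2=0.0049))
```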
3. Multi-year formulas and best estimate credibility for the overall rate indication
The approach outlined earlier involves updating a rate with a single new year of data. But it is very common to see rate indications that update a rate with, say, the weighted average of the data from the last five years. The role of this multi-year data in a best estimate credibility formula merits discussion.
3.1. Reasons not to reuse older years
Updating formulas that use multiple years reuse data from prior estimates, so the reuse of data should be evaluated. The first point to be made is that using multiple years is perfectly appropriate when limited fluctuation credibility is involved. Limited fluctuation credibility deals solely with the extent to which the body of data receiving credibility can be relied on not to create unwarranted increases or decreases of some specified size. It does not purport to create a best estimate of future costs. Mahler (1986) has noted, though, that this method often produces future loss estimates that are comparable to those of best estimate credibility.
To state it simply, re-using prior years is incompatible with the covariance structure that the Gerber-Jones updating formula requires. For example, assume an estimate has been continually updated over 14 years from P1 and S1 to P15 with rolling five-year averages[17] Q1, . . . , Q14 of the data points S1, . . . , S14. Logically, the next step is to produce the estimate P16 using Q15. Note, though, that the covariance between Q15 and Q14 is fairly high, since they have the points S11, S12, S13, and S14 in common. However, Q15 and Q1 have no common components.[18] Generally,[19] then, the covariance between two of the Q's depends on how many data points they share, whereas the updating-type structure of section 2.1 requires the covariance of each Qi with every later average to be the same value Wi. Therefore, the Gerber-Jones formula cannot be used when multiple years are combined,[20] and the practice of combining multiple years of data in this context is suboptimal.
That conclusion has a very relevant corollary. If the exposures most useful for limited fluctuation credibility stem from five or even ten years, but best estimate credibility is only based on the most recent year, the resulting credibilities should by nature be different. Therefore, there are circumstances where limited fluctuation credibility is not a good substitute for best estimate credibility.
3.2. Correcting the prior estimate for changes in ultimate loss estimates
There is, however, one respect in which the use of multiple years could improve the estimate. The existing rate is based on the data available earlier, when the various years' losses were less mature than they are at the time of the updated rate indication. So, it makes sense to update the existing rate for the additional development before using it in the credibility formula. Of course, the existing rate is the product of repeated credibility weightings of many years of data. Further, it is not just an average of many years of loss ratios or pure premiums; rather, it is an average of either trended loss ratios brought to the current rate level or trended pure premiums. So, some calculations must be done to include this additional loss development in the prior rate that is used as the complement of credibility. Because of the requirement to use current-level data, the correction process for loss ratio ratemaking is slightly more complex than that for pure premium ratemaking. Therefore, Table 1 shows how the calculations needed to update a loss ratio at present rates for loss development might flow.
The references to “Prior” and “Last Prior” refer to the data used in computing the loss ratio estimate that was used in the last rate change. The “First Assigned” values refer to what was used the first time the specific year of data was used. Also, note that although the loss ratios of many years are likely embedded in the prior loss ratio, only the last five were revised. That is because more mature years see fewer year-to-year revisions in ultimate losses, and contribute a diminishing portion after credibility (see column 7).
It is also worth mentioning that in this example the current level factors could be updated for the next rate review by simply multiplying column (4) by unity plus item “C”. Similar adjustments could be made for the “Credibility in Last Prior” and “Total Trend Factor in Last Prior” columns.
Of course, this example mirrors the calculations in the theoretical literature—the data is assumed to be collected at midnight of December 31, 2011, then used to make rates that are effective at 12:01 a.m. of January 1, 2012. However, the corrections needed to reflect practical realities would appear to be straightforward.
3.3. Updated ultimate losses and updating-type credibility
It could be expected that the process of updating prior year ultimate losses could distort the optimum credibility. In lines such as excess casualty reinsurance, the ultimate loss estimates Sn, Sn−1, etc., for the most recent years could have a very high observation error, while years five or so back could be much closer estimates of the true expected losses (the Li's) within their respective years. On that basis, the true optimum credibility could be expected to be larger for some of the “older” years than for the most recent year. However, that would clearly not create an “update.”

Some perspective can be provided about this situation. First, when prior year estimates are not corrected, the formulas of section 2 do provide the optimum credibility. Further, updating the prior year ultimate losses can only be expected to improve the accuracy of the resulting loss prediction. So, this approach can be expected to produce a high quality estimate of future costs, up to any distortion due to lengthy loss development.
If loss development uncertainty is expected to significantly distort the credibility, it may well be preferable to simply start from scratch each year with the ultimate loss estimates for, say, the last twenty years. One may then compute estimates of the process variance in each year, estimates of the loss development error variance in each year, and the Brownian motion-type variance parameter.[21] It is not difficult to see that, under the linear model (possibly the geometric as well), an updating formula can be derived for the assignment of weights to the various years. It should be clear that the resulting credibility weights may differ greatly between years. However, it does not involve the sort of updating of the prior rate that is part of the typical actuarial application. Rather it involves simply computing a rate from scratch.[22] Since the focus of this paper is on updating an existing rate with new data, this situation will not be analyzed further in this paper.
4. Estimating the parameters: Z, K, B, δ2 and σ2
This section will give the reader some tools for creating estimates of the key variances, and thus help create better loss cost projections. It is not intended to be a survey of the subject. Rather, it is intended to give the practitioner the tools needed to implement best estimate ratemaking. The interested reader may review some of the ideas in De Vylder (1981) and Hayne (1985) to get two other perspectives on this subject.
First, a few quick notes are in order:
- Note 1. In many situations, it is not necessary to estimate both δ2 and σ2. Key formulas can be converted to a function of K = δ2/σ2, so K is all one needs to estimate.
- Note 2. When estimating δ2 and σ2 for geometric Brownian motion, note that they are functions of δ′2 and σ′2 from the logarithmic transform to a linear Brownian motion, exp(δ′2) − 1 = δ2, and exp(σ′2) − 1 = σ2. So, once one determines how to estimate the constants of variance (or even just their ratio) in a linear Brownian motion, one may estimate the credibility for the geometric Brownian motion.
- Note 3. The observation errors (with variance σ2) consist logically of a combination of the sample variance (i.e., the limitations of the law of large numbers due to the high skew in insurance statistics and the inability of “small” claim samples to fully estimate the true expected losses each year) and the loss development uncertainty between the early data we base our projections on and the final actual claims costs in each year. Further, the sample variance and development variance are independent and so may be added to determine σ2.
- Note 4. (Subtraction of Two Estimated Quantities) If we subtract one highly uncertain “large” number from another “large” number, and the difference is “small,” the result has a “large” variance most of the time. When estimating a small number, that “large” variance typically overwhelms the true “small” value one seeks to estimate.
- Note 5. (Common Additive Error in all the Data) If all the historical data points are affected equally and simultaneously by a common error that is independent of all the other error terms (for example, all the data is biased by the addition of a single, uniform, unknown amount “ε” from some distribution with a zero mean), then the optimal solution may be estimated by disregarding this error. Logically, this may be converted algebraically to a situation where one is estimating a future value that contains ε, with ε removed from all the historical data. Since the variance of ε is independent of all aspects of variance in the historical data, the ε component of the costs being predicted is not susceptible to estimation using the historical data. Hence, it may be disregarded in optimizing the estimate of future costs. A similar result holds when ε is a constant error multiplier with a mean of one within the data, except that one must consider that the mean of the inverse of ε may not be unity.
With those concerns in mind, a few methods for estimating the key parameters follow.
4.1. Method 1: The credibility that would have worked in the past.
This approach actually involves no estimation of δ2 or σ2; rather, it estimates Z directly. Since estimating Z directly removes the barriers to implementing best estimate credibility for the overall rate indication, it merits discussion (even though it does not involve δ2 and σ2). The basic methodology involves assuming some credibility value Z, then using all the data but the last year to estimate the last year. Assume that one has, say, ten years of on-level, appropriately trended[23] loss ratios. Then, one could note that the fifth year's value could be estimated by first applying some unknown credibility factor Z to the fourth[24] year's data, Z(1 − Z) to the third year's data, Z(1 − Z)2 to the second year's data, etc., then dividing by the sum of the credibilities, 1 − (1 − Z)4, to correct for the off-balance. In effect, a single credibility value is assumed to have been proper for all four updates.
Once that equation is established, one could vary Z in order to find which Z minimizes the squared difference between the fifth year’s data and the credibility-weighted average. Most modern spreadsheet programs contain solution-generating capabilities that make it straightforward to find such a solution. Then, one may also construct similar equations to solve for a common credibility of Z that use the first five values to predict the sixth, the first six values to predict the seventh, etc. The last step involves replacing the individual solutions of Z that each minimize the squared error of a single predictive step with a solution of a single Z that minimizes the sum of all the squared errors of all the predictive steps simultaneously.
The resulting Z is arguably the best estimator of the credibility in the data, at least as long as a single credibility is appropriate for all the years.
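The search itself is simple enough to sketch in a few lines of Python. This is illustrative only: a plain grid search stands in for the spreadsheet solver, the function and parameter names are invented for this sketch, and the sample loss ratios are hypothetical.

```python
def total_squared_error(z, loss_ratios, first_target=5):
    """Sum of squared one-step prediction errors for a single trial
    credibility z: each target year is predicted from all earlier years
    with weights z, z(1-z), z(1-z)^2, ... (newest year first), normalized
    by 1 - (1-z)^k to correct the off-balance. loss_ratios are assumed
    to be on-level, trended values."""
    sse = 0.0
    for k in range(first_target - 1, len(loss_ratios)):   # 0-indexed targets
        history = loss_ratios[:k]
        weights = [z * (1 - z) ** j for j in range(len(history))]
        weighted = sum(w * x for w, x in zip(weights, reversed(history)))
        prediction = weighted / (1 - (1 - z) ** len(history))
        sse += (loss_ratios[k] - prediction) ** 2
    return sse

def best_z(loss_ratios, grid_points=999):
    """Grid search over z in (0, 1); a spreadsheet solver plays this role
    in the Table 2 example."""
    candidates = [(i + 1) / (grid_points + 1) for i in range(grid_points)]
    return min(candidates, key=lambda z: total_squared_error(z, loss_ratios))

# Hypothetical ten years of on-level, trended loss ratios
sample = [0.65, 0.68, 0.63, 0.70, 0.66, 0.72, 0.69, 0.74, 0.71, 0.73]
print(best_z(sample))
```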
Table 2 illustrates how this process would work with ten years of essentially random sample data. The shaded boxes show the inputs and outputs to the solution process (note that the “Target” box pulls up the “Target” value computed at the bottom of the spreadsheet).
This method has good utility as long as δ2 and σ2 are stable over time and the data is not prone to very rare large losses.[25] It is reasonable to expect δ2 to be stable as long as the average trend factor is stable, but often that does not occur. Further, it would be reasonable to expect σ2 to be fairly stable as long as the premium volume in the line, adjusted for trend, is stable.
What must be said. This approach has nothing to do with the formulas stated earlier. However, it does address the key question in this paper, determining the optimum credibility. Further, since Z has a formula in δ2 and σ2, it may also be used to determine a second variance constant once a first variance constant is known. Then, one might possibly revise the estimate of σ2 (derived from Z and δ2) to better account for process variance due to large losses, and consequently revise the estimate of Z.
4.2. Method 2: Fitting K and B across a large number of similar datasets
In this case, one might assume that the ratemaker is computing rates for a single line of business in 50 U.S. states, or some other situation where there is a fairly large number of segments, and all the segments have approximately the same trend and observation-error-variance-per-unit-of-exposure characteristics. One would also have to assume that the complement of credibility is still supposed to be assigned to the existing rate plus trend, not some amalgam of all the segments. One must also assume that the old premium/exposure and loss data used in pricing the last, say, twelve years of rates are available for each of the segments. And lastly, it would help if the second-to-last data point, and possibly the last data point, for each segment (s) is developed enough that it is as close an estimate of the expected costs as is reasonably possible.

Just like the estimation of Z in the previous subsection, K and B may be estimated from the data by solving for the values that would produce the best estimates of the most recent costs in the various segments. In the previous subsection, the total squared differences between the credibility-weighted averages of various sets of years and the future years they project were minimized. In this case, for each segment “s,” one must construct the credibility-weighted average Pn+1,s of the last n (= 10, or 5, or whatever is most feasible) years of data (the Si,s's) in order to estimate each Ln+1,s. In doing so, the credibilities should be computed using formula (A.7):

Z_{i,s} \cong \frac{U_{i,s} + Z_{i-1,s}(K + BU_{i,s})}{U_{i,s} + (1 + Z_{i-1,s})(K + BU_{i,s})} \tag{A.7}
K and B should then be modified via the solution routine so that the squared errors that the resulting Pn+1,s's make in estimating the Ln+1,s's are minimized. Crucially, K and B are not to vary from segment to segment. Rather, a single pair of K and B that minimizes the sum of all the squared prediction errors is to be found via the solution algorithm. So the weight assigned to the year n − i data for segment s is

M_{n-i,s} = (1 - Z_{n,s})(1 - Z_{n-1,s})\cdots(1 - Z_{n-i+1,s})\,Z_{n-i,s}
The resulting predictions[26] of the Ln+1,s's are then the various values of

P_{n+1,s} = \sum_{i=1}^{n} M_{i,s}S_{i,s} + \prod_{i=1}^{n}\left(1 - Z_{i,s}\right)S_{0,s}

(where each S0,s represents the rate or rating information in effect just before the experience period). As before, the sum across all the s's of the squared estimating errors, Σs(Pn+1,s − Ln+1,s)2, or perhaps a premium- or exposure-weighted average, ΣsWn,s(Pn+1,s − Ln+1,s)2, could be computed in the spreadsheet. The resulting value could be called the “Target,” and the solution routine or feature could be used to vary K and B until the lowest value of the “Target” is found.
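A minimal sketch of this fitting procedure follows, assuming hypothetical segment inputs (prior rate, data years, exposures, target value, and premium weight), a plain grid search in place of a spreadsheet solver, and a starting value of zero for the credibility recursion; none of these choices come from the paper.

```python
import itertools

def predict_segment(K, B, prior_rate, data, exposures):
    """Prediction for one segment: chain the credibilities from formula
    (A.7) through the data years, starting from the prior rate.
    Z_0 = 0 is an assumption of this sketch."""
    estimate, z_prev = prior_rate, 0.0
    for s_i, u_i in zip(data, exposures):
        kb = K + B * u_i
        z = (u_i + z_prev * kb) / (u_i + (1.0 + z_prev) * kb)  # formula (A.7)
        estimate = z * s_i + (1.0 - z) * estimate              # updating step
        z_prev = z
    return estimate

def fit_K_B(segments, K_grid, B_grid):
    """Find the single (K, B) pair minimizing the premium-weighted sum of
    squared errors across all segments. Each segment is a dict with the
    hypothetical keys 'prior', 'data', 'exposures', 'target', 'weight'."""
    def target_value(K, B):
        return sum(seg['weight'] *
                   (predict_segment(K, B, seg['prior'], seg['data'],
                                    seg['exposures']) - seg['target']) ** 2
                   for seg in segments)
    return min(itertools.product(K_grid, B_grid),
               key=lambda kb: target_value(*kb))

# Tiny hypothetical example with two segments and five data years each
segments = [
    {'prior': 0.65, 'data': [0.66, 0.70, 0.64, 0.71, 0.69],
     'exposures': [18, 19, 20, 21, 22], 'target': 0.70, 'weight': 22},
    {'prior': 0.60, 'data': [0.58, 0.63, 0.61, 0.66, 0.64],
     'exposures': [45, 47, 50, 52, 55], 'target': 0.65, 'weight': 55},
]
K_grid = [x / 10 for x in range(1, 51)]     # 0.1 to 5.0 (hypothetical range)
B_grid = [x / 100 for x in range(0, 51)]    # 0.00 to 0.50 (hypothetical range)
print(fit_K_B(segments, K_grid, B_grid))
```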
A sample spreadsheet illustrating this approach with 12 data segments and common trend, process, and parameter variance constants, but different samples from those constants among the segments, is shown in Table 3. The expected loss ratios for each segment were simulated using geometric Brownian motions with the variance specified in Part 1. The actual loss ratios are also affected by the parameter variance and the process variance (a common factor, divided by the premium per the Law of Large Numbers) listed there. The actual values of K and B are on the far left of Part 1. Lastly, the K and B values that minimize the premium-weighted sum of squared errors in projecting the sixth year's simulated value (using the credibility weights[27] defined by K, B, and the premium data) are highlighted in gray.
Note that the loss ratios for year 1 were deemed to have projection errors similar to those of the rate prior to the experience period, so they were used for the S0,s's.

What must be said. In testing this method, it appears that it may require a substantial number of data points to reliably estimate K and B using this process. In particular, twelve classes do not appear to be sufficient for the test case above. However, the fact that K and B are combined as K + BU in the equation means that they act together to impact the credibility. The only difference is that the “B” term reacts to exposure or premium volume, whereas “K” does not. In this case, at a premium of about 20 the estimated value of K + BU is about equal to the true underlying value.
Next, to assess the actual quality of the estimation, the errors in estimating the true (unaffected by process or parameter variance) expected loss ratios for year 6 (as shown at the top of Part 7) were computed. As one may see, the difference between the prediction error using the estimated K and B and that using the actual K and B is negligible. This suggests that, as long as the sample size (the number of “s” values) is small and the differences in premiums, exposures, etc., are small, it may be more helpful to simply replace “K + BU” with “K” in the credibility formula.
4.3. Method 3: Estimating δ2 and σ2 from the historical data
This method involves using different linear combinations of squared differences between values. As such, it is oriented towards standard, linear, Brownian motion. However, note that the logs of values from a geometric Brownian motion form a linear Brownian motion. So, one may convert geometric Brownian motion data to linear data, estimate the values of δ2 and σ2 that work in the linear context, then convert those to comparable drift variance and process/parameter variance values. For example, the geometric Brownian motion variance parameter would be exp(δ2) − 1, where δ2 is the variance in the corresponding linear Brownian motion and the mean of the geometric Brownian motion steps is specified to be unity (no change in the multiplicative context).

So, the goal is to find functions of the Si's that provide insight into the values of δ2 and σ2. For example, the squared difference between the beginning and ending values, (Sn − S1)2, reflects two samples of parameter/process error at the two endpoints and n − 1 samples from the Brownian motion variance. So, if the two types of variance are similarly sized, the squared difference between the two endpoints should be dominated by a multiple of the Brownian motion variance δ2. Similarly, if one adds the squared differences between adjacent points,[28] Σ(Si+1 − Si)2, one would expect the result to be dominated by a multiple of the process variance σ2. Further, one might expect that more precise approximations might be made by using linear combinations of those two values. So, one might begin by computing the expected values of (Sn − S1)2 and Σ(Si+1 − Si)2.
First, note that, since the mean expected change in values from the Brownian motion (after trend correction) is zero, and the expected process risk is zero,

E\left[(S_n - S_1)^2\right] = \mathrm{Var}[S_n - S_1]
However, Sn − S1 may be expressed as a sum of independent variables, each with mean zero, as (Sn − Ln) + (Ln − L1) + (L1 − S1). So, it is composed of a process error, a Brownian motion of length n − 1, and the negative of a process error. Therefore,
E\left[(S_n - S_1)^2\right] = \mathrm{Var}[S_n - L_n] + \mathrm{Var}[L_n - L_1] + \mathrm{Var}[L_1 - S_1] = \sigma^2 + (n-1)\delta^2 + \sigma^2 = (n-1)\delta^2 + 2\sigma^2 \tag{4.5}
Similarly,
E\left[\sum_{i=1}^{n-1}(S_{i+1} - S_i)^2\right] = \sum_{i=1}^{n-1}\mathrm{Var}[L_{i+1} - L_i] + 2\sum_{i=2}^{n-1}\mathrm{Var}[S_i - L_i] + \mathrm{Var}[S_n - L_n] + \mathrm{Var}[S_1 - L_1] = (n-1)\delta^2 + 2(n-2)\sigma^2 + \sigma^2 + \sigma^2 = (n-1)\delta^2 + 2(n-1)\sigma^2 \tag{4.6}
Knowing those values, it is possible to construct estimators for δ2 and σ2. One may readily see that, by the linearity of expectations,
E\left[\frac{\sum_{i=1}^{n-1}(S_{i+1} - S_i)^2 - (S_n - S_1)^2}{2(n-2)}\right] = \sigma^2 \tag{4.7}
and
E\left[\frac{(n-1)(S_n - S_1)^2 - \sum_{i=1}^{n-1}(S_{i+1} - S_i)^2}{(n-1)(n-2)}\right] = \frac{(n-1)\left\{(n-1)\delta^2 + 2\sigma^2\right\} - \left\{(n-1)\delta^2 + 2(n-1)\sigma^2\right\}}{(n-1)(n-2)} = \delta^2 \tag{4.8}
So, by creatively using the differences between the first and last point, and the differences between adjacent points, one may estimate the values of δ2 and σ2.
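The two estimators are easy to sketch in code; the 15-year series below is hypothetical (it merely stands in for trended, on-level data such as that in Table 4).

```python
def estimate_sigma2_delta2(S):
    """Estimate sigma^2 and delta^2 from a trended, on-level series S_1..S_n
    using equations (4.7) and (4.8). Illustrative sketch; requires n >= 4."""
    n = len(S)
    adjacent = sum((S[i + 1] - S[i]) ** 2 for i in range(n - 1))
    endpoints = (S[-1] - S[0]) ** 2
    sigma2 = (adjacent - endpoints) / (2 * (n - 2))                   # (4.7)
    delta2 = ((n - 1) * endpoints - adjacent) / ((n - 1) * (n - 2))   # (4.8)
    return sigma2, delta2

# Hypothetical 15-year series of trended loss ratios
series = [0.62, 0.66, 0.59, 0.64, 0.70, 0.63, 0.68, 0.66,
          0.72, 0.65, 0.69, 0.74, 0.67, 0.71, 0.70]
sigma2, delta2 = estimate_sigma2_delta2(series)
print(sigma2, delta2)   # either estimate can come out negative in small
                        # samples, a symptom of the ill-conditioning
                        # discussed later in this subsection
```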
An example of the use of equations (4.7) and (4.8) is shown in Table 4. The observable data in column 2 was generated randomly over 15 years, using the actual values δ = 3% and σ = 7%. The values of δ2 and σ2 were then estimated from the data. As one may see, the estimates are fairly close, but they nonetheless significantly overestimate the credibility.
A note about trend: The theory underlying this paper assumes that the expected loss, a priori, is the same for all years. That generally requires that historical losses have been trended (and premiums adjusted to the current rate and exposure level) before the calculations commence. Of course, if the trend is computed using the same data as the calculations, the calculated value of δ2 may be suppressed. For example, if the random movement includes a large upward jump early in the period and another jump later because the value of δ2 is high, the trend analysis may incorrectly attribute the movement to a high trend rather than to a high Brownian motion variance. Of course, if the trend is clearly much larger than δ2, this may be less of an issue.
Further, as noted later, the problem of estimating δ2 and σ2 is relatively ill-conditioned.[29] So reducing the degrees of freedom of the approximation by estimating trend simultaneously, given a small number of data points, may not be reliable. However, one might be advised to use some related data, such as calendar year reported loss frequency and calendar year closed claim severity, to estimate the trend. On the other hand, if there are a large number of data points relative[30] to the volatility in the data, then the impact of the random observation error in the initial and ending points on the trend estimate should be minimal.
A third aspect of trend deserves mention as well. Without a correction, the random lognormal steps of a geometric Brownian motion whose log-increments are centered at zero would produce a mean above one at all points after it begins. In effect, the randomness of the distribution, combined with the skew of the lognormal, tends to generate its own trend. So the transformed (logarithmic) versions of the steps, rather than having a normal-type[31] distribution with a mean of zero, must have a normal distribution with a mean of −δ′2/2 (so that the corresponding lognormal steps have a mean of one). That means that external trend must often be corrected, especially trend computed by averaging several year-to-year growth rates. To complicate matters, δ2 is then unknown, so the value needed for the correction is unknown. However, some crude initial estimate of the value of δ2 may be used when estimating trend, and then, once the trend is estimated, the δ2 estimate may be refined, etc. The process may be continued iteratively until a consistent trend and δ2 are computed. Consider that if the trend estimate is produced by loglinear regression of data with similar geometric Brownian motion variance, the correction should already be subsumed into the trend. Further, if quality surrogate data is available for trending, that option deserves serious consideration.

What must be said. There are some special considerations that should help explain why the approximations are not more precise. First, it may be difficult to distinguish, say, whether a very high last point is due to a very high uptick in the Brownian motion because δ2 is large, or to a large process error because σ2 is high. So, the basic problem of approximating δ2 and σ2 may often be ill-conditioned. Second, it is important to review Note 4 at the beginning of this section. At its core, Note 4 says that the error variance in computing the quantities above could be as much as the sum of the variances of the two items being subtracted. While the error does not quite reach the sum of the variances (due to inter-correlation of the two quantities), one should still be extremely cautious if the difference (the estimate of δ2 or σ2) is much smaller than each of the values involved in the subtraction.
Nevertheless, even though the credibility determined using this method sometimes only has moderate precision, it is moderately close to the “best estimate” credibility. Therefore, it still has the potential to create more accurate estimates than the stability-centered classical credibility.
4.4. Method 4: Estimating σ2 structurally from loss data and δ2 by subtraction
Given the formulas in equations (4.5) and (4.6), it is clear that, once one of δ2 and σ2 is reliably estimated, the other may be estimated. It should also be clear that equation (4.5) carries relatively more information about δ2 than equation (4.6). So, if one has a quality estimate of σ2, the formula
\delta^2 \cong \frac{(S_n - S_1)^2 - 2\sigma^2}{n - 1} \tag{4.9}
may be used to estimate δ2.
Some estimate of σ2 is required to use that formula, though. One method for estimating σ2 involves what may be described as a structural analysis. Such a process involves decomposing the process/parameter risk into its components and then estimating each component separately.
The process risk is in some ways better reflected in historical credibility formulas (such as P/(P + K), or U/(U + K) in the notation of this paper), so it will be analyzed first. Thankfully, as long as there are enough claims in the data to reliably estimate the upper end of the severity distribution, one may use the collective risk equation to calculate the process variance (which may be labeled “α2”). Then,
\alpha^2 = E[\#\text{ claims}] \times \mathrm{Var}[\text{severity}] + \mathrm{Var}[\#\text{ claims}] \times E[\text{severity}]^2
or in the loss ratio or pure premium context,
\alpha^2 = \frac{E[\#\text{ claims}] \times \mathrm{Var}[\text{severity}] + \mathrm{Var}[\#\text{ claims}] \times E[\text{severity}]^2}{(\text{premium or exposures})^2}
So, as long as the proper data is available,[32] the process variance is readily estimable.
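A minimal sketch of that calculation, assuming a Poisson claim count and hypothetical severity moments (all inputs are illustrative, not from the paper):

```python
def process_variance_alpha2(expected_claim_count, claim_count_variance,
                            expected_severity, severity_variance, premium):
    """Process variance of the loss ratio via the collective risk equation:
    Var[aggregate loss] = E[N] * Var[X] + Var[N] * E[X]^2,
    then divided by premium^2 to move to loss ratio space."""
    aggregate_variance = (expected_claim_count * severity_variance
                          + claim_count_variance * expected_severity ** 2)
    return aggregate_variance / premium ** 2

# Hypothetical book: 400 expected claims (Poisson, so Var[N] = 400),
# mean severity 25,000 with a coefficient of variation of 3.
alpha2 = process_variance_alpha2(400, 400, 25_000, (3 * 25_000) ** 2,
                                 20_000_000)
print(alpha2)
```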
The other portion that must be estimated is the parameter variance, which will similarly be denoted “β2”. Note that any year-to-year variations in the trend are subsumed into δ2. So, in most cases the only parameter-type variance that need be considered is the uncertainty in loss development to ultimate. That variance has two parts: uncertainty about what the correct expected loss development factor is; and variance of the ultimate loss in each year, as estimated using loss development, around the actual ultimate loss.
It is not hard to see that the uncertainty about the expected loss development factor can be essentially ignored per Note 5 at the beginning of this section. The variance in future loss emergence[33] on the various years requires some analysis, though. Estimating the remaining random β2, given appropriate volume in the triangle, can be done using some fairly well established procedures. For example, a paper by Hayne (1985) details one approach. The result of this approach would be a multiplicative distribution with a mean of unity and a variance of some β2.
Of course, it is then necessary to combine α2 and β2. First, α2 should be converted to a multiplicative distribution to use with the multiplicative loss development distribution. Such a distribution would represent the ratio of actual to expected losses, which has a mean of one and a variance of α2/(expected loss)2. The multiplicative combination of these two clearly independent distributions gives

\left(\text{Variance of the process/parameter error in geometric Brownian motion space}\right) = \beta^2 + \frac{\alpha^2}{(\text{expected loss})^2} + \frac{\alpha^2\beta^2}{(\text{expected loss})^2}
So, when that is converted to a parameter in the linear model[34], one may show that
\sigma^2 = \log\left(\beta^2 + \frac{\alpha^2}{(\text{expected loss})^2} + \frac{\alpha^2\beta^2}{(\text{expected loss})^2} + 1\right) \tag{4.13}
Then, that estimate may be combined with equation (4.9) to obtain an estimate of δ2.
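A brief sketch of the combination and the follow-on use of equation (4.9) is given below. The values for α2, β2, the expected loss level, and the series are hypothetical, and the log transform of the series follows the conversion described in subsection 4.3.

```python
import math

def combined_sigma2(alpha2, beta2, expected_loss):
    """Linear-model sigma^2 per equation (4.13): combine the process variance
    alpha^2 with the development variance beta^2 (both in units consistent
    with expected_loss, e.g., loss ratio space), then take logs to move to
    the linear (log) scale. Illustrative sketch."""
    geometric_var = (beta2 + alpha2 / expected_loss ** 2
                     + alpha2 * beta2 / expected_loss ** 2)
    return math.log(1.0 + geometric_var)

def delta2_from_endpoints(S, sigma2):
    """delta^2 per equation (4.9), using the endpoints of the log-transformed,
    trended series and an externally estimated sigma^2."""
    n = len(S)
    return ((S[-1] - S[0]) ** 2 - 2.0 * sigma2) / (n - 1)

# Hypothetical values
sigma2 = combined_sigma2(alpha2=0.002, beta2=0.002, expected_loss=0.68)
log_series = [math.log(x) for x in
              [0.55, 0.60, 0.58, 0.64, 0.66, 0.63, 0.70, 0.72, 0.69, 0.78]]
print(sigma2, delta2_from_endpoints(log_series, sigma2))
```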
4.5. Method 5: Estimating δ2 using a larger dataset and σ2 by subtraction
Just as σ2 may be estimated using alternate approaches, δ2 may often be estimated in isolation as well. If a larger proxy dataset (for example, the countrywide private passenger auto experience of a major carrier when rates are being made in a low-volume state) is available, and that dataset has very minimal process/parameter risk, then formula (4.8) from subsection 4.3 should produce a very high quality estimate of δ2. Then, using equation (4.6), σ2 may be estimated via
\frac{\sum_{i=1}^{n-1}\left(S_{i+1}-S_{i}\right)^{2}}{2(n-1)}-\frac{\delta^{2}}{2} \cong \sigma^{2} \tag{4.14}
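A small sketch of that subtraction, assuming δ2 has already been estimated from the larger proxy dataset and the series has been trended and, if geometric, log-transformed; the inputs shown are hypothetical.

```python
def sigma2_by_subtraction(S, delta2_proxy):
    """sigma^2 per equation (4.14): the mean squared adjacent difference,
    less half of a delta^2 estimated from a larger proxy dataset."""
    n = len(S)
    adjacent = sum((S[i + 1] - S[i]) ** 2 for i in range(n - 1))
    return adjacent / (2 * (n - 1)) - delta2_proxy / 2

# e.g., delta^2 from countrywide data applied to a small state's series
print(sigma2_by_subtraction([0.62, 0.66, 0.59, 0.64, 0.70, 0.63, 0.68], 0.0009))
```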
4.6. All or many of the above
Several methods were presented above. They all have different strengths and weaknesses. Whenever possible, it may be helpful to review the results of more than one method. Note that the credibility formula is not a formula in σ2 and δ2 per se; it is actually a formula in either the ratio K = δ2/σ2 or in K and B. So, when different values for σ2 and δ2 result from different approaches, but the ratio K is similar, the methods fundamentally agree. Also, note that what may look like large changes in K may have a very minor effect on the credibility when K is very large. Lastly, should the methods disagree, that creates an opportunity to evaluate the strengths and weaknesses of each one.

Summary
The “square root” or classical credibility process has been in use for many years. Nevertheless, that method has a significant flaw in that the statistical assumptions (confidence level and failure threshold) may be chosen arbitrarily. Further, it assumes that whatever data receives the complement of credibility is stable and reliable, even when that data is, say, four years of a 20% trend rate. It is hoped that the approach presented here, by providing a reliable credibility process that uses minimal assumptions, will restructure the credibility processes used by casualty actuaries. Then, the profession can be comfortable that rate indications that use the resulting credibility values are as accurate as possible.