1. Introduction
Since the late 1980s actuarial science has been moving gradually from methods to models. The movement was made possible by personal computers; it was made necessary by insurance competition. Actuaries, who used to be known for fitting curves and extrapolating from data, are now likely to be fitting models and explaining data.
Statistical modeling seeks to explain a block of observed data as a function of known variables. The model makes it possible to predict what will be observed from new instances of those variables. However, rarely, if ever, does the function perfectly explain the data. A successful model, as well as seeming reasonable, should explain the observations tolerably well. So the deterministic model y = f(X) gives way to the approximation y ≈ f(X), which is restored to equality with the addition of a random error term: y = f(X) + e. The simplest model is the homoskedastic linear statistical model (LSM) with vector form y = Xβ + e, in which E[e] = 0 and Var[e] = σ²I. According to LSM theory, β̂ = (X′X)⁻¹X′y is the best linear unbiased estimator (BLUE) of β, even if the elements of error vector e are not normally distributed.[1]
The fact that the errors arising in most actuarial models are not normally distributed raises two questions. First, does non-normality in the randomness affect the accuracy of a model’s predictions? The answer is “Yes, sometimes seriously.” Second, can models be made “robust,”[2] i.e., able to deal properly with non-normal error? Again, “Yes.” Three attempts to do so are (1) to relax parts of BLUE, especially linearity and unbiasedness (robust estimation), (2) to incorporate explicit distributions into models (GLM), and (3) to bootstrap.[3] Bootstrapping a linear model begins with solving it conventionally. One obtains β̂ and the expected observation vector ŷ = Xβ̂. Gleaning information from the residual vector ê = y − ŷ, one can simulate proper, or more realistic, “pseudo-error” vectors eᵢ* and pseudo-observations yᵢ = ŷ + eᵢ*. Iterating the model over the yᵢ will produce pseudo-estimates β̂ᵢ and pseudo-predictions in keeping with the apparent distribution of error. Of the three attempts, bootstrapping is the most commonsensical.
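The procedure translates directly into code. Below is a minimal sketch of this residual bootstrap, assuming NumPy; the function name and the non-parametric resampling of residuals are illustrative choices, not the paper’s prescription (a parametric variant would draw the pseudo-errors from a fitted error law such as the generalized logistic of Section 4).

```python
import numpy as np

def residual_bootstrap(X, y, n_boot=1000, seed=None):
    """Residual bootstrap of the homoskedastic linear model y = X beta + e."""
    rng = np.random.default_rng(seed)
    # Conventional solution: beta_hat = (X'X)^{-1} X'y, computed via least squares.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat            # expected observations
    resid = y - y_hat               # residual vector e_hat
    betas = np.empty((n_boot, X.shape[1]))
    for i in range(n_boot):
        # Pseudo-errors e_i* drawn here from the empirical residuals;
        # a parametric variant would sample from a fitted error distribution.
        e_star = rng.choice(resid, size=resid.shape[0], replace=True)
        betas[i], *_ = np.linalg.lstsq(X, y_hat + e_star, rcond=None)
    return beta_hat, betas
```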
Our purpose herein is to introduce a distribution for non-normal error that is suited to bootstrapping in general, but especially as regards the asymmetric and skewed data that actuaries regularly need to model. As useful candidates for non-normal error, Sections 2 and 3 will introduce the log-gamma random variable and its linear combinations. Section 4 will settle on a linear combination that arguably maximizes the ratio of versatility to complexity, the generalized logistic random variable. Section 5 will examine its special cases. Finally, Section 6 will estimate by maximum likelihood the parameters of one such distribution from actual data. The most mathematical and theoretical subjects are relegated to Appendices A–C.
2. The log-gamma random variable
If X ∼ Gamma(α, θ), then Y = ln X is a random variable whose support is the entire real line.[4] Hence, the logarithm converts a one-tailed distribution into a two-tailed one. Although a leftward shift of X would move probability onto the negative real line, such a left tail would be finite. The logarithm is a natural way, even the natural way, to transform one infinite tail into two infinite tails.[5] Because the logarithm function strictly increases, the probability density function of Y ∼ Log-Gamma(α, θ) is:[6]
$$f_Y(u) = f_{X=e^Y}(e^u)\,\frac{de^u}{du} = \frac{1}{\Gamma(\alpha)}\,e^{-e^u/\theta}\left(\frac{e^u}{\theta}\right)^{\alpha-1}\frac{1}{\theta}\,e^u = \frac{1}{\Gamma(\alpha)}\,e^{-e^u/\theta}\left(\frac{e^u}{\theta}\right)^{\alpha}$$
Figure 1 contains a graph of the probability density functions of both X and Y = ln X for X ∼ Gamma(1, 1) ∼ Exponential(1). The log-gamma tails are obviously infinite, and the curve itself is skewed to the left (negative skewness).
The log-gamma moments can be derived from its moment generating function:
$$M_Y(t) = E\left[e^{tY}\right] = E\left[e^{t\ln X}\right] = E\left[X^t\right] = \frac{\Gamma(\alpha+t)}{\Gamma(\alpha)}\,\theta^t$$
Even better is to switch from moments to cumulants by way of the cumulant generating function:[7]
$$\psi_Y(t) = \ln M_Y(t) = \ln\Gamma(\alpha+t) - \ln\Gamma(\alpha) + t\ln\theta, \qquad \psi_Y(0) = \ln M_Y(0) = 0$$
The cumulants (κn) are the derivatives of this function evaluated at zero, the first four of which are:
$$\begin{aligned}
\psi'_Y(t) &= \psi(\alpha+t) + \ln\theta, & \kappa_1 &= E[Y] = \psi'_Y(0) = \psi(\alpha) + \ln\theta \\
\psi''_Y(t) &= \psi'(\alpha+t), & \kappa_2 &= \mathrm{Var}[Y] = \psi''_Y(0) = \psi'(\alpha) > 0 \\
\psi'''_Y(t) &= \psi''(\alpha+t), & \kappa_3 &= \mathrm{Skew}[Y] = \psi'''_Y(0) = \psi''(\alpha) < 0 \\
\psi^{iv}_Y(t) &= \psi'''(\alpha+t), & \kappa_4 &= \mathrm{XsKurt}[Y] = \psi^{iv}_Y(0) = \psi'''(\alpha) > 0
\end{aligned}$$
The scale factor θ affects only the mean.[8] The alternating inequalities of κ₂, κ₃, and κ₄ derive from the polygamma formulas of Appendix A.3. The variance, of course, must be positive; the negative skewness confirms the appearance of the log-gamma density in Figure 1. The positive excess kurtosis means that the log-gamma distribution is “leptokurtic”: its kurtosis is more positive than that of the normal distribution.
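These formulas are easy to verify numerically. The sketch below, assuming SciPy and NumPy, compares simulated moments of Y = ln X with the polygamma expressions; note that SciPy reports the standardized coefficients of skewness and kurtosis, so the raw cumulants are rescaled accordingly.

```python
import numpy as np
from scipy.special import psi, polygamma
from scipy.stats import skew, kurtosis

alpha, theta = 2.0, 1.5
rng = np.random.default_rng(0)
y = np.log(rng.gamma(alpha, theta, size=10**6))   # Y = ln X, X ~ Gamma(alpha, theta)

print(y.mean(), psi(alpha) + np.log(theta))        # kappa_1 = psi(alpha) + ln(theta)
print(y.var(), polygamma(1, alpha))                # kappa_2 = psi'(alpha) > 0
# SciPy returns standardized coefficients, so scale the raw cumulants:
print(skew(y), polygamma(2, alpha) / polygamma(1, alpha) ** 1.5)    # negative
print(kurtosis(y), polygamma(3, alpha) / polygamma(1, alpha) ** 2)  # positive
```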
Since the logarithm is a concave downward function, it follows from Jensen’s inequality:
$$\begin{aligned}
E[Y = \ln X] &\le \ln E[X] \\
\psi(\alpha) + \ln\theta &\le \ln(\alpha\theta) = \ln(\alpha) + \ln(\theta) \\
\psi(\alpha) &\le \ln(\alpha)
\end{aligned}$$
Because the probability is not amassed at a single point, the inequality is strict: ψ(α) < ln(α) for α > 0. However, when E[X] = αθ is fixed at unity, ln θ = −ln α; and as α → ∞, the variance of X approaches zero. Hence E[ln X] = ψ(α) − ln(α) approaches ln E[X] = ln 1 = 0; equivalently, $\lim_{\alpha\to\infty}(\ln(\alpha) - \psi(\alpha)) = 0$. It is not difficult to prove that $\lim_{\alpha\to 0^+}(\ln(\alpha) - \psi(\alpha)) = \infty$, as well as that ln(α) − ψ(α) strictly decreases. Therefore, for every y > 0 there exists exactly one α > 0 for which y = ln(α) − ψ(α).
The log-gamma random variable becomes an error term when its expectation equals zero. This requires the parameters to satisfy the equation E[Y] = ψ(α) + ln θ = 0, or θ = e^{−ψ(α)}. Hence, the simplest of all log-gamma error distributions is Y ∼ Log-Gamma(α, e^{−ψ(α)}) = ln(X ∼ Gamma(α, e^{−ψ(α)})).
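A sketch of this construction, assuming SciPy’s digamma function `psi`; the function name is illustrative:

```python
import numpy as np
from scipy.special import psi

def log_gamma_error(alpha, size, seed=None):
    """Sample Y ~ Log-Gamma(alpha, exp(-psi(alpha))), so that E[Y] = 0."""
    rng = np.random.default_rng(seed)
    theta = np.exp(-psi(alpha))        # scale forced by psi(alpha) + ln(theta) = 0
    return np.log(rng.gamma(alpha, theta, size=size))

print(log_gamma_error(0.7, 10**6, seed=1).mean())   # approximately zero
```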
3. Weighted sums of log-gamma random variables
Multiplying the log-gamma random variable by negative one reflects its distribution about the y-axis. This does not affect the even moments or cumulants, but it reverses the signs of the odd ones. For example, the skewness of −Y = −ln X = ln X⁻¹ is positive.
Now let Y be a γ-weighted sum of independent log-gamma random variables Yₖ, which resolves into the logarithm of a product of powers of independent gamma random variables Xₖ ∼ Gamma(αₖ, θₖ):
$$Y = \sum_k \gamma_k Y_k = \sum_k \gamma_k \ln X_k = \sum_k \ln X_k^{\gamma_k} = \ln \prod_k X_k^{\gamma_k}$$
Although the Xk must be independent of one another, their parameters need not be identical. Because of the independence:
$$\psi_Y(t) = \ln E\left[e^{tY}\right] = \ln E\left[e^{t\sum_k \gamma_k Y_k}\right] = \ln E\left[\prod_k e^{t\gamma_k Y_k}\right] = \ln \prod_k E\left[e^{t\gamma_k Y_k}\right] = \sum_k \ln E\left[e^{t\gamma_k Y_k}\right] = \sum_k \psi_{Y_k}(t\gamma_k)$$
The nth cumulant of the weighted sum is:
$$\kappa_n(Y) = \left.\frac{d^n \psi_Y(t)}{dt^n}\right|_{t=0} = \left.\frac{d^n}{dt^n}\sum_k \psi_{Y_k}(t\gamma_k)\right|_{t=0} = \sum_k \left.\frac{d^n \psi_{Y_k}(t\gamma_k)}{dt^n}\right|_{t=0} = \sum_k \gamma_k^n\,\psi^{[n]}_{Y_k}(0) = \sum_k \gamma_k^n\,\kappa_n(Y_k)$$
So the nth cumulant of a weighted sum of independent random variables is the weighted sum of the cumulants of the random variables, the weights being raised to the nth power.[9]
Using the cumulant formulas from the previous section, we have:
$$\begin{aligned}
\kappa_1(Y) &= \sum_k \gamma_k \kappa_1(Y_k) = \sum_k \gamma_k \psi(\alpha_k) + \sum_k \gamma_k \ln(\theta_k) = \sum_k \gamma_k \psi(\alpha_k) + C \\
\kappa_2(Y) &= \sum_k \gamma_k^2 \kappa_2(Y_k) = \sum_k \gamma_k^2 \psi'(\alpha_k) \\
\kappa_3(Y) &= \sum_k \gamma_k^3 \kappa_3(Y_k) = \sum_k \gamma_k^3 \psi''(\alpha_k) \\
\kappa_4(Y) &= \sum_k \gamma_k^4 \kappa_4(Y_k) = \sum_k \gamma_k^4 \psi'''(\alpha_k)
\end{aligned}$$
In general, $\kappa_{m+1}(Y) = \sum_k \gamma_k^{m+1}\,\psi^{[m]}(\alpha_k) + \mathrm{IF}(m = 0,\, C,\, 0)$. A weighted sum of n independent log-gamma random variables would provide 2n + 1 degrees of freedom for a method-of-cumulants fitting. All the scale parameters θₖ would be unity. As the parameter of an error distribution, C would lose its freedom, since the mean must then equal zero. Therefore, with no loss of generality, we may write $Y = \sum_k \gamma_k \ln X_k + C$ for Xₖ ∼ Gamma(αₖ, 1).
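The general cumulant formula translates directly into code. A sketch assuming SciPy; the helper name is hypothetical:

```python
import numpy as np
from scipy.special import psi, polygamma

def weighted_log_gamma_cumulant(n, gammas, alphas, C=0.0):
    """n-th cumulant of Y = sum_k gamma_k ln X_k + C, with X_k ~ Gamma(alpha_k, 1)."""
    gammas, alphas = np.asarray(gammas), np.asarray(alphas)
    if n == 1:
        return float(np.sum(gammas * psi(alphas)) + C)
    return float(np.sum(gammas**n * polygamma(n - 1, alphas)))

# Example: the first four cumulants of Y = 0.5 ln X1 - 0.5 ln X2, alphas (2, 3).
print([weighted_log_gamma_cumulant(n, [0.5, -0.5], [2.0, 3.0]) for n in (1, 2, 3, 4)])
```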
4. The generalized logistic random variable
Although any finite weighted sum is tractable, four cumulants should suffice in most practical work. So let $Y = \gamma_1 \ln X_1 + \gamma_2 \ln X_2 + C = \ln\left(X_1^{\gamma_1} X_2^{\gamma_2}\right) + C$. Even then, one γ should be positive and the other negative; in fact, letting one be the opposite of the other will allow Y to be symmetric in special cases. Therefore, Y = γ ln X₁ − γ ln X₂ + C = γ ln(X₁/X₂) + C for γ > 0 should be a useful form of intermediate complexity. Let the parameterization for this purpose be X₁ ∼ Gamma(α, 1) and X₂ ∼ Gamma(β, 1). Contributing to the usefulness of this form is the fact that X₁/X₂ is a generalized Pareto random variable, whose probability density function is:[10]
$$f_{X_1/X_2}(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{u}{u+1}\right)^{\alpha-1}\left(\frac{1}{u+1}\right)^{\beta-1}\frac{1}{(u+1)^2}$$
The not overly complicated “generalized logistic” distribution is versatile enough for modeling non-normal error.
Since $e^Y = e^C (X_1/X_2)^{\gamma}$, for $-\alpha < \gamma t < \beta$:
$$\begin{aligned}
M_Y(t) &= E\left[e^{tY}\right] = e^{Ct}\,E\left[(X_1/X_2)^{\gamma t}\right] = e^{Ct}\int_{u=0}^{\infty} u^{\gamma t}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{u}{u+1}\right)^{\alpha-1}\left(\frac{1}{u+1}\right)^{\beta-1}\frac{du}{(u+1)^2} \\
&= e^{Ct}\,\frac{\Gamma(\alpha+\gamma t)\Gamma(\beta-\gamma t)}{\Gamma(\alpha)\Gamma(\beta)}\int_{u=0}^{\infty}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\gamma t)\Gamma(\beta-\gamma t)}\left(\frac{u}{u+1}\right)^{\alpha+\gamma t-1}\left(\frac{1}{u+1}\right)^{\beta-\gamma t-1}\frac{du}{(u+1)^2} \\
&= e^{Ct}\,\frac{\Gamma(\alpha+\gamma t)\Gamma(\beta-\gamma t)}{\Gamma(\alpha)\Gamma(\beta)}
\end{aligned}$$
Hence, the cumulant generating function and its derivatives are:
$$\begin{aligned}
\psi_Y(t) &= \ln M_Y(t) = Ct + \ln\Gamma(\alpha+\gamma t) - \ln\Gamma(\alpha) + \ln\Gamma(\beta-\gamma t) - \ln\Gamma(\beta) \\
\psi'_Y(t) &= \gamma\left(\psi(\alpha+\gamma t) - \psi(\beta-\gamma t)\right) + C \\
\psi''_Y(t) &= \gamma^2\left(\psi'(\alpha+\gamma t) + \psi'(\beta-\gamma t)\right) \\
\psi'''_Y(t) &= \gamma^3\left(\psi''(\alpha+\gamma t) - \psi''(\beta-\gamma t)\right) \\
\psi^{iv}_Y(t) &= \gamma^4\left(\psi'''(\alpha+\gamma t) + \psi'''(\beta-\gamma t)\right)
\end{aligned}$$
And so, the cumulants are:
$$\begin{aligned}
\kappa_1 &= E[Y] = \psi'_Y(0) = C + \gamma\left(\psi(\alpha) - \psi(\beta)\right) \\
\kappa_2 &= \mathrm{Var}[Y] = \psi''_Y(0) = \gamma^2\left(\psi'(\alpha) + \psi'(\beta)\right) > 0 \\
\kappa_3 &= \mathrm{Skew}[Y] = \psi'''_Y(0) = \gamma^3\left(\psi''(\alpha) - \psi''(\beta)\right) \\
\kappa_4 &= \mathrm{XsKurt}[Y] = \psi^{iv}_Y(0) = \gamma^4\left(\psi'''(\alpha) + \psi'''(\beta)\right) > 0
\end{aligned}$$
The three parameters α, β, γ could be fitted to empirical cumulants κ₂, κ₃, and κ₄. For an error distribution, C would equal γ(ψ(β) − ψ(α)). Since κ₄ > 0, the random variable Y is leptokurtic.
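The cumulant expressions are straightforward to compute and to check by simulation; such a forward function is also the ingredient a root finder (e.g., from scipy.optimize) would need for the method-of-cumulants fit just mentioned. A sketch assuming SciPy and NumPy, with illustrative parameter values:

```python
import numpy as np
from scipy.special import psi, polygamma

def gl_cumulants(a, b, g, C=0.0):
    """First four cumulants of Y = g*ln(X1/X2) + C, X1 ~ Gamma(a,1), X2 ~ Gamma(b,1)."""
    return (C + g * (psi(a) - psi(b)),
            g**2 * (polygamma(1, a) + polygamma(1, b)),
            g**3 * (polygamma(2, a) - polygamma(2, b)),
            g**4 * (polygamma(3, a) + polygamma(3, b)))

# Simulation check with arbitrary parameters.
rng = np.random.default_rng(0)
a, b, g = 2.0, 0.5, 1.0
y = g * np.log(rng.gamma(a, 1, 10**6) / rng.gamma(b, 1, 10**6))
print(gl_cumulants(a, b, g)[:2])
print(y.mean(), y.var())       # match kappa_1 (with C = 0) and kappa_2
```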
Since Y = γ ln(X1/X2) + C, Z = (Y − C)/γ = ln(X1/X2) may be considered a reduced form. From the generalized Pareto density above, we can derive the density of Z:
$$f_Z(u) = f_{X_1/X_2=e^Z}(e^u)\,\frac{de^u}{du} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha-1}\left(\frac{1}{e^u+1}\right)^{\beta-1}\frac{e^u}{(e^u+1)^2} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha}\left(\frac{1}{e^u+1}\right)^{\beta}$$
Z ∼ ln(Gamma(α, 1)/Gamma(β, 1)) is a “generalized logistic” random variable (Wikipedia [Generalized logistic distribution]).
The probability density functions of generalized logistic random variables are skew-symmetric:
$$f_{\ln\frac{X_2}{X_1}}(-u) = \frac{\Gamma(\beta+\alpha)}{\Gamma(\beta)\Gamma(\alpha)}\left(\frac{e^{-u}}{e^{-u}+1}\right)^{\beta}\left(\frac{1}{e^{-u}+1}\right)^{\alpha} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha}\left(\frac{1}{e^u+1}\right)^{\beta} = f_{\ln\frac{X_1}{X_2}}(u)$$
In Figure 2 are graphed three generalized logistic probability density functions.
The density is symmetric if and only if α = β; the gray curve is that of the logistic density, for which α = β = 1. The mode of the generalized-logistic (α, β) density is u_mode = ln(α/β). Therefore, the mode is positive [or negative] if and only if α > β [or α < β]. Since the digamma and tetragamma functions ψ, ψ″ strictly increase over the positive reals, the signs of E[Z] = ψ(α) − ψ(β) and Skew[Z] = ψ″(α) − ψ″(β) are the same as the sign of α − β. The positive mode of the orange curve (2, 1) implies positive mean and skewness, whereas for the blue curve (1, 4) they are negative.
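These sign relationships can be confirmed numerically; a sketch assuming SciPy, using the (2, 1), (1, 4), and (1, 1) pairs graphed in Figure 2:

```python
import numpy as np
from scipy.special import psi, polygamma

for a, b in [(2.0, 1.0), (1.0, 4.0), (1.0, 1.0)]:
    mode = np.log(a / b)                        # u_mode = ln(alpha/beta)
    mean = psi(a) - psi(b)                      # E[Z]
    third = polygamma(2, a) - polygamma(2, b)   # Skew[Z], the third cumulant
    print((a, b), mode, mean, third)            # all three share one sign
```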
5. Special cases
Although the probability density function of the generalized logistic random variable is of closed form, the form of its cumulative distribution function is not closed, save for the special cases of α = 1 and β = 1. The special case of α = 1 reduces X₁/X₂ to an ordinary Pareto. In this case, the cumulative distribution is $F_Z(u) = 1 - \left(\frac{1}{e^u+1}\right)^{\beta}$. Likewise, the special case of β = 1 reduces X₁/X₂ to an inverse Pareto. In that case, $F_Z(u) = \left(\frac{e^u}{e^u+1}\right)^{\alpha}$. It would be easy to simulate values of Z in both cases by the inversion method (Klugman, Panjer, and Willmot [1998, Appendix H.2]).[11]
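Both inversions are one-liners. A sketch assuming NumPy, with the quantile functions obtained by inverting the two cumulative distributions above; the function names are illustrative:

```python
import numpy as np
from scipy.special import psi

def simulate_alpha1(beta, size, seed=None):
    """Inversion for alpha = 1: F(u) = 1 - (1 + e^u)^(-beta)."""
    p = np.random.default_rng(seed).uniform(size=size)
    return np.log((1 - p) ** (-1 / beta) - 1)

def simulate_beta1(alpha, size, seed=None):
    """Inversion for beta = 1: F(u) = (e^u / (e^u + 1))^alpha."""
    p = np.random.default_rng(seed).uniform(size=size)
    return np.log(p ** (1 / alpha) / (1 - p ** (1 / alpha)))

# Check: with alpha = 1 and beta = 2, the mean should equal psi(1) - psi(2) = -1.
print(simulate_alpha1(2.0, 10**6, seed=1).mean(), psi(1.0) - psi(2.0))
```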
Quite interesting is the special case α = β, for Z is symmetric about its mean if and only if α = β; in that case all the odd cumulants higher than the mean equal zero. Therefore, in this case:
$$f_Z(u) = \frac{\Gamma(2\alpha)}{\Gamma(\alpha)^2}\,\frac{(e^u)^{\alpha}}{(e^u+1)^{2\alpha}}$$
Symmetry is confirmed inasmuch as fZ(−u) = fZ(u). When α = β = 1, the density reduces to $f_Z(u) = \frac{e^u}{(e^u+1)^2}$, whose cumulative distribution is $F_Z(u) = \frac{e^u}{e^u+1}$. In this case, Z is a logistic random variable. Its mean and skewness are zero. As for its even cumulants:[12]
$$\begin{aligned}
\kappa_2 &= \mathrm{Var}[Z] = \psi'(\alpha) + \psi'(\beta) = 2\psi'(1) = 2\sum_{k=1}^{\infty}\frac{1}{k^2} = 2\,\zeta(2) = 2\cdot\frac{\pi^2}{6} = \frac{\pi^2}{3} \approx 3.290 \\
\kappa_4 &= \mathrm{XsKurt}[Z] = \psi'''(\alpha) + \psi'''(\beta) = 2\psi'''(1) = 2\cdot 6\sum_{k=1}^{\infty}\frac{1}{k^4} = 12\,\zeta(4) = 12\cdot\frac{\pi^4}{90} = \frac{2\pi^4}{15} \approx 12.988
\end{aligned}$$
Instructive also is the special case α = β = ½. Since Γ(½) = √π, the probability density function in this case is $f_Z(u) = \frac{1}{\pi}\,\frac{e^{u/2}}{e^u+1}$. The constant 1/π suggests a connection with the Cauchy density (Wikipedia [Cauchy distribution]). Indeed, the density function of the random variable W = e^{Z/2} is:
$$f_W(u) = f_{Z=2\ln W}(2\ln u)\,\frac{d(2\ln u)}{du} = \frac{1}{\pi}\,\frac{u}{u^2+1}\cdot\frac{2}{u} = \frac{2}{\pi}\,\frac{1}{u^2+1}$$
This is the density function of the absolute value of the standard Cauchy random variable.[13]
6. A maximum-likelihood example
Table 1 shows 85 standardized[14] error terms from an additive-incurred model. A triangle of incremental losses by accident-year row and evaluation-year column was modeled from the exposures of its 13 accident years. Variance by column was assumed to be proportional to exposure. A complete triangle would have 91 observations, but we happened to exclude six.
The sample mean is nearly zero at 0.001. Three of the five columns are negative, and the positive errors are more dispersed than the negative. Therefore, this sample is positively skewed, or skewed to the right. Other sample cumulants are 0.850 (variance), 1.029 (skewness), and 1.444 (excess kurtosis). The coefficients of skewness and of kurtosis are 1.314 and 1.999.
By maximum likelihood we wished to explain the sample as coming from Y = γZ + C, where Z = ln(X1/X2) ∼ ln(Gamma(α, 1)/Gamma(β, 1)), the generalized logistic variable of Section 4 with distribution:
$$f_Z(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha}\left(\frac{1}{e^u+1}\right)^{\beta}$$
So, defining z(u; C, γ) = (u − C)/γ, whereby z(Y) = Z, we have the distribution of Y:
$$f_Y(u) = f_{Z=z(Y)}(z(u))\,z'(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^{z(u)}}{e^{z(u)}+1}\right)^{\alpha}\left(\frac{1}{e^{z(u)}+1}\right)^{\beta}\frac{1}{\gamma}$$
Its logarithm, the log-likelihood, is:
$$\ln f_Y(u) = \ln\Gamma(\alpha+\beta) - \ln\Gamma(\alpha) - \ln\Gamma(\beta) + \alpha z(u) - (\alpha+\beta)\ln\left(e^{z(u)}+1\right) - \ln\gamma$$
This is a function of four parameters; C and γ are implicit in z(u). With all four parameters free, the likelihood of the sample could be maximized. Yet it is both reasonable and economical to estimate Y as a “standard-error” distribution, i.e., as having zero mean and unit variance. In Excel it sufficed us to let the Solver add-in maximize the log-likelihood with respect to α and β, giving due consideration that C and γ, constrained by zero mean and unit variance, are themselves functions of α and β. As derived in Section 4, E[Y] = C + γ(ψ(α) − ψ(β)) and Var[Y] = γ²(ψ′(α) + ψ′(β)). Hence, standardization requires that γ(α, β) = 1/√(ψ′(α) + ψ′(β)) and that C(α, β) = γ(α, β) · (ψ(β) − ψ(α)). The log-likelihood was maximized at α̂ = 0.326 and β̂ = 0.135. Numerical derivatives of the GAMMALN function approximated the digamma and trigamma functions at these values. The remaining parameters for a standardized distribution must be Ĉ = −0.561 and γ̂ = 0.122. Figure 3 graphs the maximum-likelihood result.
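For readers outside Excel, the same constrained maximization is a few lines in Python. A sketch assuming SciPy; `sample` stands in for the 85 standardized errors of Table 1, which are not reproduced here, and the optimizer choice is illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, psi, polygamma

def neg_loglik(p, sample):
    a, b = np.exp(p)                                      # keep alpha, beta positive
    g = 1.0 / np.sqrt(polygamma(1, a) + polygamma(1, b))  # unit variance
    C = g * (psi(b) - psi(a))                             # zero mean
    z = (sample - C) / g
    ll = (gammaln(a + b) - gammaln(a) - gammaln(b)
          + a * z - (a + b) * np.logaddexp(z, 0.0) - np.log(g))
    return -ll.sum()

# sample = np.array([...])   # the 85 standardized errors of Table 1
# fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(sample,), method="Nelder-Mead")
# a_hat, b_hat = np.exp(fit.x)    # the text reports 0.326 and 0.135
```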
The maximum absolute difference between the empirical and the fitted distributions, |max| = 0.066, occurs at x = 0.1. We tested its significance with a “KS-like” (Kolmogorov–Smirnov, cf. Hogg and Klugman [1984, p. 104]) statistic. We simulated 1,000 samples of 85 instances from the fitted distribution, i.e., from the Y distribution with its four estimated parameters. Each simulation provided an empirical cumulative distribution, whose maximum absolute deviation from the fitted distribution we computed over the interval [−4, 4] in steps of 0.1. Figure 4 contains the graph of the cumulative distribution function of |max|.
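The simulation test is also easy to reproduce, given one extra fact: since X₁/(X₁ + X₂) ∼ Beta(α, β), the fitted cumulative distribution is available through the regularized incomplete beta function. A sketch assuming SciPy and NumPy:

```python
import numpy as np
from scipy.special import betainc, expit

rng = np.random.default_rng(0)
a, b, g, C = 0.326, 0.135, 0.122, -0.561           # fitted parameters from the text
grid = np.arange(-4.0, 4.0 + 1e-9, 0.1)
fitted_cdf = betainc(a, b, expit((grid - C) / g))  # F_Y(u) = I_{expit(z(u))}(a, b)

def max_abs_dev(sample):
    ecdf = (sample[:, None] <= grid).mean(axis=0)  # empirical CDF on the grid
    return np.abs(ecdf - fitted_cdf).max()

devs = np.array([
    max_abs_dev(g * np.log(rng.gamma(a, 1, 85) / rng.gamma(b, 1, 85)) + C)
    for _ in range(1000)
])
print((devs <= 0.066).mean())   # should land near the 31st percentile reported above
```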
The actual deviation of 0.066 coincides with the 31st percentile, near the boundary of the lower tercile. The dashed line in Figure 3 (legend “NormS”) represents the cumulative standard-normal distribution. The empirical distribution has more probability below zero and a heavier right tail. Simulations with these generalized-logistic error terms are bound to be more accurate than simulations defaulted to normal errors.[15]
7. Conclusion
Actuaries are well schooled in loss distributions, which are non-negative, positively skewed, and right-tailed. The key to a versatile distribution of error is to combine logarithms of loss distributions. Because most loss distributions are transformations of the gamma distribution, the log-gamma distribution covers most of the possible combinations. The generalized logistic distribution strikes a balance between versatility and complexity. It should be a recourse for the actuary seeking to bootstrap a model whose residuals are not normally distributed.