1. Introduction
Since the late 1980s actuarial science has been moving gradually from methods to models. The movement was made possible by personal computers; it was made necessary by insurance competition. Actuaries, who used to be known for fitting curves and extrapolating from data, are now likely to be fitting models and explaining data.
Statistical modeling seeks to explain a block of observed data as a function of known variables. The model makes it possible to predict what will be observed from new instances of those variables. However, rarely, if ever, does the function perfectly explain the data. A successful model, as well as seeming reasonable, should explain the observations tolerably well. So the deterministic model y = f(X) gives way to the approximation y ≈ f(X), which is restored to equality with the addition of a random error term: y = f(X) + e. The simplest model is the homoskedastic linear statistical model (LSM) with vector form y = Xβ + e, in which E[e] = 0 and Var[e] = σ²I. According to LSM theory, β̂ = (X′X)⁻¹X′y is the best linear unbiased estimator (BLUE) of β, even if the elements of error vector e are not normally distributed.[1]
The fact that the errors arising in most actuarial models are not normally distributed raises two questions. First, does non-normality in the randomness affect the accuracy of a model’s predictions? The answer is “Yes, sometimes seriously.” Second, can models be made “robust,”[2] i.e., able to deal properly with non-normal error? Again, “Yes.” Three attempts to do so are (1) to relax parts of BLUE, especially linearity and unbiasedness (robust estimation), (2) to incorporate explicit distributions into models (GLM), and (3) to bootstrap.[3] Bootstrapping a linear model begins with solving it conventionally. One obtains β̂ and the expected observation vector ŷ = Xβ̂. Gleaning information from the residual vector ê = y − ŷ, one can simulate proper, or more realistic, “pseudo-error” vectors eᵢ* and pseudo-observations yᵢ = ŷ + eᵢ*. Iterating the model over the yᵢ will produce pseudo-estimates β̂ᵢ and pseudo-predictions in keeping with the apparent distribution of error. Of the three attempts, bootstrapping is the most commonsensical.
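The procedure translates directly into code. Below is a minimal sketch of this residual bootstrap, assuming NumPy; the function name and the non-parametric resampling of residuals are illustrative choices, not the paper’s prescription (a parametric variant would draw the pseudo-errors from a fitted error law such as the generalized logistic of Section 4).

```python
import numpy as np

def residual_bootstrap(X, y, n_boot=1000, seed=None):
    """Residual bootstrap of the homoskedastic linear model y = X beta + e."""
    rng = np.random.default_rng(seed)
    # Conventional solution: beta_hat = (X'X)^{-1} X'y, computed via least squares.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat            # expected observations
    resid = y - y_hat               # residual vector e_hat
    betas = np.empty((n_boot, X.shape[1]))
    for i in range(n_boot):
        # Pseudo-errors e_i* drawn here from the empirical residuals;
        # a parametric variant would sample from a fitted error distribution.
        e_star = rng.choice(resid, size=resid.shape[0], replace=True)
        betas[i], *_ = np.linalg.lstsq(X, y_hat + e_star, rcond=None)
    return beta_hat, betas
```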
Our purpose herein is to introduce a distribution for non-normal error that is suited to bootstrapping in general, but especially as regards the asymmetric and skewed data that actuaries regularly need to model. As useful candidates for non-normal error, Sections 2 and 3 will introduce the log-gamma random variable and its linear combinations. Section 4 will settle on a linear combination that arguably maximizes the ratio of versatility to complexity, the generalized logistic random variable. Section 5 will examine its special cases. Finally, Section 6 will estimate by maximum likelihood the parameters of one such distribution from actual data. The most mathematical and theoretical subjects are relegated to Appendices A–C.
2. The log-gamma random variable
If X ∼ Gamma(α, θ), then Y = ln X is a random variable whose support is the entire real line.[4] Hence, the logarithm converts a one-tailed distribution into a two-tailed one. Although a leftward shift of X would move probability onto the negative real line, such a left tail would be finite. The logarithm is a natural way, even the natural way, to transform one infinite tail into two infinite tails.[5] Because the logarithm function strictly increases, the probability density function of Y ∼ Log-Gamma(α, θ) is:[6]
$$f_Y(u) = f_{X=e^Y}(e^u)\,\frac{de^u}{du} = \frac{1}{\Gamma(\alpha)}\,e^{-e^u/\theta}\left(\frac{e^u}{\theta}\right)^{\alpha-1}\frac{1}{\theta}\,e^u = \frac{1}{\Gamma(\alpha)}\,e^{-e^u/\theta}\left(\frac{e^u}{\theta}\right)^{\alpha}$$
Figure 1 contains a graph of the probability density functions of both X and Y = ln X for X ∼ Gamma(1, 1) ∼ Exponential(1). The log-gamma tails are obviously infinite, and the curve itself is skewed to the left (negative skewness).
The log-gamma moments can be derived from its moment generating function:
$$M_Y(t) = E\left[e^{tY}\right] = E\left[e^{t\ln X}\right] = E\left[X^t\right] = \frac{\Gamma(\alpha+t)}{\Gamma(\alpha)}\,\theta^t$$
Even better is to switch from moments to cumulants by way of the cumulant generating function:[7]
$$\psi_Y(t) = \ln M_Y(t) = \ln\Gamma(\alpha+t) - \ln\Gamma(\alpha) + t\ln\theta, \qquad \psi_Y(0) = \ln M_Y(0) = 0$$
The cumulants (κn) are the derivatives of this function evaluated at zero, the first four of which are:
$$\begin{aligned}
\psi'_Y(t) &= \psi(\alpha+t) + \ln\theta, & \kappa_1 &= E[Y] = \psi'_Y(0) = \psi(\alpha) + \ln\theta \\
\psi''_Y(t) &= \psi'(\alpha+t), & \kappa_2 &= \mathrm{Var}[Y] = \psi''_Y(0) = \psi'(\alpha) > 0 \\
\psi'''_Y(t) &= \psi''(\alpha+t), & \kappa_3 &= \mathrm{Skew}[Y] = \psi'''_Y(0) = \psi''(\alpha) < 0 \\
\psi^{iv}_Y(t) &= \psi'''(\alpha+t), & \kappa_4 &= \mathrm{XsKurt}[Y] = \psi^{iv}_Y(0) = \psi'''(\alpha) > 0
\end{aligned}$$
The scale factor θ affects only the mean.[8] The alternating inequalities of κ₂, κ₃, and κ₄ derive from the polygamma formulas of Appendix A.3. The variance, of course, must be positive; the negative skewness confirms the appearance of the log-gamma density in Figure 1. The positive excess kurtosis means that the log-gamma distribution is “leptokurtic”: its kurtosis is more positive than that of the normal distribution.
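These formulas are easy to verify numerically. The sketch below, assuming SciPy and NumPy, compares simulated moments of Y = ln X with the polygamma expressions; note that SciPy reports the standardized coefficients of skewness and kurtosis, so the raw cumulants are rescaled accordingly.

```python
import numpy as np
from scipy.special import psi, polygamma
from scipy.stats import skew, kurtosis

alpha, theta = 2.0, 1.5
rng = np.random.default_rng(0)
y = np.log(rng.gamma(alpha, theta, size=10**6))   # Y = ln X, X ~ Gamma(alpha, theta)

print(y.mean(), psi(alpha) + np.log(theta))        # kappa_1 = psi(alpha) + ln(theta)
print(y.var(), polygamma(1, alpha))                # kappa_2 = psi'(alpha) > 0
# SciPy returns standardized coefficients, so scale the raw cumulants:
print(skew(y), polygamma(2, alpha) / polygamma(1, alpha) ** 1.5)    # negative
print(kurtosis(y), polygamma(3, alpha) / polygamma(1, alpha) ** 2)  # positive
```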
Since the logarithm is a concave downward function, it follows from Jensen’s inequality:
$$\begin{aligned}
E[Y = \ln X] &\le \ln E[X] \\
\psi(\alpha) + \ln\theta &\le \ln(\alpha\theta) = \ln(\alpha) + \ln(\theta) \\
\psi(\alpha) &\le \ln(\alpha)
\end{aligned}$$
Because the probability is not amassed at a single point, the inequality is strict: ψ(α) < ln(α) for α > 0. However, when E[X] = αθ is fixed at unity, ln θ = −ln α; and as α → ∞, the variance of X approaches zero. Hence E[ln X] = ψ(α) − ln(α) approaches ln E[X] = ln 1 = 0; equivalently, $\lim_{\alpha\to\infty}(\ln(\alpha) - \psi(\alpha)) = 0$. It is not difficult to prove that $\lim_{\alpha\to 0^+}(\ln(\alpha) - \psi(\alpha)) = \infty$, as well as that ln(α) − ψ(α) strictly decreases. Therefore, for every y > 0 there exists exactly one α > 0 for which y = ln(α) − ψ(α).
The log-gamma random variable becomes an error term when its expectation equals zero. This requires the parameters to satisfy the equation E[Y] = ψ(α) + ln θ = 0, or θ = e^{−ψ(α)}. Hence, the simplest of all log-gamma error distributions is Y ∼ Log-Gamma(α, e^{−ψ(α)}) = ln(X ∼ Gamma(α, e^{−ψ(α)})).
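A sketch of this construction, assuming SciPy’s digamma function `psi`; the function name is illustrative:

```python
import numpy as np
from scipy.special import psi

def log_gamma_error(alpha, size, seed=None):
    """Sample Y ~ Log-Gamma(alpha, exp(-psi(alpha))), so that E[Y] = 0."""
    rng = np.random.default_rng(seed)
    theta = np.exp(-psi(alpha))        # scale forced by psi(alpha) + ln(theta) = 0
    return np.log(rng.gamma(alpha, theta, size=size))

print(log_gamma_error(0.7, 10**6, seed=1).mean())   # approximately zero
```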
3. Weighted sums of log-gamma random variables
Multiplying the log-gamma random variable by negative one reflects its distribution about the y-axis. This does not affect the even moments or cumulants, but it reverses the signs of the odd ones. For example, the skewness of −Y = −ln X = ln X⁻¹ is positive.
Now let Y be a γ-weighted sum of independent log-gamma random variables Yₖ, which resolves into the logarithm of a product of powers of independent gamma random variables Xₖ ∼ Gamma(αₖ, θₖ):
$$Y = \sum_k \gamma_k Y_k = \sum_k \gamma_k \ln X_k = \sum_k \ln X_k^{\gamma_k} = \ln \prod_k X_k^{\gamma_k}$$
Although the Xk must be independent of one another, their parameters need not be identical. Because of the independence:
$$\psi_Y(t) = \ln E\left[e^{tY}\right] = \ln E\left[e^{t\sum_k \gamma_k Y_k}\right] = \ln E\left[\prod_k e^{t\gamma_k Y_k}\right] = \ln \prod_k E\left[e^{t\gamma_k Y_k}\right] = \sum_k \ln E\left[e^{t\gamma_k Y_k}\right] = \sum_k \psi_{Y_k}(t\gamma_k)$$
The nth cumulant of the weighted sum is:
$$\kappa_n(Y) = \left.\frac{d^n \psi_Y(t)}{dt^n}\right|_{t=0} = \left.\frac{d^n}{dt^n}\sum_k \psi_{Y_k}(t\gamma_k)\right|_{t=0} = \sum_k \left.\frac{d^n \psi_{Y_k}(t\gamma_k)}{dt^n}\right|_{t=0} = \sum_k \gamma_k^n\,\psi^{[n]}_{Y_k}(0) = \sum_k \gamma_k^n\,\kappa_n(Y_k)$$
So the nth cumulant of a weighted sum of independent random variables is the weighted sum of the cumulants of the random variables, the weights being raised to the nth power.[9]
Using the cumulant formulas from the previous section, we have:
$$\begin{aligned}
\kappa_1(Y) &= \sum_k \gamma_k \kappa_1(Y_k) = \sum_k \gamma_k \psi(\alpha_k) + \sum_k \gamma_k \ln(\theta_k) = \sum_k \gamma_k \psi(\alpha_k) + C \\
\kappa_2(Y) &= \sum_k \gamma_k^2 \kappa_2(Y_k) = \sum_k \gamma_k^2 \psi'(\alpha_k) \\
\kappa_3(Y) &= \sum_k \gamma_k^3 \kappa_3(Y_k) = \sum_k \gamma_k^3 \psi''(\alpha_k) \\
\kappa_4(Y) &= \sum_k \gamma_k^4 \kappa_4(Y_k) = \sum_k \gamma_k^4 \psi'''(\alpha_k)
\end{aligned}$$
In general, $\kappa_{m+1}(Y) = \sum_k \gamma_k^{m+1}\,\psi^{[m]}(\alpha_k) + \mathrm{IF}(m = 0,\, C,\, 0)$. A weighted sum of n independent log-gamma random variables would provide 2n + 1 degrees of freedom for a method-of-cumulants fitting. All the scale parameters θₖ would be unity. As the parameter of an error distribution, C would lose its freedom, since the mean must then equal zero. Therefore, with no loss of generality, we may write $Y = \sum_k \gamma_k \ln X_k + C$ for Xₖ ∼ Gamma(αₖ, 1).
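The general cumulant formula translates directly into code. A sketch assuming SciPy; the helper name is hypothetical:

```python
import numpy as np
from scipy.special import psi, polygamma

def weighted_log_gamma_cumulant(n, gammas, alphas, C=0.0):
    """n-th cumulant of Y = sum_k gamma_k ln X_k + C, with X_k ~ Gamma(alpha_k, 1)."""
    gammas, alphas = np.asarray(gammas), np.asarray(alphas)
    if n == 1:
        return float(np.sum(gammas * psi(alphas)) + C)
    return float(np.sum(gammas**n * polygamma(n - 1, alphas)))

# Example: the first four cumulants of Y = 0.5 ln X1 - 0.5 ln X2, alphas (2, 3).
print([weighted_log_gamma_cumulant(n, [0.5, -0.5], [2.0, 3.0]) for n in (1, 2, 3, 4)])
```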
4. The generalized logistic random variable
Although any finite weighted sum is tractable, four cumulants should suffice in most practical work. So let $Y = \gamma_1 \ln X_1 + \gamma_2 \ln X_2 + C = \ln\left(X_1^{\gamma_1} X_2^{\gamma_2}\right) + C$. Even then, one γ should be positive and the other negative; in fact, letting one be the opposite of the other will allow Y to be symmetric in special cases. Therefore, Y = γ ln X₁ − γ ln X₂ + C = γ ln(X₁/X₂) + C for γ > 0 should be a useful form of intermediate complexity. Let the parameterization for this purpose be X₁ ∼ Gamma(α, 1) and X₂ ∼ Gamma(β, 1). Contributing to the usefulness of this form is the fact that X₁/X₂ is a generalized Pareto random variable, whose probability density function is:[10]
$$f_{X_1/X_2}(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{u}{u+1}\right)^{\alpha-1}\left(\frac{1}{u+1}\right)^{\beta-1}\frac{1}{(u+1)^2}$$
The not overly complicated “generalized logistic” distribution is versatile enough for modeling non-normal error.
Since $e^Y = e^C (X_1/X_2)^{\gamma}$, for $-\alpha < \gamma t < \beta$:
$$\begin{aligned}
M_Y(t) &= E\left[e^{tY}\right] = e^{Ct}\,E\left[(X_1/X_2)^{\gamma t}\right] = e^{Ct}\int_{u=0}^{\infty} u^{\gamma t}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{u}{u+1}\right)^{\alpha-1}\left(\frac{1}{u+1}\right)^{\beta-1}\frac{du}{(u+1)^2} \\
&= e^{Ct}\,\frac{\Gamma(\alpha+\gamma t)\Gamma(\beta-\gamma t)}{\Gamma(\alpha)\Gamma(\beta)}\int_{u=0}^{\infty}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\gamma t)\Gamma(\beta-\gamma t)}\left(\frac{u}{u+1}\right)^{\alpha+\gamma t-1}\left(\frac{1}{u+1}\right)^{\beta-\gamma t-1}\frac{du}{(u+1)^2} \\
&= e^{Ct}\,\frac{\Gamma(\alpha+\gamma t)\Gamma(\beta-\gamma t)}{\Gamma(\alpha)\Gamma(\beta)}
\end{aligned}$$
Hence, the cumulant generating function and its derivatives are:
$$\begin{aligned}
\psi_Y(t) &= \ln M_Y(t) = Ct + \ln\Gamma(\alpha+\gamma t) - \ln\Gamma(\alpha) + \ln\Gamma(\beta-\gamma t) - \ln\Gamma(\beta) \\
\psi'_Y(t) &= \gamma\left(\psi(\alpha+\gamma t) - \psi(\beta-\gamma t)\right) + C \\
\psi''_Y(t) &= \gamma^2\left(\psi'(\alpha+\gamma t) + \psi'(\beta-\gamma t)\right) \\
\psi'''_Y(t) &= \gamma^3\left(\psi''(\alpha+\gamma t) - \psi''(\beta-\gamma t)\right) \\
\psi^{iv}_Y(t) &= \gamma^4\left(\psi'''(\alpha+\gamma t) + \psi'''(\beta-\gamma t)\right)
\end{aligned}$$
And so, the cumulants are:
$$\begin{aligned}
\kappa_1 &= E[Y] = \psi'_Y(0) = C + \gamma\left(\psi(\alpha) - \psi(\beta)\right) \\
\kappa_2 &= \mathrm{Var}[Y] = \psi''_Y(0) = \gamma^2\left(\psi'(\alpha) + \psi'(\beta)\right) > 0 \\
\kappa_3 &= \mathrm{Skew}[Y] = \psi'''_Y(0) = \gamma^3\left(\psi''(\alpha) - \psi''(\beta)\right) \\
\kappa_4 &= \mathrm{XsKurt}[Y] = \psi^{iv}_Y(0) = \gamma^4\left(\psi'''(\alpha) + \psi'''(\beta)\right) > 0
\end{aligned}$$
The three parameters α, β, γ could be fitted to empirical cumulants κ₂, κ₃, and κ₄. For an error distribution, C would equal γ(ψ(β) − ψ(α)). Since κ₄ > 0, the random variable Y is leptokurtic.
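The cumulant expressions are straightforward to compute and to check by simulation; such a forward function is also the ingredient a root finder (e.g., from scipy.optimize) would need for the method-of-cumulants fit just mentioned. A sketch assuming SciPy and NumPy, with illustrative parameter values:

```python
import numpy as np
from scipy.special import psi, polygamma

def gl_cumulants(a, b, g, C=0.0):
    """First four cumulants of Y = g*ln(X1/X2) + C, X1 ~ Gamma(a,1), X2 ~ Gamma(b,1)."""
    return (C + g * (psi(a) - psi(b)),
            g**2 * (polygamma(1, a) + polygamma(1, b)),
            g**3 * (polygamma(2, a) - polygamma(2, b)),
            g**4 * (polygamma(3, a) + polygamma(3, b)))

# Simulation check with arbitrary parameters.
rng = np.random.default_rng(0)
a, b, g = 2.0, 0.5, 1.0
y = g * np.log(rng.gamma(a, 1, 10**6) / rng.gamma(b, 1, 10**6))
print(gl_cumulants(a, b, g)[:2])
print(y.mean(), y.var())       # match kappa_1 (with C = 0) and kappa_2
```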
Since Y = γ ln(X1/X2) + C, Z = (Y − C)/γ = ln(X1/X2) may be considered a reduced form. From the generalized Pareto density above, we can derive the density of Z:
$$f_Z(u) = f_{X_1/X_2=e^Z}(e^u)\,\frac{de^u}{du} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha-1}\left(\frac{1}{e^u+1}\right)^{\beta-1}\frac{e^u}{(e^u+1)^2} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha}\left(\frac{1}{e^u+1}\right)^{\beta}$$
Z ∼ ln(Gamma(α, 1)/Gamma(β, 1)) is a “generalized logistic” random variable (Wikipedia [Generalized logistic distribution]).
The probability density functions of generalized logistic random variables are skew-symmetric:
$$f_{\ln\frac{X_2}{X_1}}(-u) = \frac{\Gamma(\beta+\alpha)}{\Gamma(\beta)\Gamma(\alpha)}\left(\frac{e^{-u}}{e^{-u}+1}\right)^{\beta}\left(\frac{1}{e^{-u}+1}\right)^{\alpha} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha}\left(\frac{1}{e^u+1}\right)^{\beta} = f_{\ln\frac{X_1}{X_2}}(u)$$
In Figure 2 are graphed three generalized logistic probability density functions.
The density is symmetric if and only if α = β; the gray curve is that of the logistic density, for which α = β = 1. The mode of the generalized-logistic (α, β) density is u_mode = ln(α/β). Therefore, the mode is positive [or negative] if and only if α > β [or α < β]. Since the digamma and tetragamma functions ψ, ψ″ strictly increase over the positive reals, the signs of E[Z] = ψ(α) − ψ(β) and Skew[Z] = ψ″(α) − ψ″(β) are the same as the sign of α − β. The positive mode of the orange curve (2, 1) implies positive mean and skewness, whereas for the blue curve (1, 4) they are negative.
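These sign relationships can be confirmed numerically; a sketch assuming SciPy, using the (2, 1), (1, 4), and (1, 1) pairs graphed in Figure 2:

```python
import numpy as np
from scipy.special import psi, polygamma

for a, b in [(2.0, 1.0), (1.0, 4.0), (1.0, 1.0)]:
    mode = np.log(a / b)                        # u_mode = ln(alpha/beta)
    mean = psi(a) - psi(b)                      # E[Z]
    third = polygamma(2, a) - polygamma(2, b)   # Skew[Z], the third cumulant
    print((a, b), mode, mean, third)            # all three share one sign
```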
5. Special cases
Although the probability density function of the generalized logistic random variable is of closed form, the form of its cumulative distribution function is not closed, save for the special cases of α = 1 and β = 1. The special case of α = 1 reduces X₁/X₂ to an ordinary Pareto. In this case, the cumulative distribution is $F_Z(u) = 1 - \left(\frac{1}{e^u+1}\right)^{\beta}$. Likewise, the special case of β = 1 reduces X₁/X₂ to an inverse Pareto. In that case, $F_Z(u) = \left(\frac{e^u}{e^u+1}\right)^{\alpha}$. It would be easy to simulate values of Z in both cases by the inversion method (Klugman, Panjer, and Willmot [1998, Appendix H.2]).[11]
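Both inversions are one-liners. A sketch assuming NumPy, with the quantile functions obtained by inverting the two cumulative distributions above; the function names are illustrative:

```python
import numpy as np
from scipy.special import psi

def simulate_alpha1(beta, size, seed=None):
    """Inversion for alpha = 1: F(u) = 1 - (1 + e^u)^(-beta)."""
    p = np.random.default_rng(seed).uniform(size=size)
    return np.log((1 - p) ** (-1 / beta) - 1)

def simulate_beta1(alpha, size, seed=None):
    """Inversion for beta = 1: F(u) = (e^u / (e^u + 1))^alpha."""
    p = np.random.default_rng(seed).uniform(size=size)
    return np.log(p ** (1 / alpha) / (1 - p ** (1 / alpha)))

# Check: with alpha = 1 and beta = 2, the mean should equal psi(1) - psi(2) = -1.
print(simulate_alpha1(2.0, 10**6, seed=1).mean(), psi(1.0) - psi(2.0))
```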
Quite interesting is the special case α = β, for Z is symmetric about its mean if and only if α = β; in that case all the odd cumulants higher than the mean equal zero. Therefore, in this case:
$$f_Z(u) = \frac{\Gamma(2\alpha)}{\Gamma(\alpha)^2}\,\frac{(e^u)^{\alpha}}{(e^u+1)^{2\alpha}}$$
Symmetry is confirmed inasmuch as fZ(−u) = fZ(u). When α = β = 1, the density reduces to $f_Z(u) = \frac{e^u}{(e^u+1)^2}$, whose cumulative distribution is $F_Z(u) = \frac{e^u}{e^u+1}$. In this case, Z is a logistic random variable. Its mean and skewness are zero. As for its even cumulants:[12]
$$\begin{aligned}
\kappa_2 &= \mathrm{Var}[Z] = \psi'(\alpha) + \psi'(\beta) = 2\psi'(1) = 2\sum_{k=1}^{\infty}\frac{1}{k^2} = 2\,\zeta(2) = 2\cdot\frac{\pi^2}{6} = \frac{\pi^2}{3} \approx 3.290 \\
\kappa_4 &= \mathrm{XsKurt}[Z] = \psi'''(\alpha) + \psi'''(\beta) = 2\psi'''(1) = 2\cdot 6\sum_{k=1}^{\infty}\frac{1}{k^4} = 12\,\zeta(4) = 12\cdot\frac{\pi^4}{90} = \frac{2\pi^4}{15} \approx 12.988
\end{aligned}$$
Instructive also is the special case α = β = ½. Since Γ(½) = √π, the probability density function in this case is $f_Z(u) = \frac{1}{\pi}\,\frac{e^{u/2}}{e^u+1}$. The constant 1/π suggests a connection with the Cauchy density (Wikipedia [Cauchy distribution]). Indeed, the density function of the random variable W = e^{Z/2} is:
$$f_W(u) = f_{Z=2\ln W}(2\ln u)\,\frac{d(2\ln u)}{du} = \frac{1}{\pi}\,\frac{u}{u^2+1}\cdot\frac{2}{u} = \frac{2}{\pi}\,\frac{1}{u^2+1}$$
This is the density function of the absolute value of the standard Cauchy random variable.[13]
6. A maximum-likelihood example
Table 1 shows 85 standardized[14] error terms from an additive-incurred model. A triangle of incremental losses by accident-year row and evaluation-year column was modeled from the exposures of its 13 accident years. Variance by column was assumed to be proportional to exposure. A complete triangle would have 91 observations, but we happened to exclude six.
The sample mean is nearly zero at 0.001. Three of the five columns are negative, and the positive errors are more dispersed than the negative. Therefore, this sample is positively skewed, or skewed to the right. Other sample cumulants are 0.850 (variance), 1.029 (skewness), and 1.444 (excess kurtosis). The coefficients of skewness and of kurtosis are 1.314 and 1.999.
By maximum likelihood we wished to explain the sample as coming from Y = γZ + C, where Z = ln(X1/X2) ∼ ln(Gamma(α, 1)/Gamma(β, 1)), the generalized logistic variable of Section 4 with distribution:
$$f_Z(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha}\left(\frac{1}{e^u+1}\right)^{\beta}$$
So, defining z(u; C, γ) = (u − C)/γ, whereby z(Y) = Z, we have the distribution of Y:
$$f_Y(u) = f_{Z=z(Y)}(z(u))\,z'(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^{z(u)}}{e^{z(u)}+1}\right)^{\alpha}\left(\frac{1}{e^{z(u)}+1}\right)^{\beta}\frac{1}{\gamma}$$
Its logarithm, the log-likelihood, is:
$$\ln f_Y(u) = \ln\Gamma(\alpha+\beta) - \ln\Gamma(\alpha) - \ln\Gamma(\beta) + \alpha z(u) - (\alpha+\beta)\ln\left(e^{z(u)}+1\right) - \ln\gamma$$
This is a function of four parameters; C and γ are implicit in z(u). With all four parameters free, the likelihood of the sample could be maximized. Yet it is both reasonable and economical to estimate Y as a “standard-error” distribution, i.e., as having zero mean and unit variance. In Excel it sufficed us to let the Solver add-in maximize the log-likelihood with respect to α and β, giving due consideration that C and γ, constrained by zero mean and unit variance, are themselves functions of α and β. As derived in Section 4, E[Y] = C + γ(ψ(α) − ψ(β)) and Var[Y] = γ²(ψ′(α) + ψ′(β)). Hence, standardization requires that γ(α, β) = 1/√(ψ′(α) + ψ′(β)) and that C(α, β) = γ(α, β) · (ψ(β) − ψ(α)). The log-likelihood was maximized at α̂ = 0.326 and β̂ = 0.135. Numerical derivatives of the GAMMALN function approximated the digamma and trigamma functions at these values. The remaining parameters for a standardized distribution must be Ĉ = −0.561 and γ̂ = 0.122. Figure 3 graphs the maximum-likelihood result.
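For readers outside Excel, the same constrained maximization is a few lines in Python. A sketch assuming SciPy; `sample` stands in for the 85 standardized errors of Table 1, which are not reproduced here, and the optimizer choice is illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, psi, polygamma

def neg_loglik(p, sample):
    a, b = np.exp(p)                                      # keep alpha, beta positive
    g = 1.0 / np.sqrt(polygamma(1, a) + polygamma(1, b))  # unit variance
    C = g * (psi(b) - psi(a))                             # zero mean
    z = (sample - C) / g
    ll = (gammaln(a + b) - gammaln(a) - gammaln(b)
          + a * z - (a + b) * np.logaddexp(z, 0.0) - np.log(g))
    return -ll.sum()

# sample = np.array([...])   # the 85 standardized errors of Table 1
# fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(sample,), method="Nelder-Mead")
# a_hat, b_hat = np.exp(fit.x)    # the text reports 0.326 and 0.135
```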
The maximum absolute difference between the empirical and the fitted distributions, |max| = 0.066, occurs at x = 0.1. We tested its significance with a “KS-like” (Kolmogorov–Smirnov, cf. Hogg and Klugman [1984, p. 104]) statistic. We simulated 1,000 samples of 85 instances from the fitted distribution, i.e., from the Y distribution with its four estimated parameters. Each simulation provided an empirical cumulative distribution, whose maximum absolute deviation from the fitted distribution we computed over the interval [−4, 4] in steps of 0.1. Figure 4 contains the graph of the cumulative distribution function of |max|.
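The simulation test is also easy to reproduce, given one extra fact: since X₁/(X₁ + X₂) ∼ Beta(α, β), the fitted cumulative distribution is available through the regularized incomplete beta function. A sketch assuming SciPy and NumPy:

```python
import numpy as np
from scipy.special import betainc, expit

rng = np.random.default_rng(0)
a, b, g, C = 0.326, 0.135, 0.122, -0.561           # fitted parameters from the text
grid = np.arange(-4.0, 4.0 + 1e-9, 0.1)
fitted_cdf = betainc(a, b, expit((grid - C) / g))  # F_Y(u) = I_{expit(z(u))}(a, b)

def max_abs_dev(sample):
    ecdf = (sample[:, None] <= grid).mean(axis=0)  # empirical CDF on the grid
    return np.abs(ecdf - fitted_cdf).max()

devs = np.array([
    max_abs_dev(g * np.log(rng.gamma(a, 1, 85) / rng.gamma(b, 1, 85)) + C)
    for _ in range(1000)
])
print((devs <= 0.066).mean())   # should land near the 31st percentile reported above
```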
The actual deviation of 0.066 coincides with the 31st percentile, near the boundary of the lower tercile. The dashed line in Figure 3 (legend “NormS”) represents the cumulative standard-normal distribution. The empirical distribution has more probability below zero and a heavier right tail. Simulations with these generalized-logistic error terms are bound to be more accurate than simulations defaulted to normal errors.[15]
7. Conclusion
Actuaries are well schooled in loss distributions, which are non-negative, positively skewed, and right-tailed. The key to a versatile distribution of error is to combine logarithms of loss distributions. Because most loss distributions are transformations of the gamma distribution, the log-gamma distribution covers most of the possible combinations. The generalized logistic distribution strikes a balance between versatility and complexity. It should be a recourse for the actuary seeking to bootstrap a model whose residuals are not normally distributed.