Moment-Based Approximation with Mixed Erlang Distributions

Hlne Cossette; David Landriault; Etienne Marceau; Khouzeima Moutanabbir

1. Introduction

Mixed Erlang distributions are known to yield analytic solutions to many risk management problems of interest. This is primarily due to the tractable features of this distributional class. Among others, the class of mixed Erlang distributions is closed under various operations such as convolutions and Esscher transformations (e.g., Willmot and Woo 2007 and Willmot and Lin 2011). As such, risk aggregation and ruin problems can more easily be tackled under mixed Erlang assumptions (e.g., Cheung and Woo 2016; Cossette, Mailhot, and Marceau 2012, and Landriault and Willmot 2009). Also, Tijms (1994) showed that the class of mixed Erlang distributions is dense in the set of all continuous and positive distributions. Therefore, we consider a moment-based approximation method which capitalizes on the aforementioned properties of the mixed Erlang distribution. More precisely, we propose to approximate a distribution with known moments by a moment-matching mixed Erlang distribution. Moment-based approximations have been extensively developed in various research areas, including performance evaluation, queueing theory, and risk theory, to name a few.

Osogami and Harchol-Balter (2006) identify the following four criteria to evaluate moment-matching algorithms: (1) the number of moments matched; (2) the computational efficiency of the algorithm; (3) the generality of the solution; and (4) the minimality of the number of parameters (phases). It also seems desirable for the approximation to be in itself a distribution. This is not mentioned in Osogami and Harchol-Balter (2006) for the obvious reason that they consider phase-type distributions as their moment-based approximation class. There exists an extensive literature on the approximation of distributions by a specific subset of phase-type distributions using moment-based techniques. For instance, Whitt (1982) proposed a mixture of two exponential distributions or a generalized Erlang distribution as a moment-based approximation when either the coefficient of variation (CV) is greater than or less than 1 , respectively. Also, both Altiok (1985) and Vanden Bosch, Dietz, and Pohl (2000) proposed an alternative to the
moment-based approximation of Whitt (1982) when $\mathrm{CV}>1$ using a Coxian distribution. Alternatively, Johnson and Taaffe (1989) considered a mixture of Erlangs with a common shape (order) parameter as their moment-based approximation.

Most predominantly, there exists a substantial body of literature on the three-moment approximation within the phase-type class of distributions (e.g., Telek and Heindl 2002; Bobbio, Horváth, and Telek 2005, and references therein). Matching the first three moments is often viewed as effective to provide a reasonable approximation to the underlying system (e.g., Osogami and Harchol-Balter 2006 and references therein). However, as illustrated in this paper and many others, three moments does not always suffice, triggering the development of more flexible moment-based approximations. Among others, we mention the work of Johnson and Taaffe (1989) on mixed Erlang distributions of common order. Also, Dufresne (2007) proposes two approximation techniques based on Jacobi polynomial expansions and the logbeta distribution to fit combinations of exponential distributions. This paper is complementary to the aforementioned ones by considering the family of finite mixture of Erlangs with common rate parameter to approximate a distribution on $\mathbb{R}^{+}$ , as theoretically justified in the continuous case by Tijms (1994, Theorem 3.9.1). The reader is also referred to S. C. Lee and Lin (2010) where fitting of the same class of distributions is considered using the EM algorithm (which relies on the knowledge of the approximated distribution rather than only its moments).

It is worth pointing out that other non-phase type approximation methods have been widely used in actuarial science. A good survey paper on this topic is Chaubey, Garrido, and Trudeau (1998). One of these approximation classes are refinements to the normal approximation such as the normal power and the Cornish Fisher approximations (e.g., Ramsay 1991; Daykin, Pentikäinen, and Pesonen 1994, and Y. S. Lee and Lin 1992). These approximations are based on the first few moments. However, the resulting approximation is often not a proper distribution. Other moment-based distributional approximations are the translated gamma distribution (e.g., Seal 1977), translated inverse Gaussian distribution (e.g., Chaubey, Garrido, and Trudeau 1998) and the generalized Pareto distribution (e.g., Venter 1983). It should be noted that all these approximation methods are designed to fit a specific number of moments and thus lack the flexibility to match an arbitrary number of moments.

The rest of the paper is constructed as follows. In Section 2, a brief review on admissible moments, mixed Erlang distributions and the approximation method of Johnson and Taaffe (1989) is provided. Section 3 is devoted to our class of finite mixture of Erlangs with common rate parameter. Theoretical and practical considerations related to the approximation method are drawn. Various examples are considered to examine the quality of the resulting approximation. In Section 4, we consider applications of our momentbased approximations of Section 3 when the underlying distribution is of mixed Erlang form with known rate parameter. A parallel is drawn with a discrete moment-matching problem and certain stochastic orderings, notably the $s$ -convex stochastic order (e.g., Denuit, Lefèvre, and Shaked 1998). An application of Cossette, Gaillardetz, and Marceau (2002) will be examined in more detail.

2. Background

2.1. Admissible moments

Karlin and Studden (1966) provide the necessary and sufficient conditions for a set of (raw) moments $\mu_{m}=\left(\mu_{1}, \ldots, \mu_{m}\right)$ to be from a probability distribution defined on $\mathbb{R}^{+}$ . To state this result, define the matrices $\mathbf{P}_{k}$ and $\mathbf{Q}_{k}(k \geq 1)$ as

$\begin{gathered} \mathbf{P}_{k}=\left(\begin{array}{cccc} 1 & \mu_{1} & \cdots & \mu_{k} \\ \mu_{1} & \mu_{2} & \cdots & \mu_{k+1} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{k} & \mu_{k+1} & \cdots & \mu_{2 k} \end{array}\right) ; \\ \mathbf{Q}_{k}=\left(\begin{array}{cccc} \mu_{1} & \mu_{2} & \cdots & \mu_{k+1} \\ \mu_{2} & \mu_{3} & \cdots & \mu_{k+2} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{k+1} & \mu_{k+2} & \cdots & \mu_{2 k+1} \end{array}\right) . \end{gathered}$

As stated in Courtois and Denuit (2007), there exists a non-negative random variable (rv) with distribution function (df) $F$ and first $m$ moments $\mu_{m}$ if and only if the following two conditions are satisfied:

$\operatorname{det} \mathbf{P}_{k}>0$ , for $k=1, \ldots,\lfloor(m-1) / 2\rfloor ;$
$\operatorname{det} \mathbf{Q}_{k}>0$ , for $k=1, \ldots,\lfloor m / 2\rfloor$ ;

where $\lfloor x\rfloor$ holds for the integer part of $x$ . In what follows, we silently assume the moment set $\boldsymbol{\mu}_{m}$ is from a probability distribution on $\mathbb{R}^{+}$ .

2.2. Mixed Erlang distribution

We now review some known properties of mixed Erlang distributions with common rate parameter. A more elaborate review of this class of distributions can be found in Willmot and Woo (2007), S. C. Lee and Lin (2010), and Willmot and Lin (2011).

Let $W$ be a mixed Erlang rv with common rate parameter $\beta>0$ and df

$F_{W}(x)=\sum_{k \in A_{l}} \zeta_{k} H(x ; k, \beta), \tag{1}$

where $A_{l}=\{1,2, \ldots, l\}$ , and $\left\{\zeta_{k}\right\}_{k=1}^{l}$ is the probability mass function (pmf) of a discrete rv $K$ with support $A_{l}$ for a given $l \in\{1,2, \ldots\} \cup\{\infty\}$ . The Erlang df $H$ is defined as

$\begin{gathered} H(x ; k, \beta) \equiv 1-\bar{H}(x ; k, \beta)=1-e^{-\beta x} \sum_{i=0}^{k-1} \frac{(\beta x)^{i}}{i!}, \\ x \geq 0, \end{gathered}\tag{2}$

where the parameters $k$ and $\beta$ of the Erlang df are known as the shape and rate parameters, respectively. An alternative and useful representation of the mixed Erlang rv $W$ is $W=\sum_{k=1}^{K} C_{k}$ where $\left\{C_{k}\right\}_{k \geq 1}$ are iid exponential rv’s with mean $1 / \beta$ , independent of $K$ , i.e., the rv $W$ follows a compound distribution.

Remark 1. As in, e.g., Willmot and Woo (2007), we consider the class of mixed Erlang dfs (1) rather than the more general class of combinations of Erlangs where some $\zeta_{k}$ 's are possibly negative. For the latter class, additional constraints on $\left\{\zeta_{k}\right\}_{k=1}^{l}$ exist to ensure that the right-hand side of (1) is a non-decreasing function in $x$ . This presents additional challenges in the subsequent moment-matching application, challenges which do not arise in the mixed Erlang case.

It is well known that the $j$ -th moment of $W$ is given by $E\left[W^{j}\right]=\beta^{-j} \sum_{k=1}^{\infty} \zeta_{k}\left\{\prod_{i=0}^{j-1}(k+i)\right\}$ . Of particular importance in actuarial science and quantitative risk management (see, e.g., McNeil, Frey, and Embrechts 2005 and references therein) are the VaR and TVaR risk measures. For the mixed Erlang rv $W$ , there is in general no closed form expression for $\operatorname{VaR}_{\mathrm{k}}(W)=\inf \left\{x \in \mathbb{R}: F_{W}(x) \geq \kappa\right\}$ where $0 \leq \kappa<1$ , but its value can be obtained using a routine numerical procedure. As for its TVaR, S. C. Lee and Lin (2010) showed that

$\begin{aligned} & \operatorname{TVaR}_{\mathrm{\kappa}}(W) \equiv \frac{1}{1-\kappa} \int_{\kappa}^{1} \operatorname{VaR}_{u}(W) d u \\ = & \frac{1}{1-\kappa} \sum_{k=1}^{\infty} \zeta_{k} \frac{k}{\beta} \bar{H}\left(\operatorname{VaR}_{\kappa}(W) ; k+1, \beta\right) . \end{aligned}\tag{3}$

Another quantity of interest is the stop-loss premium defined as $\pi_{W}(b)=E\left[(W-b)_{+}\right]$ with $(x)_{+}=\max \{x, 0\}$ . For the mixed Erlang df (1), we have

$\begin{gathered} \pi_{W}(b)=\frac{1}{1-\kappa} \sum_{k=1}^{\infty} \zeta_{k}\left(\frac{k}{\beta} \bar{H}(b ; k+1, \beta)-b \bar{H}(b ; k, \beta)\right), \\ b \geq 0 \end{gathered}$

(see also Willmot and Woo (2007, Eq. 3.6) for the higher-order stop-loss moments). Tijms (1994) showed that this class of distributions can approximate any continuous positive distribution with an arbitrary level of accuracy. For completeness, the theoretical foundation of this result is given next.

Theorem 2. (Tijms 1994, Theorem 3.9.1). Let F be the df of a positive rv. For any given $h>0$ , define

$F_{h}(x)=\sum_{k=1}^{\infty}(F(k h)-F((k-1) h)) H\left(x ; k, \frac{1}{h}\right) .\tag{4}$

Then, $\lim _{h \rightarrow 0} F_{h}(x)=F(x)$ for any continuity point $x$ of $F$ .
Note that $F_{h}$ in (4) is a mixed Erlang df of the form (1) with $\zeta_{k}=(F(k h)-F((k-1) h))(k=1,2, \ldots)$ and rate parameter $\beta=1 / \mathrm{h}$ .

Several approximation methods motivated by Tijms’ theorem were proposed over the years (see Section 1 for more details). In general, these moment-based approximations propose to work with a specific subclass of all finite and infinite mixed Erlang distributions. Among them, we recall the method of Johnson and Taaffe (1989), who will be used later for comparative purposes.

2.3. Method of Johnson and Taaffe (1989)

Johnson and Taaffe (1989) investigated the use of mixtures of Erlang distributions with common shape parameter for moment-matching purposes. More precisely, mixtures of $n$ (or fewer) Erlangs with common shape parameters are used to match the first $(2 n-1)$ moments (whenever the set of moments is within the feasible set). For the three-moment matching problem, Johnson and Taaffe (1989) generalized the approximation of Whitt (1982) and Altiok (1985) by enlarging the set of feasible moments $\mu_{3}$ when $C V>1$ . Their method is also valid for some combinations of $\mu_{3}$ when $C V<1$ .

Their three-moment approximation is a mixture of two Erlangs with common shape parameter $r$ (see Theorem 3 of Johnson and Taaffe 1989), i.e.,

$F(x)=p H\left(x ; r, \beta_{1}\right)+(1-p) H\left(x ; r, \beta_{2}\right),\tag{5}$

where $p=\left(\frac{\mu_{1}}{r}-\beta_{2}^{-1}\right) /\left(\beta_{1}^{-1}-\beta_{2}^{-1}\right)$ , and $\left\{1 / \beta_{i}\right\}_{i=1}^{2}$ are the solutions of $A s^{2}+B s+C=0$ with

$\begin{aligned} & A=r(r+2) \mu_{1}\left(\mu_{2}-\frac{r+1}{r} \mu_{1}^{2}\right) \\ & B=-\left(\begin{array}{c} r\left(\mu_{1} \mu_{3}-\frac{r+1}{r+2} \mu_{2}^{2}\right) \\ +\frac{r(r+2)}{r+1}\left(\mu_{2}-\frac{r+1}{r} \mu_{1}^{2}\right)^{2} \\ +(n+2) \mu_{1}^{2}\left(\mu_{2}-\frac{r+1}{r} \mu_{1}^{2}\right) \end{array}\right) \\ & C=\mu_{1}\left(\mu_{1} \mu_{3}-\frac{r+1}{r+2} \mu_{2}^{2}\right) . \end{aligned}$

The choice of the shape parameter $r$ is discussed in Johnson and Taaffe (1989, Proposition 4).

3. Moment-based approximation with mixed Erlang distribution

In this section, we propose to use a different subclass of mixed Erlang distributions to examine momentbased approximation techniques.

3.1. Description of the approach

For a given $l \in \mathbb{N}^{+}$ , let $\mathcal{M E}\left(\boldsymbol{\mu}_{m}, A_{l}\right)$ be the set of all finite mixture of Erlangs with df (1) and first $m$ moments $\mu_{m}$ . From Section 2.2, this consists in the identification of all solutions to the problem

$\sum_{k=1}^{l} \zeta_{k} \frac{\prod_{i=0}^{j-1}(k+i)}{\beta^{j}}=\mu_{j}, \quad j=1, \ldots, m,\tag{6}$

under the constraints that $\beta>0$ and $\left\{\zeta_{k}\right\}_{k=1}^{l}$ is a probability measure on $A_{l}$ .

Remark 3. For a $r v W$ with $d f$ (1) and first $m$ moments $\mu_{m}$ , we indifferently write $W \in \mathcal{M} \mathcal{E}\left(\mu_{m}, A_{l}\right)$ or $F_{W} \in \mathcal{M E}\left(\mu_{m}, A_{l}\right)$ . This will also apply to the other distributional classes.

Also, let $\mathcal{M E}^{\text {res }}\left(\mu_{m}, A_{l}\right)$ be the (restricted) subset of $\mathcal{M E}\left(\boldsymbol{\mu}_{m}, A_{l}\right)$ with at most $m$ non-zero mixing probabilities $\left\{\zeta_{k}\right\}_{k=1}^{l}$ . Given that $\mathcal{M E} \mathcal{E}^{r e s}\left(\mu_{m}, A_{l}\right)$ has a finite number of solutions, we propose to use $\mathcal{M E}{\mathstrut}^{\text {res }}\left(\mu_{m}, A_{l}\right)$ as our approximation class. It is clear that $\mathcal{M E} \mathcal{E}^{\text {res }}\left(\mu_{m}, A_{l}\right)$ $\subseteq \mathcal{M} \mathcal{E}^{r e s}\left(\mu_{m}, A_{l^{\prime}}\right)$ for $l \leq l^{\prime}$ .

Note that for a continuous positive distribution with moments $\mu_{m}$ , we know from Theorem 2 that there exists a $l$ large enough such that $\mathcal{M E}\left(\mu_{m}, A_{l}\right)$ is not empty. Even though no formal conclusion can be reached for the restricted class $\mathcal{M E} \mathcal{E}^{\text {res }}\left(\mu_{m}, A_{l}\right)$ , all our numerical studies have shown that this set has a large number of distributions (see, for instance, the examples of subsections 3.2.1 and 3.2.2) for a given $m$ when $l$ is chosen large enough.

Distributions in the $\mathcal{M} \mathcal{E}^{\text {res }}\left(\mu_{m}, A_{l}\right)$ class are identified as follows: for a given set $\left\{i_{k}\right\}_{k=1}^{m} \subset A_{l}$ with $1 \leq i_{1}<$ $i_{2}<\cdots<i_{m} \leq l$ , (6) can be rewritten in matrix form as

$\mathbf{G}_{m} \zeta_{m}=\mathbf{M}_{\beta},$

where $\zeta_{m}=\left(\zeta_{i_{1}}, \zeta_{i_{2}}, \ldots, \zeta_{i_{m}}\right)^{T}, \mathbf{M}_{\beta}=\left(\beta \mu_{1}, \beta^{2} \mu_{2}, \ldots\right.$ , $\left.\beta^{m} \mu_{m}\right)^{T}$ , and

$\mathbf{G}_{m}=\left(\begin{array}{cccc} i_{1} & i_{2} & \cdots & i_{m} \\ i_{1}\left(i_{1}+1\right) & i_{2}\left(i_{2}+1\right) & \cdots & i_{m}\left(i_{m}+1\right) \\ \vdots & \vdots & \ddots & \vdots \\ \prod_{i=0}^{m-1}\left(i_{1}+i\right) & \prod_{i=0}^{m-1}\left(i_{2}+i\right) & \cdots & \prod_{i=0}^{m-1}\left(i_{m}+i\right) \end{array}\right) .$

It follows that $\zeta_{m}=\mathbf{G}_{m}^{-1} \mathbf{M}_{\beta}$ under the constraint that $\zeta_{m} \mathbf{e}=1$ , where $\mathbf{e}$ is the vector 1’s. Note that $\zeta_{m}^{T} \mathbf{e}=\left(\mathbf{G}_{m}^{-1} \mathbf{M}_{\beta}\right)^{T} \mathbf{e}$ is a polynomial of degree (at most) $m$ in $\beta$ . Thus, we only consider the real and positive solutions (in $\beta$ ) of $\left(\mathbf{G}_{m}^{-1} \mathbf{M}_{\beta}\right)^{T} \mathbf{e}=1$ and complete their mixed Erlang representation with the identification of the mixing weights $\zeta_{m}=\mathbf{G}_{m}^{-1} \mathbf{M}_{\beta}$ . The procedure is systematically repeated for all $\binom{l}{m}$ possible sets of $m$ distinct elements in $A_{l}$ .

Remark 4. Given that the above procedure is repeated $\binom{l}{m}$ times, the computational efficiency of the proposed methodology is mostly driven by this number, and hence the parameters $m$ and $l$ should be chosen accordingly. For a given number of moments, $m$ , we observe that: (a) larger values of l result in a more time-consuming numerical procedure; (b) however, $l$ should be chosen large enough for the approximation class $\mathcal{M} \mathcal{E}^{r e s}\left(\mu_{m}, A_{l}\right)$ to have a reasonable number of members (to legitimally produce a “good” approximation). From our numerical studies, we observe that the selection of $l$ (for a given $m$ ) can be problemspecific, and thus this tradeoff in the choice of l should be handled with care. However, as a rule of thumb, when $m$ is relatively small (i.e., $m \leq 6$ ) which is traditionally in moment-matching exercises, a value of $l$ between 50 and 100 leads to reasonable mixed Erlang approximations. We refer the reader to the numerical illustrations and subsequent remarks in Section 3.2 for a more detailed discussion on this topic.

Among all $\mathcal{M} \mathcal{E}^{r e s}\left(\boldsymbol{\mu}_{m}, A_{l}\right)$ distributions, we propose to use as a criterion of the quality of these approximations the Kolmogorov-Smirnov (KS) distance defined for two rv’s $S$ and $W$ (with respective dfs $F_{S}$ and $F_{W}$ )
as $d_{K S}(S, W)=\sup _{x \geq 0}\left|F_{S}(x)-F_{W}(x)\right|$ . The KS distance is commonly used in the context of continuous distributions (e.g., Denuit et al. 2005). Therefore, the chosen mixed Erlang approximation within $\mathcal{M E} \mathcal{E}^{r e s}\left(\mu_{m}, A_{l}\right)$ is the one minimizing the KS distance with the true df $F_{S}$ . We denote by $F_{W_{m, 1}}$ this approximation, i.e.,

$d_{K S}\left(S, W_{m, l}\right)=\inf _{F_{W} \in \mathcal{M} \mathcal{E}^{\prime \prime s}\left(\mu_{m}, A_{l}\right)} \sup _{x \geq 0}\left|F_{S}(x)-F_{W}(x)\right|,$

where $W_{m, l}$ is a rv with df $F_{W_{m, l}}$ . This requires the calculation of the KS distance for each mixed Erlang distribution in $\mathcal{M} \mathcal{E}^{r e s}\left(\mu_{m}, A_{l}\right)$ to identify its minimizer $W_{m, l}$ . In general, an explicit expression for this KS distance does not exist, and hence we propose to numerically find this value by evaluating the distance between the two dfs over all multiples (up to a given high value) of a small discretization span.

Note that other distances such as the stop-loss distance (e.g., Gerber 1979) could have been used to select our approximation distribution. Alternatively, one could have relied on another criterion to identify this approximation distribution (for instance, select the distribution in $\mathcal{M E}{\mathstrut}^{r e s}\left(\mu_{m}, A_{l}\right)$ with the closest subsequent moment to the true distribution).

3.2. Numerical examples

We consider a few simple examples to illustrate the quality of the approximation. For comparative purposes, other approximation methods will also be discussed. Some concluding remarks on the mixed Erlang approximation method are later made based on the numerical experiment conducted next.

3.2.1. Lognormal distribution: Dufresne (2007, Example 5.4)

Let $S=\exp (Z)$ where $Z$ is a normal rv with mean 0 and variance 0.25 . The first 5 moments of $S$ are $\mu_{5}=(1.1331,1.6487,3.0802,7.3891,22.7599)$ . We consider the class of mixed Erlang distributions $\mathcal{M E} \mathcal{E}^{\text {res }}\left(\mu_{m}\right.$ , $\left.A_{70}\right)(m=3,4,5)$ which have a total of 13198,89294 and 290422 distributions, respectively. The resulting mixed Erlang approximations $F_{W_{m, 70}}(m=3,4,5)$ are

$\begin{aligned} F_{W_{3,70}}(x)= & 0.8209 H(x ; 6,6.3219) \\ & +0.1727 H(x ; 12,6.3219) \\ & +0.0064 H(x ; 26,6.3219), \\ F_{W_{4}, 70}(x)= & 0.6350 H(x ; 7,8.3334) \\ & +0.2950 H(x ; 12,8.3334) \\ & +0.0672 H(x ; 20,8.3334) \\ & +0.0029 H(x ; 40,8.3334), \end{aligned}$

and

$\begin{aligned} F_{W_{5,70}}(x)= & 0.6273 H(x ; 7,8.3608) \\ & +0.3063 H(x ; 12,8.3608) \\ & +0.0609 H(x ; 20,8.3608) \\ & +0.0055 H(x ; 34,8.3608) \\ & +0.0001 H(x ; 69,8.3608), \end{aligned}$

for $x \geq 0$ with respective KS distances of $d_{K S}\left(S, W_{3,70}\right)$ $=0.0040, d_{K S}\left(S, W_{4,70}\right)=0.0018$ and $d_{K S}\left(S, W_{5,70}\right)=$ 0.0011 . Note that the quality of the mixed Erlang approximation (as measured by the KS distance) increases with the number of moments matched. For comparative purposes, the three-moment approximation (5) of Johnson and Taaffe (1989) is given by

$\begin{aligned} F_{W_{J T}}(x)= & 0.0087 H(x ; 4,1.2804) \\ & +0.9913 H(x ; 4,3.5855), \end{aligned}$

In Figure 1, we compare the density function of $W_{m, 70}$ $(m=3,4,5) W_{J T}$ and $S$ .

Figure 1.Density function: Lognormal vs. Approximations.

All three mixed Erlang approximations provide an overall good fit to the exact distribution. To further examine the tail fit, specific values of VaR and TVaR for the exact and approximated distributions are provided in Tables 1 and 2, respectively.

Table 1.Values of

$\operatorname{VaR}_{\mathrm{x}}$ for

$W_\pi, W_{m, 70}(m=3,4,5)$ and

$S$

$\kappa$	$\operatorname{VaR}_\kappa\left(W_{J T}\right)$	$\operatorname{VaR}_{\mathrm{k}}\left(W_{3,70}\right)$	$\operatorname{VaR}_{\mathrm{k}}\left(W_{4,70}\right)$	$\operatorname{VaR}_{\mathrm{k}}\left(W_{5,70}\right)$	$\operatorname{VaR}_\kappa(S)$
0.9	1.8906	1.9129	1.9056	1.8936	1.8980
0.95	2.2119	2.2692	2.2918	2.2791	2.2760
0.99	2.9953	3.1223	3.0991	3.1812	3.2001
0.995	3.4239	3.6892	3.4811	3.6572	3.6252
0.999	5.0672	4.9237	5.0623	4.7241	4.6885

Table 2.Values of

$\mathrm{TVaR}_{\mathrm{k}}$ for

$W_\pi, W_{m, 70}(m=3,4,5)$ and

$S$

$\kappa$	$\operatorname{TVaR}_{\mathrm{k}}\left(W_\pi\right)$	$\operatorname{TVaR}_{\mathrm{k}}\left(W_{3,70}\right)$	$\operatorname{TVaR}_{\mathrm{k}}\left(W_{4,70}\right)$	$\operatorname{TVaR}_{\mathrm{k}}\left(W_{5,70}\right)$	$\operatorname{TVaR}_{\mathrm{k}}(S)$
0.9	2.3931	2.4540	2.4601	2.4616	2.4616
0.95	2.7528	2.8350	2.8431	2.8600	2.8586
0.99	3.7939	3.9007	3.8125	3.8579	3.8413
0.995	4.4115	4.4455	4.3654	4.3323	4.2957
0.999	6.2260	5.4245	5.6195	5.3151	5.4341

We observe that the VaR and TVaR values of the mixed Erlang approximations compare very well to their lognormal counterparts, especially for the 5-moment approximation. This is particularly true given that the lognormal distribution is known to have a heavier tail than the mixed Erlang distribution.

Note that the improvement is indeed not monotone with the number of moments matched, as increasing this number does not necessarily lead to a higher quality approximation in moment-matching techniques.

3.2.2. Mixture of two gamma distributions: S. C. Lee and Lin (2010, Section 5, Example 1)

Let $S$ be a mixture of two gamma distributions with density

$\begin{aligned} f_{s}(s)= & 0.2 \frac{(3.2 s)^{2.6} e^{-3.2 s}}{s \Gamma(2.6)} \\ & +0.8 \frac{(1.2 s)^{6.3} e^{-1.2 s}}{s \Gamma(6.3)}, \quad s \geq 0, \end{aligned}$

and first 6 moments $\mu_{6}=(4.3623,25.7308,176.9624$ , 1369.8272, 11754.2149, 110674.4154). We consider the class of mixed Erlang distributions $\mathcal{M} \mathcal{E}^{r e s}\left(\mu_{m}, A_{70}\right)$

for $m=3,4,5,6$ which are composed of 16000,83797 , 494532 and 1928919 distributions, respectively. The resulting mixed Erlang approximations $F_{W_{m, 70}}(m=3,4$ , $5,6)$ are

$\begin{aligned} F_{W_{3,70}}(x)= & 0.2140 H(x ; 2,2.0835) \\ & +0.5215 H(x ; 9,2.0835) \\ & +0.2645 H(x ; 15,2.0835), \\ F_{W_{4,70}}(x)= & 0.2266 H(x ; 2,2.0469) \\ & +0.4440 H(x ; 9,2.0469) \\ & +0.2963 H(x ; 13,2.0469) \\ & +0.0330 H(x ; 19,2.0469), \\ F_{W_{5,70}}(x)= & 0.2023 H(x ; 3,3.7271) \\ & +0.2091 H(x ; 12,3.7271) \\ & +0.3936 H(x ; 19,3.7271) \\ & +0.1805 H(x ; 28,3.7271) \\ & +0.0145 H(x ; 42,3.7271), \end{aligned}$

and

$\begin{aligned} F_{W_{6,70}}(x)&= 0.0768 H(x ; 2,3.0731) \\ &\quad +0.1325 H(x ; 3,3.0731) \\ &\quad +0.2739 H(x ; 11,3.0731) \\ &\quad +0.3928 H(x ; 17,3.0731) \\ &\quad +0.1188 H(x ; 25,3.0731) \\ &\quad +0.0052 H(x ; 37,3.0731), \end{aligned}$

with respective KS distances of $d_{K S}\left(S, W_{3,70}\right)=0.0148$ , $d_{K S}\left(S, W_{4,70}\right)=0.0050, d_{K S}\left(S, W_{5,70}\right)=0.0035$ and $d_{K S}\left(S, W_{6,70}\right)=0.0024$ . S. C. Lee and Lin (2010) used the EM algorithm to fit a mixed Erlang distribution to the same distribution, which resulted in the following model:

$\begin{aligned} F_{W_{E M}}(x)= & 0.2282 H(x ; 2,1.9603) \\ & +0.5430 H(x ; 9,1.9603) \\ & +0.2288 H(x ; 14,1.9603), \end{aligned}$

with KS distance $d_{K S}\left(S, W_{E M}\right)=0.0094$ . Note that the KS distance for the 3-moment approximation is greater than the KS distance obtained using the EM estimation. We recall that the EM algorithm finds maximum likelihood estimates using the approximated distribution as an input, while our method is based on only partial information on the approximated distribution (e.g., its first $m$ moments). We observe that the fit improves and the KS distance decreases when more moments are included in the approximation. For illustrative purposes, we also provide in Tables 3 and 4 some values of VaR and TVaR for $W_{m, 70}(m=3,4,5,6)$ , $W_{E M}$ and $S$ .

Table 3.Values of

$\operatorname{VaR}_{\mathrm{x}}$ for

$W_{m, 70}(m=3,4,5,6), W_{E M}$ , and

$S$

$\kappa$	$\operatorname{VaR}_\kappa\left(W_{E M}\right)$	$\operatorname{VaR}_\kappa\left(W_{3,70}\right)$	$\operatorname{VaR}_\kappa\left(W_{4,70}\right)$	$\operatorname{VaR}_\kappa\left(W_{5,70}\right)$	$\operatorname{VaR}_\kappa\left(W_{6,70}\right)$	$\operatorname{VaR}_\kappa(S)$
0.9	7.7069	7.8183	7.6726	7.6943	7.6969	7.6859
0.95	8.7494	8.8676	8.7595	8.7172	8.7825	8.7666
0.99	10.7708	10.8451	11.0385	11.0462	10.9514	11.0023
0.995	11.5349	11.5835	11.9366	12.0611	11.8566	11.8925
0.999	13.1604	13.1473	13.8488	13.9701	13.9651	13.8551

Table 4.Values of

$\mathrm{TVaR}_{\mathrm{x}}$ for

$W_{m, 70}(m=3,4,5,6), W_{E M}$ and

$S$

$\kappa$	$\operatorname{TVaR}_{\mathrm{k}}\left(W_{E M}\right)$	$\operatorname{TVaR}_{\mathrm{k}}\left(W_{3,70}\right)$	$\operatorname{TVaR}_{\mathrm{k}}\left(W_{4,70}\right)$	$\operatorname{TVaR}_{\mathrm{k}}\left(W_{5,70}\right)$	$\operatorname{TVaR}_{\mathrm{k}}\left(W_{6,70}\right)$	$\operatorname{TVaR}_{\mathrm{k}}(S)$
0.9	9.0866	9.1909	9.1596	9.1404	9.1634	9.1598
0.95	9.9929	10.0845	10.1583	10.1234	10.1378	10.1469
0.99	11.8253	11.8620	12.2766	12.3665	12.2440	12.2528
0.995	12.5377	12.5481	13.1132	13.2302	13.1371	13.1065
0.999	14.0774	14.0266	14.9062	14.8770	15.0978	15.0069

3.2.3. Gompertz distribution

The Gompertz distribution with df $F_{S}(x)=1-$ $\exp \left\{-B\left(c^{x}-1\right) / \ln c\right\}(x>0)$ has been extensively applied in various life contingency contexts (e.g., Bowers et al. 1997). Lenart (2014) provides an expression for the $j$ -th moment, namely

$\mu_{j}=\frac{j!}{(\ln c)^{j}}{\mathstrut}^{\frac{B}{e^{\ln }}} E_{1}^{j-1}\left(\frac{B}{\ln c}\right)\tag{7},$

where $E_{s}^{j}(z)=\int_{1}^{\infty} \frac{(\ln x)^{j}}{j!} x^{-s} e^{-z x} d x$ is the generalized integro-exponential function (see Milgram 1985). Here, we consider an example of Melnikov and Romaniuk (2006) on the 1959-1999 USA mortality data of the human mortality database where the parameters $B$ and $c$ were estimated to $B=6.148 \times 10^{-5}$ and $c=1.09159$ . Using (7), the first five moments are $\mu_{5}=(76.3437$ , 6037.202, 489676.3, 40524308, 3410245408). We consider the class of mixed Erlang distributions $\mathcal{M} \mathcal{E}^{\text {res }}\left(\mu_{m}, A_{90}\right)$ . The resulting mixed Erlang approximations $F_{W_{3,90}}$ and $F_{W_{490}}$ are

$\begin{aligned} F_{W_{3,90}}(x)= & 0.0154 H(x ; 22,1.0928) \\ & +0.2210 H(x ; 65,1.0928) \\ & +0.7637 H(x ; 90,1.0928) \end{aligned}$

and

$\begin{aligned} F_{W_{4}, 90} & (x)= \\ & 0.0347 H(x ; 35,1.0972) \\ & +0.0834 H(x ; 66,1.0972) \\ & +0.1009 H(x ; 67,1.0972) \\ & +0.7810 H(x ; 90,1.0972), \end{aligned}$

with KS distances $d_{K S}\left(S, W_{3,90}\right)=0.0188$ , and $d_{K S}\left(S, W_{4,90}\right)$ $=0.0239$ . In Figure 2, we compare the fit of the 3 - and 4 -moment approximations to the exact distribution by plotting their densities (left) and dfs (right). Overall, we observe that the fit is quite reasonable.

Figure 2.Density (left) and df (right): 3- and 4-moment approximations vs. Gompertz distribution.

Note that the KS distance increases from the 3 -moment to the 4-moment approximation. We notice that both $F_{W_{3,9}}$ and $F_{W_{4,90}}$ use the Erlang-90 df, where 90 is the largest element of $A_{90}$ . As such, a mixed Erlang approximation with a smaller KS distance can likely be found in both cases by choosing a larger support $A_{l}$ (i.e., $l>90$ ).

3.2.4. Some remarks

To provide insight on the quality of the momentbased mixed Erlang approximation proposed in this section, we briefly revisit the results of the above three examples. In the first two (lognormal and mixture of two gammas), the mixed Erlang approximation is easy to implement and provides a very satisfactory fit to the true distribution. As none of the resulting mixed Erlang approximations uses the Erlang-70 df (as $l=70$ in the first two examples), it is unlikely that a better approximation (from the viewpoint of KS distance) can be found by increasing the value of $l$ . Given that the best KS-fit is found from a mixture of Erlang distributions with relatively small shape parameter (which corresponds to the parameter $k$ in the Erlang df (2)), the proposed method seems particularly well suited for these two cases.

As for any approximation method, limitations can also be found, as evidenced by the Gompertz example. Indeed, for this example, both the 3-moment and 4-moment mixed Erlang approximations use the Erlang-90 df (recall $l=90$ for this example). As mentioned earlier, one can likely reduce the KS distance (if so desired) of the resulting mixed Erlang approximation by increasing the value of $l$ (in light of the comments in Remark 4). This implies that the KS-optimal mixed Erlang approximation would likely involve Erlang distributions with large shape parameters (which have smaller variances for a given rate parameter $\beta$ ).

In general, distributions with negative skewness and sharp density peak(s) may require the use of Erlang distributions with large shape parameters to provide a good approximation. Computational time of the proposed mixed Erlang methodology may become a non-negligible issue in these cases (especially as the number of moments matched increases). However, a slight adjustment to the proposed methodology may be considered to address this time-consuming issue. Indeed, one may replace the set $A_{l}=\{1,2, \ldots, l\}$ in (1) by a set of the form $\{a, 2 a, \ldots, j a\}$ for positive integers $a$ and $j$ (note that the two sets coincide when $a=1$ and $j=l$ ). To illustrate this, we have reconsidered the Gompertz example by replacing the set $A_{90}$ by the set $\{5,10,15, \ldots, 200\}$ . The resulting mixed Erlang approximation, denoted by the rv $W_{m, 5: 200}$ when $m$ moments are matched ( $m=3,4,5$ ), are given by

$\begin{aligned} F_{W_{3,5 ; 20}}(x)= & 0.0447 H(x ; 45,1.3185) \\ & +0.2573 H(x ; 85,1.3185) \\ & +0.6980 H(x ; 110,1.3185), \end{aligned}$

$\begin{aligned} F_{W_{4,5: 200}}(x)= & 0.0168 H(x ; 40,1.6422) \\ & +0.0971 H(x ; 85,1.6422) \\ & +0.3044 H(x ; 115,1.6422) \\ & +0.5818 H(x ; 140,1.6422), \end{aligned}$

and

$\begin{aligned} F_{W_{5,5: 200}}(x)= & 0.0038 H(x ; 20,1.9095) \\ & +0.0393 H(x ; 75,1.9095) \\ & +0.1262 H(x ; 110,1.9095) \\ & +0.3275 H(x ; 140,1.9095) \\ & +0.5032 H(x ; 165,1.9095), \end{aligned}$

with KS distances $d_{K S}\left(S, W_{3,5: 200}\right)=0.0171, d_{K S}\left(S, W_{4,5: 200}\right)$ $=0.0086$ , and $d_{K S}\left(S, W_{5,5: 200}\right)=0.0072$ . Note that the KS distance in the 4-moment approximation is considerably lower than for $d_{K S}\left(S, W_{4,90}\right)=0.0239$ . In Figure 3, we compare the fit of the 5-moment approximation $W_{5,5: 200}$ to the Gompertz distribution by plotting their densities (left) and dfs (right). We can see that the fit is quite acceptable and of a better quality than the two approximations displayed in Figure 3.

Figure 3.Density (left) and df (right): 5-moment approximation vs Gompertz distribution.

4. Moment-based mixed Erlang approximation with known $\beta$

4.1. Basic definitions

We consider here a slightly different context than the one of Section 3. Instead of approximating a general df $F_{S}$ with known moments $\mu_{m}$ , we assume that the df $F_{S}$ is known to be of mixed Erlang form (1) with given rate parameter $\beta>0$ and first $m$ moments $\mu_{m}$ . However, the mixing weights $\left\{\zeta_{k}\right\}_{k=1}^{l}$ are assumed unknown or difficult to obtain. Various applications in risk theory and credit risk fall into this context (e.g., Cossette, Gaillardetz, and Marceau 2002; Lindskog and McNeil 2003, and McNeil, Frey, and Embrechts 2005; see also the application of Section 4.4).

For a given rate parameter $\beta>0$ , let $\mathcal{M E}\left(\mu_{m}, \beta\right)$ be the set of all mixed Erlang distributions with df (1) (as $l \rightarrow \infty)$ and first $m$ moments $\mu_{m}$ . Also, define $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ to be the subset of $\mathcal{M E}\left(\mu_{m}, \beta\right)$ with df (1) for a given $l \in \mathbb{N}^{+}$ , and let $\mathcal{M} \mathcal{E}^{e x t}\left(\mu_{m}, A_{l}, \beta\right)$ be a further subset of $\mathcal{M} \mathcal{E}\left(\mu_{m}, A_{l}, \beta\right)$ such that at most $(m+1)$ of the mixing weights $\left\{\zeta_{k}\right\}_{k=1}^{l}$ are non-zero. Note that a distribution in $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ can be expressed as a convex combination of distributions in $\mathcal{M} \mathcal{E}^{e x t}\left(\mu_{m}, A_{l}, \beta\right)$ (see, e.g., De Vylder 1996).

For a given function $\phi$ , we consider two approaches to derive bounds and approximations for $E[\phi(S)]$ when $S \in \mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ (in cases when the expectation exists). The first approach is based on discrete $s$ -convex extremal distributions while the second is based on moment bounds on discrete expected stop-loss transforms.

Remark 5. Naturally, the set $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ tends to $\mathcal{M E}\left(\mu_{m}, \beta\right)$ as $l \rightarrow \infty$ . As such, when $S \in \mathcal{M E}\left(\mu_{m}, \beta\right)$ , bounds for risk measures on $\mathcal{M E}\left(\boldsymbol{\mu}_{m}, \beta\right)$ can be approximated by their counterparts in $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ for $l$ reasonably large.

For a mixed Erlang rv $S$ with df (1), its $j$ -th moment is known to satisfy (6). Using the identity

$\prod_{i=0}^{j-1}(k+i)=\sum_{n=1}^{j} s(j, n) k^{n},$

where the $s(j, n)$ 's are the (signed) Stirling numbers of the first kind (e.g., Abramowitz and Stegun 1972), (6) becomes

$\beta^{j} \mu_{j}=\sum_{n=1}^{j} s(j, n) \kappa_{n},\tag{8}$

for $j=1, \ldots, m$ where $\kappa_{n}=\sum_{k=1}^{l} \zeta_{k} k^{n}$ . In matrix form, we have $\mathbf{M}_{\beta}=\mathbf{s} \mathbf{x}_{m}^{T}$ , where $\mathbf{s}=\{s(j, n)\}_{j, n=1}^{m}$ (with $s(j, n)=0$ for $n>j$ ) and $\boldsymbol{\kappa}_{m}=\left(\kappa_{1}, \ldots, \kappa_{m}\right)$ . Isolating $\boldsymbol{\kappa}_{m}$ yields

$\boldsymbol{\kappa}_{m}=\left(\mathbf{s}^{-1} \mathbf{M}_{\beta}\right)^{T} .\tag{9}$

From Comtet (1974, 213) we know that $\mathbf{s}^{-1} \equiv$ $\mathbf{c}=\left\{(-1)^{i+j} c(i, j)\right\}_{i, j=1}^{m}$ where the $c(i, j)^{\prime}$ s are the Stirling numbers of the second kind defined as $c(i, j)$ $=(j!)^{-1} \sum_{k=0}^{j}(-1)^{j-k}\binom{j}{k} k^{i}$ (e.g., Abramowitz and Stegun 1972).

Thus, for a given $\beta>0$ , the class $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ can be found through the identification of all discrete distributions with support $A_{l}$ and first $m$ moments given by the right-hand side of (9). Equivalently, the class $\mathcal{M} \mathcal{E}^{e x}\left(\mu_{m}, A_{l}, \beta\right)$ can be identified by restricting the discrete distributions on $A_{l}$ to have at most $(m+1)$ non-zero mass points. This argument is formalized in the next section.

4.2. Discrete s-convex extremal distributions

Let $\mathcal{D}\left(\boldsymbol{\alpha}_{m}, A_{l}\right)$ be all discrete distributions with support $A_{l}$ and first $m$ moments $\alpha_{m}=\left(\boldsymbol{\alpha}_{1}, \alpha_{2}, \ldots, \alpha_{m}\right)$ . Also, denote by $\mathcal{D}^{e x t}\left(\boldsymbol{\alpha}_{m}, A_{l}\right)$ the subset of $\mathcal{D}\left(\boldsymbol{\alpha}_{m}, A_{l}\right)$ with distributions having at most $(m+1)$ non-zero mass points. For a given $\beta>0$ , each distribution
in $\mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right)$ (and $\mathcal{D}^{\text {ext }}\left(\boldsymbol{\kappa}_{m}, A_{l}\right)$ ) with $\boldsymbol{\kappa}_{m}$ as defined in (9) corresponds to a mixed Erlang distribution in $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ (and $\mathcal{M} \mathcal{E}^{e x t}\left(\mu_{m}, A_{l}, \beta\right)$ ). This is a one-to-one correspondence (e.g., De Vylder 1996, part 2).

Remark 6. Because of this one-to-one correspondence between $\mathcal{D}\left(\kappa_{m}, A_{l}\right)$ and $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ , conditions under which $\mathcal{M} \mathcal{E}\left(\mu_{m}, A_{l}, \beta\right)$ is not empty can be found from its discrete counterpart $\mathcal{D}\left(\kappa_{m}, A_{l}\right)$ . We refer the reader to, e.g., De Vylder (1996), Marceau (1996), or Courtois and Denuit (2009).

This allows us to make use of the theory developed in Prékopa (1990), Denuit and Lefèvre (1997), Denuit, Lefèvre, and Mesfioui (1999), and Courtois, Denuit, and Van Bellegem (2006) to derive bounds/approximations for $E[\phi(S)]$ when $S \in \mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ .

First, we briefly recall the definitions of $s$ -convex function and $s$ -convex order introduced by Denuit, Lefèvre, and Shaked (1998).

Definition 7. Denuit, Lefèvre, and Shaked (1998, 2000). Let $\mathcal{C}$ be a subinterval of $\mathbb{R}$ or a subset of $\mathbb{N}$ , and $\phi$ a function on $\mathcal{C}$ . For a positive integer $s$ and $x_{0}$ $<x_{1}<\cdots<x_{s} \in \mathcal{C}$ , we recursively define the divided differences as

$\begin{aligned} {\left[x_{0}, x_{1}, \ldots, x_{k}\right] \phi } & =\frac{\left[x_{1}, x_{2}, \ldots, x_{k}\right] \phi-\left[x_{0}, x_{1}, \ldots, x_{k-1}\right] \phi}{x_{k}-x_{0}} \\ & =\sum_{i=1}^{k} \frac{\phi\left(x_{i}\right)}{\prod_{j=0, j j i}^{k}\left(x_{i}-x_{j}\right)}, \quad k=1,2, \ldots, s, \end{aligned}$

where $\left[x_{k}\right] \phi=\phi\left(x_{k}\right)$ for $k=0,1, \ldots, s$ . The function $\phi$ is $s$ -convex if $\left[x_{0}, x_{1}, \ldots, x_{s}\right] \phi \geq 0$ for all $x_{0}<x_{1}$ $<\cdots<x_{s} \in \mathcal{C}$ .

We mention that the definition of $s$ -convex function, which refers to higher-convexity, should not be confused with the one for Schur-convex function, also known as S-convex function.

Definition 8. Denuit, Lefèvre, and Shaked (2000). For two rv’s $X$ and $Y$ defined on $\mathcal{C}, X$ is smaller than $Y$ in the $s$ -convex sense, namely $X \square_{s-c x}^{\mathcal{C}} Y$ , if $E[\phi(X)]$ $\leq E[\phi(Y)]$ for all $s$ -convex functions $\phi$ (provided the expectation exists).

We mention that the 1 -convex order corresponds to the usual stochastic dominance order, and the 2-convex order is the usual convex order (see Müller and Stoyan (2002), Denuit et al. (2005), and Shaked and Shanthikumar (2007) for a review on stochastic orders). Also, as stated in Theorem 1.6.3 of Müller and Stoyan (2002), the $s$ -convex order can only be used to compare rv’s with the same first $(s-1)$ moments (which explains why $s$ is chosen to be $m+1$ in what follows). Examples of $s$ -convex functions are $\phi(x)=x^{x+j}$ for $j \in \mathbb{N}$ and $\phi(x)=\exp (c x)$ for $c \geq 0$ . For a general treatment of the $s$ -convex order, see, e.g., Denuit, Lefèvre, and Mesfioui (1999) and Denuit, Lefèvre, and Shaked (2000) and section 1.6 of Müller and Stoyan (2002).

Let $K_{(m+1)-\text { min }}$ and $K_{(m+1)-\max }$ be the $(m+1)$ -extremum rv’s on $\mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{\nu}\right)$ , i.e., those which satisfy

$\begin{gathered} E\left[\phi\left(K_{(m+1)-\min }\right)\right] \leq E[\phi(K)] \\ \quad \leq E\left[\phi\left(K_{(m+1)-\max }\right)\right], \end{gathered}$

for any $(m+1)$ -convex function $\phi$ and $K \in \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right)$ . The general distribution forms of $K_{(m+1)-\text { min }}$ and $K_{(m+1) \text {-max }}$ are given in Prékopa (1990) (see also Courtois, Denuit, and Van Bellegem (2006, Section 4)) and are repeated here:

$\scriptsize{ \begin{array}{|c|c|c|} \hline & m+1 \text { even } & m+1 \text { odd } \\ \hline \text { support of } K_{(m+1)-\min } & \left\{j_1, j_1+1, \ldots, j_{\frac{m+1}{2}}, j_{\frac{m+1}{2}}+1\right\} & \left\{1, j_1, j_1+1, \ldots, j_{\frac{m}{2}}, j_{\frac{m}{2}}+1\right\} \\ \hline \text { support of } K_{(m+1)-\max } & \left\{1, j_1, j_1+1, \ldots, j_{\frac{m-1}{2}}, j_{\frac{m-1}{2}}+1, l\right\} & \left\{j_1, j_1+1, \ldots, j_{\frac{m}{2}}, j_{\frac{m}{2}}+1, l\right\} \\ \hline \end{array} \tag{10}}$

where $1<j_{1}<j_{1}+1<j_{2}<\cdots<l$ . From (10), it is clear that the support of $K_{(m+1)-\min (\max )}$ has at most $m+1$ elements.

Let $W_{K}=\sum_{j=1}^{K} C_{j}$ be a mixed Erlang rv, i.e., $\left\{C_{j}\right\}_{j \geq 1}$ are a sequence of iid exponential rv’s with mean $1 / \beta$ , independent of $K$ . The following result of Denuit, Lefèvre, and Utev (1999, Property 5.7) relates to the stability of the $s$ -convex order under compounding.

Lemma 9 If $K \square_{s-c x}^{\mathrm{A}_{l}} K^{\prime}$ , then $W_{K} \square_{s-c x}^{\mathbb{R}+} W_{K^{\prime}}$ .

We apply Lemma 9 to define the mixed Erlang rv’s $W_{K_{(m+1)-\min }}$ and $W_{K_{(m+1)-\max }}$ , which are the $(m+1)$ extremum rv’s on $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ . It is immediate that, for $W \in \mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ ,

$W_{K_{(m+1)-\min }} \preceq_{(m+1)-c x}^{\mathbb{R}+} W \preceq \preceq_{(m+1)-c x}^{\mathbb{R}+} W_{K_{(m+1)-\max }} . \tag{11}$

For instance, using (11), the $(m+1)$ -convex functions $\phi(x)=x^{m+1+j}(j \in \mathbb{N})$ and $\phi(x)=\exp (c x)(c \geq 0)$ yield

$E\left[W_{K_{(m+1)-\min }}^{m+1+j}\right] \leq E\left[W^{m+1+j}\right] \leq E\left[W_{K_{(m+1)-\max }}^{m+1+j}\right]$

and

$\begin{gathered} E\left[\exp \left(c W_{K_{(m+1)-\min }}\right)\right] \leq E[\exp (c W)] \\ \leq E\left[\exp \left(c W_{K_{(m+1)-\max }}\right)\right], \end{gathered}$

respectively.

4.3. Moment bounds on discrete expected stop-loss transforms

Extrema on the $(m+1)$ -convex order yield bounds for $E[\phi(W)]$ when $\phi$ is $(m+1)$ -convex and $W \in \mathcal{M E}\left(\mu_{m}, A_{l}\right.$ , $\beta$ ). However, this approach is not appropriate to derive bounds on TVaR and the stop-loss premium when the number of known moments is greater than 2. It is well known that two rv’s with the same mean and variance cannot be compared under the convex order.

Consequently, we use an approach inspired from Courtois and Denuit (2009) (see also Hürlimann 2002) to derive bounds on TVaR and the stop-loss premium. We consider $\mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right)$ and determine lower and upper bounds for $E\left[(K-k)_{+}\right]$ for all $k \in A_{l}$ . From the lower
bound, we define the corresponding rv $K_{m-\text { down }}$ on $A_{l}$ via the df

$F_{K_{m-d o w n}}(k)=\left\{\begin{array}{c}1-\binom{\inf _{K \in \mathcal{D}\left(\kappa_{m}, A_{l}\right)} E\left[(K-k)_{+}\right]}{-\inf _{K \in \mathcal{D}\left(\kappa_{m}, A_{l}\right)} E\left[(K-k-1)_{+}\right]}, \\ k=1,2, \ldots, l-1, \\ 1, \\ k=l\end{array}\right.\tag{12}$

for $k \in A_{l}$ . Similarly, $K_{m-u p}$ is defined as in (12) by replacing ‘inf’ by ‘sup’ in the definition. Given that, for $K \in \mathcal{D}\left(\kappa_{m}, A_{l}\right)$ ,

$E\left[\left(K_{m-\text { down }}-k\right)_{+}\right] \leq E\left[(K-k)_{+}\right] \leq E\left[\left(K_{m-\text {-up }}-k\right)_{+}\right],$

for all $k \in A_{l}$ , it implies that $K_{m-d o w n(u p)}$ is smaller (larger) than $K$ under the increasing convex order, namely $K_{m-d o w n} \square_{i c x} K \square_{i c x} K_{m-u p}$ (see, e.g., Courtois and Denuit 2009). Note that $K_{m-\text { down }}$ and $K_{m-u p}$ do not belong to $\mathcal{D}\left(\kappa_{m}, A_{l}\right)$ , but both have first moment $\kappa_{1}$ . The increasing convex order is stable under compounding, and thus

$W_{K_{m-\text { doown }}} \preceq_{i c x} W_{K} \preceq_{i c x} W_{K_{m-u p}} .\tag{13}$

Then, from Denuit et al. (2005, Proposition 3.4.8), it follows that

$\begin{gathered} \operatorname{TVaR}_{\kappa}\left(W_{K_{m-\text { down }}}\right) \leq \operatorname{TVaR}_{\kappa}\left(W_{K}\right) \\ \leq \operatorname{TVaR}_{\kappa}\left(W_{K_{m-\text {-up }}}\right), \end{gathered}\tag{14}$

for $\kappa \in(0,1)$ . Clearly, the rv’s $W_{K_{m-d o w n}}$ and $W_{K_{m-u p}}$ will most likely not belong to $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ .

Remark 10. Note that, when neither of the two aforementioned approaches are applicable, we propose to derive approximate bounds for $E[\phi(S)]$ with $S \in \mathcal{M E}^{\text {ext }}\left(\mu_{m}, A_{l}, \beta\right)$ by calculating $E[\phi(W)]$ for all $W \in \mathcal{M E}^{\text {ext }}\left(\mu_{m}, A_{l}, \beta\right)$ and choosing

$E[\phi(W)]_{\min (\max )}=\inf _{W \in \mathcal{M} \mathcal{E}^{e x t}\left(\mu_{m}, \boldsymbol{A}_{l}, \beta\right)} E[\phi(W)] .$

Obviously, $E[\phi(S)]$ does not necessarily lie between $E[\phi(W)]_{\min }$ and $E[\phi(W)]_{\max }$ . However, the ‘interval’ estimate $\left[E[\phi(W)]_{\text {min }}, E[\phi(W)]_{\text {max }}\right]$ may give an idea of the variability of all solutions on $\mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right)$ .

4.4. Example: Portfolio of dependent risks

We consider a portfolio of $n$ dependent risks as described in the common mixture model of Cossette, Gaillardetz, and Marceau (2002). Let $S=X_{1}+\cdots+X_{n}$ be the aggregate claim amount with $X_{i}=B_{i} I_{i}$ . Conditional on a common mixture rv $\Theta$ with pmf $a_{j}=\mathbb{P}(\Theta=j)$ for $j=1,2, \ldots$ , $\left\{I_{i}\right\}_{i=1}^{n}$ are assumed to form a sequence of independent Bernoulli rv’s with $\mathbb{P}\left(I_{i}=1 \mid \Theta=j\right)=1-\left(r_{i}\right)^{j}$ for $r_{i} \in$ $(0,1)$ . As for the $B_{i}$ 's, they are assumed to form a sequence of iid exponential rv’s of mean 1 , independent of $\left\{I_{i}\right\}_{i=1}^{20}$ and $\Theta$ .

In this context, it is clear that $S$ is a two-point mixture of a degenerate rv at 0 and a mixed Erlang rv of the form (1) with $l=n$ and $\beta=1$ , i.e., its Laplace transform is given by

$\begin{aligned} E\left[e^{-t S}\right] & \equiv 1-p+p E\left[e^{-t Y}\right] \\ & =\sum_{j=1}^{\infty} a_{j}\left\{\prod_{i=1}^{n}\left(\left(r_{i}\right)^{j}+\left(1-\left(r_{i}\right)^{j}\right) \frac{1}{1+t}\right)\right\}, \quad t \geq 0, \end{aligned}$

where $p=1-\sum_{j=1}^{\infty} a_{j} \prod_{i=1}^{n}\left(r_{i}\right)^{j}$ . We perform the momentbased approximation on the rv $Y=(S \mid S>0)$ rather than $S$ .

For illustrative purposes, we assume $n=20$ and a logarithmic pmf for $\Theta$ , namely $a_{j}=(0.5)^{j} /(j \ln 2)$ for $j \geq 1$ . Also, the constants $r_{i}$ are set such that the (unconditional) mean of $I_{i}$ is $q_{i}=1-E\left[\left(r_{i}\right)^{\ominus}\right]$ with $q_{1}=\cdots$ $=q_{10}=0.1$ and $q_{11}=\cdots=q_{20}=0.02$ . Under the above assumptions, the first five moments of $Y$ are found to be $\mu_{5}=(1.7999,6.2270,31.4785,208.1258,1693.7077)$ .

Using the approach on discrete $(m+1)$ -convex extremal distributions, the dfs $F_{W_{\kappa_{(m+1)-\text { min }}}}$ and $F_{W_{\kappa_{(m+1)-\text { max }}}}$ for $m=4,5$ are:

$\begin{aligned} F_{W_{K_{5-\min }}}(x)= & 0.5365 H(x ; 1,1) \\ & +0.2127 H(x ; 2,1) \\ & +0.2232 H(x ; 3,1) \\ & +0.0245 H(x ; 6,1) \\ & +0.0030 H(x ; 7,1), \\ F_{W_{K_{5-\max }}}(x)= & 0.4860 H(x ; 1,1) \\ & +0.3981 H(x ; 2,1) \\ & +0.0621 H(x ; 4,1) \\ & +0.0537 H(x ; 5,1) \\ & +0.0000 H(x ; 20,1), \\ F_{W_{K_{6-\min }}}(x)= & 0.5126 H(x ; 1,1) \\ & +0.3179 H(x ; 2,1) \\ & +0.0531 H(x ; 3,1) \\ & +0.1082 H(x ; 4,1) \\ & +0.0059 H(x ; 7,1) \\ & +0.0023 H(x ; 8,1), \end{aligned}$

and

$\begin{aligned} F_{W_{K_{6-\max }}}(x)= & 0.5322 H(x ; 1,1) \\ & +0.2264 H(x ; 2,1) \\ & +0.2110 H(x ; 3,1) \\ & +0.0004 H(x ; 5,1) \\ & +0.0299 H(x ; 6,1) \\ & +0.0000 H(x ; 20,1) . \end{aligned}$

Let $X_{K_{(n+1)} \text {-inimman }}$ be arv with df $F_{X_{K_{(m+1) \text { minimax }}}}$ $(x)=1-p$ $+p F_{W_{K_{(m+1)-\text { nimpax }}}}(x)$ for $x \geq 0$ . It follows from Section 4.2 that lower (upper) bounds for the higher-order moments $E\left[S^{j}\right](j=4,5, \ldots)$ and the exponential premium principle defined as $\varphi_{\eta}(S)=\left(\ln E\left[e^{n s}\right]\right) / \eta$ ( $\eta>0$ ) can be found from their counterparts for $X_{K_{\text {ctr }}}$ . A few numerical values are provided in Tables 5 and 6, respectively.

As expected, the bounds get sharper as the number of moments involved increases.

Table 5.Higher-order moments of

$S, X_{\kappa_{(m+1)-\min }}$ and

$X_{\kappa_{(m+1)-\max }}(m=4,5)$

$j$	$E\left[X_{k_{5-\min }^j}^j\right]$	$E\left[X_{K_{6-\text { min }}}^j\right]$	$E\left[S^i\right]$	$E\left[X_{k_{6-\max }^j}^j\right]$	$E\left[X_{k_{5-\max }^j}^j\right]$
4	138.7579	138.7579	138.7579	138.7579	138.7579
5	1125.9592	1129.1880	1129.1880	1129.1880	1149.9348
6	10748.5738	10873.8020	10881.2732	10922.7337	11993.6176

Table 6.Exponential premiums of

$S, X_{K_{(m+1)-\min }}$ and

$X_{K_{(m+1)-\max }}(m=4,5)$

$\eta$	$\varphi_\eta\left(X_{K 5-\min }\right)$	$\varphi_\eta\left(X_{K 6-\min }\right)$	$\varphi_\eta(S)$	$\varphi_\eta\left(X_{K_{6-\max }}\right)$	$\varphi_\eta\left(X_{K_{5-\max }}\right)$
0.2	1.5545	1.5546	1.5546	1.5548	1.5564
0.1	1.3536	1.3536	1.3536	1.3536	1.3536
0.01	1.2137	1.2137	1.2137	1.2137	1.2137

As for the second approach based on moment bounds with discrete expected stop-loss transforms, Table 7 presents the values of TVaR for $X_{K_{m-down(up)}} (m=4,5)$ with df $F_{X_{K_{m-down(up)}}}(x)=1-p+p F_{W_{K_{m-down(up)}}}(x)$ for $x \geq 0$ .

Table 7.Values of TVaR for

$S, X_{\kappa_{m-\text { down }}}$ and

$X_{K_{m-u p}}(m=4,5)$

$\kappa$	Exact	4 moments		5 moments
$\kappa$	$\operatorname{TVaR}_x(S)$	$\mathrm{TVaR}_{\mathrm{k}}\left(X_{K_{4-\text { down }}}\right)$	$\operatorname{TVaR}_\kappa\left(X_{\text {к }_{4-\text { up }}}\right)$	$\mathrm{TVaR}_{\mathrm{k}}\left(X_{K_{5-\text { down }}}\right)$	$\operatorname{TVaR}_\kappa\left(X_{\text {к }_{5-\text { up }}}\right)$
0.9	5.0696	4.9222	5.2062	4.9800	5.1490
0.95	6.2214	5.9708	6.4548	6.0594	6.3642
0.99	8.8460	8.2899	9.3301	8.4655	9.1767
0.995	9.9589	9.2500	10.5629	9.4679	10.3775
0.999	12.5066	11.4122	13.4854	11.7323	13.1382

As expected, the inequality (14) is verified. Also, we observe that the interval estimate of $\mathrm{TVaR}_{\kappa}(S)$ shrinks as the number of moments matched increases.

Finally, given that neither method is applicable for the VaR risk measure, we make use of the technique discussed in Remark 10. We find the numerical values provided in Table 8.

Table 8.Minimal/Maximal values of VaR within

$\mathcal{M} \mathcal{E}^{\operatorname{ext}}\left(\mu_m, A_{20}, 1\right)(m=4,5)$ vs

$\operatorname{VaR}_{\mathrm{k}}(\mathrm{S})$

$\kappa$	Exact	$\mathcal{M} \mathcal{E}^{\operatorname{ext}}\left(\pmb{\mu}_4, A_{20}, 1\right)$		$\mathcal{M} \mathcal{E}^{\operatorname{ext}}\left(\pmb{\mu}_5, A_{20}, 1\right)$
$\kappa$	$\operatorname{VaR}_\kappa(S)$	$\operatorname{VaR}_{\mathrm{x}}(X)_{\min }$	$\operatorname{VaR}_{\mathrm{x}}(X)_{\max }$	$\operatorname{VaR}_{\mathrm{x}}(X)_{\min }$	$\operatorname{VaR}_{\mathrm{x}}(X)_{\max }$
0.9	3.3965	3.3896	3.4031	3.3897	3.4001
0.95	4.5704	4.5539	4.6030	4.5584	4.5730
0.99	7.2334	7.2182	7.2690	7.2251	7.2533
0.995	8.3604	8.3143	8.3946	8.3539	8.3925
0.999	10.9388	10.7191	10.9781	10.9226	10.9536

For this example, we observe that the exact values of $\operatorname{VaR}_{\kappa}(S)$ are within the minimal and maximal values of their corresponding risk measures among all members of $M E^{e x t}\left(\mu_{m}, A_{20}, 1\right)$ for $m=4,5$ .

Also, the spread between the minimal and maximal values of VaR are reduced when we go from 4 to 5 moments. An identical exercise for the TVaR risk measure resulted in the same conclusions.

Moment-Based Approximation with Mixed Erlang Distributions

Abstract

1. Introduction

2. Background

2.1. Admissible moments

2.2. Mixed Erlang distribution

2.3. Method of Johnson and Taaffe (1989)

3. Moment-based approximation with mixed Erlang distribution

3.1. Description of the approach

3.2. Numerical examples

3.2.1. Lognormal distribution: Dufresne (2007, Example 5.4)

3.2.2. Mixture of two gamma distributions: S. C. Lee and Lin (2010, Section 5, Example 1)

3.2.3. Gompertz distribution

3.2.4. Some remarks

4. Moment-based mixed Erlang approximation with known $\beta$

4.1. Basic definitions

4.2. Discrete s-convex extremal distributions

4.3. Moment bounds on discrete expected stop-loss transforms

4.4. Example: Portfolio of dependent risks

References

Moment-Based Approximation with Mixed Erlang Distributions

Abstract

1. Introduction

2. Background

2.1. Admissible moments

2.2. Mixed Erlang distribution

2.3. Method of Johnson and Taaffe (1989)

3. Moment-based approximation with mixed Erlang distribution

3.1. Description of the approach

3.2. Numerical examples

3.2.1. Lognormal distribution: Dufresne (2007, Example 5.4)

3.2.2. Mixture of two gamma distributions: S. C. Lee and Lin (2010, Section 5, Example 1)

3.2.3. Gompertz distribution

3.2.4. Some remarks

4. Moment-based mixed Erlang approximation with known \beta\beta

4.1. Basic definitions

4.2. Discrete s-convex extremal distributions

4.3. Moment bounds on discrete expected stop-loss transforms

4.4. Example: Portfolio of dependent risks

References

This website uses cookies

4. Moment-based mixed Erlang approximation with known $\beta$