Variance
Actuarial
Vol. 10, Issue 1, 2017 (January 01, 2017 EDT)

Moment-Based Approximation with Mixed Erlang Distributions

Hélène Cossette, David Landriault, Etienne Marceau, Khouzeima Moutanabbir
Keywords: Risk theory, mixed Erlang distributions, moment-matching, distribution fitting, phase-type approximation
Cossette, Hélène, David Landriault, Etienne Marceau, and Khouzeima Moutanabbir. 2017. “Moment-Based Approximation with Mixed Erlang Distributions.” Variance 10 (1): 166–82.

Abstract

Moment-based approximations have been extensively analyzed over the years (see, e.g., Osogami and Harchol-Balter 2006 and references therein). A number of specific phase-type (and non-phase-type) distributions have been considered to tackle the moment-matching problem (see, for instance, Johnson and Taaffe 1989). Motivated by the development of more flexible moment-based approximation methods, we develop and examine the use of finite mixtures of Erlangs with a common rate parameter for the moment-matching problem. This is primarily motivated by Tijms (1994), who shows that this class of distributions can approximate any continuous positive distribution to an arbitrary level of accuracy, as well as the tractability of this class of distributions for various problems of interest in quantitative risk management. We consider separately situations where the rate parameter is either known or unknown. For the former case, a direct connection with a discrete moment-matching problem is established. A parallel to the s-convex stochastic order (e.g., Denuit et al. 1998) is also drawn. Numerical examples are considered throughout.

1. Introduction

Mixed Erlang distributions are known to yield analytic solutions to many risk management problems of interest. This is primarily due to the tractable features of this distributional class. Among others, the class of mixed Erlang distributions is closed under various operations such as convolutions and Esscher transformations (e.g., Willmot and Woo 2007 and Willmot and Lin 2011). As such, risk aggregation and ruin problems can more easily be tackled under mixed Erlang assumptions (e.g., Cheung and Woo 2016; Cossette, Mailhot, and Marceau 2012, and Landriault and Willmot 2009). Also, Tijms (1994) showed that the class of mixed Erlang distributions is dense in the set of all continuous and positive distributions. Therefore, we consider a moment-based approximation method which capitalizes on the aforementioned properties of the mixed Erlang distribution. More precisely, we propose to approximate a distribution with known moments by a moment-matching mixed Erlang distribution. Moment-based approximations have been extensively developed in various research areas, including performance evaluation, queueing theory, and risk theory, to name a few.

Osogami and Harchol-Balter (2006) identify the following four criteria to evaluate moment-matching algorithms: (1) the number of moments matched; (2) the computational efficiency of the algorithm; (3) the generality of the solution; and (4) the minimality of the number of parameters (phases). It also seems desirable for the approximation to be in itself a distribution. This is not mentioned in Osogami and Harchol-Balter (2006) for the obvious reason that they consider phase-type distributions as their moment-based approximation class. There exists an extensive literature on the approximation of distributions by a specific subset of phase-type distributions using moment-based techniques. For instance, Whitt (1982) proposed a mixture of two exponential distributions or a generalized Erlang distribution as a moment-based approximation when the coefficient of variation (CV) is greater than 1 or less than 1, respectively. Also, both Altiok (1985) and Vanden Bosch, Dietz, and Pohl (2000) proposed an alternative to the moment-based approximation of Whitt (1982) when CV > 1 using a Coxian distribution. Alternatively, Johnson and Taaffe (1989) considered a mixture of Erlangs with a common shape (order) parameter as their moment-based approximation.

Most predominantly, there exists a substantial body of literature on the three-moment approximation within the phase-type class of distributions (e.g., Telek and Heindl 2002; Bobbio, Horváth, and Telek 2005, and references therein). Matching the first three moments is often viewed as effective in providing a reasonable approximation to the underlying system (e.g., Osogami and Harchol-Balter 2006 and references therein). However, as illustrated in this paper and many others, three moments do not always suffice, triggering the development of more flexible moment-based approximations. Among others, we mention the work of Johnson and Taaffe (1989) on mixed Erlang distributions of common order. Also, Dufresne (2007) proposes two approximation techniques based on Jacobi polynomial expansions and the logbeta distribution to fit combinations of exponential distributions. This paper is complementary to the aforementioned ones by considering the family of finite mixtures of Erlangs with common rate parameter to approximate a distribution on \mathbb{R}_{+}, as theoretically justified in the continuous case by Tijms (1994, Theorem 3.9.1). The reader is also referred to S. C. Lee and Lin (2010), where fitting of the same class of distributions is considered using the EM algorithm (which relies on the knowledge of the approximated distribution rather than only its moments).

It is worth pointing out that other non-phase-type approximation methods have been widely used in actuarial science. A good survey paper on this topic is Chaubey, Garrido, and Trudeau (1998). One such class consists of refinements to the normal approximation, such as the normal power and Cornish-Fisher approximations (e.g., Ramsay 1991; Daykin, Pentikäinen, and Pesonen 1994, and Y. S. Lee and Lin 1992). These approximations are based on the first few moments. However, the resulting approximation is often not a proper distribution. Other moment-based distributional approximations are the translated gamma distribution (e.g., Seal 1977), the translated inverse Gaussian distribution (e.g., Chaubey, Garrido, and Trudeau 1998) and the generalized Pareto distribution (e.g., Venter 1983). It should be noted that all these approximation methods are designed to fit a specific number of moments and thus lack the flexibility to match an arbitrary number of moments.

The rest of the paper is constructed as follows. In Section 2, a brief review on admissible moments, mixed Erlang distributions and the approximation method of Johnson and Taaffe (1989) is provided. Section 3 is devoted to our class of finite mixtures of Erlangs with common rate parameter. Theoretical and practical considerations related to the approximation method are drawn. Various examples are considered to examine the quality of the resulting approximation. In Section 4, we consider applications of our moment-based approximations of Section 3 when the underlying distribution is of mixed Erlang form with known rate parameter. A parallel is drawn with a discrete moment-matching problem and certain stochastic orderings, notably the s-convex stochastic order (e.g., Denuit, Lefèvre, and Shaked 1998). An application of Cossette, Gaillardetz, and Marceau (2002) will be examined in more detail.

2. Background

2.1. Admissible moments

Karlin and Studden (1966) provide the necessary and sufficient conditions for a set of (raw) moments \mu_{m}=(\mu_{1}, \ldots, \mu_{m}) to be from a probability distribution defined on \mathbb{R}_{+}. To state this result, define the matrices P_{k} and Q_{k} (k \geq 1) as

P_{k}=\begin{pmatrix} 1 & \mu_{1} & \cdots & \mu_{k} \\ \mu_{1} & \mu_{2} & \cdots & \mu_{k+1} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{k} & \mu_{k+1} & \cdots & \mu_{2k} \end{pmatrix}; \quad Q_{k}=\begin{pmatrix} \mu_{1} & \mu_{2} & \cdots & \mu_{k+1} \\ \mu_{2} & \mu_{3} & \cdots & \mu_{k+2} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{k+1} & \mu_{k+2} & \cdots & \mu_{2k+1} \end{pmatrix}.

As stated in Courtois and Denuit (2007), there exists a non-negative random variable (rv) with distribution function (df) F and first m moments μm if and only if the following two conditions are satisfied:

  • \det P_{k}>0, for k=1, \ldots, \lfloor m/2 \rfloor;
  • \det Q_{k}>0, for k=1, \ldots, \lfloor (m-1)/2 \rfloor;

where \lfloor x \rfloor denotes the integer part of x. Note that P_{k} involves the moments up to \mu_{2k} and Q_{k} those up to \mu_{2k+1}, which dictates the two ranges of k. In what follows, we tacitly assume the moment set \mu_{m} is from a probability distribution on \mathbb{R}_{+}.
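These determinant conditions are straightforward to check numerically. Below is a minimal sketch (the function name and tolerance are our own choices); the matrix P_{k} collects the moments \mu_{0}=1 up to \mu_{2k}, and Q_{k} the moments \mu_{1} up to \mu_{2k+1}:

```python
import numpy as np

def is_admissible(mu, tol=1e-12):
    """Karlin-Studden admissibility check for raw moments mu = (mu_1, ..., mu_m)
    of a distribution on the positive half-line, via Hankel determinants.
    (Illustrative sketch; name and tolerance are our own choices.)"""
    m = len(mu)
    mom = np.concatenate(([1.0], mu))  # mom[j] = mu_j, with mu_0 = 1
    # det P_k > 0 for k = 1, ..., floor(m/2), where P_k[i, j] = mu_{i+j}
    for k in range(1, m // 2 + 1):
        P = np.array([[mom[i + j] for j in range(k + 1)] for i in range(k + 1)])
        if np.linalg.det(P) <= tol:
            return False
    # det Q_k > 0 for k = 1, ..., floor((m-1)/2), where Q_k[i, j] = mu_{i+j+1}
    for k in range(1, (m - 1) // 2 + 1):
        Q = np.array([[mom[i + j + 1] for j in range(k + 1)] for i in range(k + 1)])
        if np.linalg.det(Q) <= tol:
            return False
    return True
```

For instance, the lognormal moment set of Section 3.2.1 passes this check, while a pair (\mu_{1}, \mu_{2}) with \mu_{2}<\mu_{1}^{2} (negative variance) fails it.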

2.2. Mixed Erlang distribution

We now review some known properties of mixed Erlang distributions with common rate parameter. A more elaborate review of this class of distributions can be found in Willmot and Woo (2007), S. C. Lee and Lin (2010), and Willmot and Lin (2011).

Let W be a mixed Erlang rv with common rate parameter β>0 and df

F_{W}(x)=\sum_{k \in A_{l}} \zeta_{k} H(x ; k, \beta), \tag{1}

where A_{l}=\{1,2, \ldots, l\}, and \{\zeta_{k}\}_{k=1}^{l} is the probability mass function (pmf) of a discrete rv K with support A_{l} for a given l \in \{1,2, \ldots\} \cup \{\infty\}. The Erlang df H is defined as

H(x ; k, \beta) \equiv 1-\bar{H}(x ; k, \beta)=1-e^{-\beta x} \sum_{i=0}^{k-1} \frac{(\beta x)^{i}}{i!}, \quad x \geq 0, \tag{2}

where the parameters k and \beta of the Erlang df are known as the shape and rate parameters, respectively. An alternative and useful representation of the mixed Erlang rv W is W=\sum_{k=1}^{K} C_{k}, where \{C_{k}\}_{k \geq 1} are iid exponential rv's with mean 1/\beta, independent of K; i.e., the rv W follows a compound distribution.

Remark 1. As in, e.g., Willmot and Woo (2007), we consider the class of mixed Erlang dfs (1) rather than the more general class of combinations of Erlangs where some ζk 's are possibly negative. For the latter class, additional constraints on {ζk}lk=1 exist to ensure that the right-hand side of (1) is a non-decreasing function in x. This presents additional challenges in the subsequent moment-matching application, challenges which do not arise in the mixed Erlang case.

It is well known that the j-th moment of W is given by E[W^{j}]=\beta^{-j} \sum_{k=1}^{\infty} \zeta_{k} \prod_{i=0}^{j-1}(k+i). Of particular importance in actuarial science and quantitative risk management (see, e.g., McNeil, Frey, and Embrechts 2005 and references therein) are the VaR and TVaR risk measures. For the mixed Erlang rv W, there is in general no closed-form expression for \operatorname{VaR}_{\kappa}(W)=\inf \{x \in \mathbb{R}: F_{W}(x) \geq \kappa\}, where 0 \leq \kappa<1, but its value can be obtained using a routine numerical procedure. As for its TVaR, S. C. Lee and Lin (2010) showed that

\operatorname{TVaR}_{\kappa}(W) \equiv \frac{1}{1-\kappa} \int_{\kappa}^{1} \operatorname{VaR}_{u}(W) \, du = \frac{1}{1-\kappa} \sum_{k=1}^{\infty} \zeta_{k} \frac{k}{\beta} \bar{H}\left(\operatorname{VaR}_{\kappa}(W) ; k+1, \beta\right).
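To make these quantities concrete, here is a small sketch of the df (1), the moments E[W^{j}], and the "routine numerical procedure" for VaR mentioned above, implemented as bisection on F_{W} (function names, bracketing, and tolerances are our own choices):

```python
import math

def erlang_sf(x, k, beta):
    """Erlang survival function H-bar(x; k, beta) = e^{-beta x} sum_{i<k} (beta x)^i / i!."""
    term, s = 1.0, 0.0
    for i in range(k):
        s += term
        term *= beta * x / (i + 1)
    return math.exp(-beta * x) * s

def mixed_erlang_cdf(x, zetas, beta):
    """F_W(x) of the mixed Erlang df (1), with zetas = (zeta_1, ..., zeta_l)."""
    return 1.0 - sum(z * erlang_sf(x, k, beta) for k, z in enumerate(zetas, start=1))

def mixed_erlang_moment(j, zetas, beta):
    """E[W^j] = beta^{-j} sum_k zeta_k * k (k+1) ... (k+j-1)."""
    return sum(z * math.prod(range(k, k + j)) for k, z in enumerate(zetas, start=1)) / beta ** j

def mixed_erlang_var(kappa, zetas, beta, hi=1e6, tol=1e-10):
    """VaR_kappa(W) = inf{x : F_W(x) >= kappa}, by bisection on F_W
    (our own choice of bracketing interval and tolerance)."""
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if mixed_erlang_cdf(mid, zetas, beta) >= kappa:
            hi = mid
        else:
            lo = mid
    return hi
```

As a quick sanity check, a single Erlang with k=1 reduces to an exponential rv with mean 1/\beta, for which the df, moments, and quantiles are known in closed form.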

Another quantity of interest is the stop-loss premium, defined as \pi_{W}(b)=E[(W-b)_{+}], with (x)_{+}=\max \{x, 0\}. For the mixed Erlang df (1), we have

\pi_{W}(b)=\sum_{k=1}^{\infty} \zeta_{k}\left(\frac{k}{\beta} \bar{H}(b ; k+1, \beta)-b \bar{H}(b ; k, \beta)\right), \quad b \geq 0

(see also Willmot and Woo (2007, Eq. 3.6) for the higher-order stop-loss moments). Tijms (1994) showed that this class of distributions can approximate any continuous positive distribution with an arbitrary level of accuracy. For completeness, the theoretical foundation of this result is given next.

Theorem 2. (Tijms 1994, Theorem 3.9.1). Let F be the df of a positive rv. For any given h>0, define

F_{h}(x)=\sum_{k=1}^{\infty}\left(F(k h)-F((k-1) h)\right) H\left(x ; k, \frac{1}{h}\right). \tag{4}

Then, \lim_{h \rightarrow 0} F_{h}(x)=F(x) for any continuity point x of F.
Note that F_{h} in (4) is a mixed Erlang df of the form (1) with \zeta_{k}=F(k h)-F((k-1) h) (k=1,2, \ldots) and rate parameter \beta=1/h.
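The Tijms construction (4) is easy to carry out in practice: given a df F and a span h, compute the weights \zeta_{k}=F(kh)-F((k-1)h) until essentially all probability mass has been captured. A sketch (the helper name and the truncation rule are our own choices):

```python
import math

def tijms_weights(F, h, tol=1e-10):
    """Mixing weights zeta_k = F(kh) - F((k-1)h) of the Tijms approximation F_h,
    a mixed Erlang df with rate beta = 1/h; cf. (4). The infinite sum is
    truncated once the accumulated mass exceeds 1 - tol (our own rule)."""
    zetas, prev, k = [], 0.0, 1
    while prev < 1.0 - tol:
        cur = F(k * h)
        zetas.append(cur - prev)
        prev, k = cur, k + 1
    return zetas, 1.0 / h  # (weights, rate parameter)

# Example: discretize a standard exponential df with span h = 0.05.
zetas, beta = tijms_weights(lambda x: 1.0 - math.exp(-x), 0.05)
```

Shrinking h improves the approximation but lengthens the mixture, which is the trade-off behind the finite mixtures studied in Section 3.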

Several approximation methods motivated by Tijms' theorem were proposed over the years (see Section 1 for more details). In general, these moment-based approximations propose to work with a specific subclass of all finite and infinite mixed Erlang distributions. Among them, we recall the method of Johnson and Taaffe (1989), which will be used later for comparative purposes.

2.3. Method of Johnson and Taaffe (1989)

Johnson and Taaffe (1989) investigated the use of mixtures of Erlang distributions with common shape parameter for moment-matching purposes. More precisely, mixtures of n (or fewer) Erlangs with common shape parameters are used to match the first (2n-1) moments (whenever the set of moments is within the feasible set). For the three-moment matching problem, Johnson and Taaffe (1989) generalized the approximations of Whitt (1982) and Altiok (1985) by enlarging the set of feasible moments \mu_{3} when CV > 1. Their method is also valid for some combinations of \mu_{3} when CV < 1.

Their three-moment approximation is a mixture of two Erlangs with common shape parameter r (see Theorem 3 of Johnson and Taaffe 1989), i.e.,

F(x)=p H\left(x ; r, \beta_{1}\right)+(1-p) H\left(x ; r, \beta_{2}\right),\tag{5}

where p=\left(\frac{\mu_{1}}{r}-\beta_{2}^{-1}\right) /\left(\beta_{1}^{-1}-\beta_{2}^{-1}\right), and \left\{1 / \beta_{i}\right\}_{i=1}^{2} are the solutions of A s^{2}+B s+C=0 with

\begin{aligned} A &= r(r+2) \mu_{1}\left(\mu_{2}-\frac{r+1}{r} \mu_{1}^{2}\right), \\ B &= -\left( r\left(\mu_{1} \mu_{3}-\frac{r+2}{r+1} \mu_{2}^{2}\right) + \frac{r(r+2)}{r+1}\left(\mu_{2}-\frac{r+1}{r} \mu_{1}^{2}\right)^{2} + (r+2) \mu_{1}^{2}\left(\mu_{2}-\frac{r+1}{r} \mu_{1}^{2}\right) \right), \\ C &= \mu_{1}\left(\mu_{1} \mu_{3}-\frac{r+2}{r+1} \mu_{2}^{2}\right). \end{aligned}

The choice of the shape parameter r is discussed in Johnson and Taaffe (1989, Proposition 4).
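A sketch of the three-moment match (5): solve the quadratic for s_{i}=1/\beta_{i} and recover the mixing weight p. The helper name is our own, and feasibility of the moment set for the chosen r is assumed rather than checked:

```python
import math

def johnson_taaffe(mu1, mu2, mu3, r):
    """Three-moment mixture of two Erlangs with common shape r, cf. (5):
    returns (p, beta1, beta2) with F(x) = p H(x; r, beta1) + (1-p) H(x; r, beta2).
    Sketch only: assumes the moment set is feasible for the chosen r."""
    y = mu2 - (r + 1) / r * mu1 ** 2
    A = r * (r + 2) * mu1 * y
    B = -(r * (mu1 * mu3 - (r + 2) / (r + 1) * mu2 ** 2)
          + r * (r + 2) / (r + 1) * y ** 2
          + (r + 2) * mu1 ** 2 * y)
    C = mu1 * (mu1 * mu3 - (r + 2) / (r + 1) * mu2 ** 2)
    disc = math.sqrt(B ** 2 - 4 * A * C)
    s1, s2 = (-B + disc) / (2 * A), (-B - disc) / (2 * A)  # s_i = 1 / beta_i
    p = (mu1 / r - s2) / (s1 - s2)
    return p, 1.0 / s1, 1.0 / s2
```

A simple way to validate such a routine is to start from a known two-point mixture, compute its first three moments in closed form, and check that the routine recovers the original parameters.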

3. Moment-based approximation with mixed Erlang distribution

In this section, we propose to use a different subclass of mixed Erlang distributions to examine moment-based approximation techniques.

3.1. Description of the approach

For a given l \in \mathbb{N}^{+}, let \mathcal{ME}(\mu_{m}, A_{l}) be the set of all finite mixtures of Erlangs with df (1) and first m moments \mu_{m}. From Section 2.2, this amounts to identifying all solutions to the problem

\sum_{k=1}^{l} \zeta_{k} \frac{\prod_{i=0}^{j-1}(k+i)}{\beta^{j}}=\mu_{j}, \quad j=1, \ldots, m,\tag{6}

under the constraints that \beta>0 and \left\{\zeta_{k}\right\}_{k=1}^{l} is a probability measure on A_{l}.

Remark 3. For a rv W with df (1) and first m moments \mu_{m}, we indifferently write W \in \mathcal{ME}(\mu_{m}, A_{l}) or F_{W} \in \mathcal{ME}(\mu_{m}, A_{l}). This will also apply to the other distributional classes.

Also, let \mathcal{ME}^{\text{res}}(\mu_{m}, A_{l}) be the (restricted) subset of \mathcal{ME}(\mu_{m}, A_{l}) with at most m non-zero mixing probabilities \{\zeta_{k}\}_{k=1}^{l}. Given that \mathcal{ME}^{\text{res}}(\mu_{m}, A_{l}) has a finite number of solutions, we propose to use \mathcal{ME}^{\text{res}}(\mu_{m}, A_{l}) as our approximation class. It is clear that \mathcal{ME}^{\text{res}}(\mu_{m}, A_{l}) \subseteq \mathcal{ME}^{\text{res}}(\mu_{m}, A_{l^{\prime}}) for l \leq l^{\prime}.

Note that for a continuous positive distribution with moments \mu_{m}, we know from Theorem 2 that there exists an l large enough such that \mathcal{ME}(\mu_{m}, A_{l}) is not empty. Even though no formal conclusion can be reached for the restricted class \mathcal{ME}^{\text{res}}(\mu_{m}, A_{l}), all our numerical studies have shown that this set contains a large number of distributions (see, for instance, the examples of subsections 3.2.1 and 3.2.2) for a given m when l is chosen large enough.

Distributions in the \mathcal{M} \mathcal{E}^{\text {res }}\left(\mu_{m}, A_{l}\right) class are identified as follows: for a given set \left\{i_{k}\right\}_{k=1}^{m} \subset A_{l} with 1 \leq i_{1}< i_{2}<\cdots<i_{m} \leq l, (6) can be rewritten in matrix form as

\mathbf{G}_{m} \zeta_{m}=\mathbf{M}_{\beta},

where \zeta_{m}=\left(\zeta_{i_{1}}, \zeta_{i_{2}}, \ldots, \zeta_{i_{m}}\right)^{T}, \mathbf{M}_{\beta}=\left(\beta \mu_{1}, \beta^{2} \mu_{2}, \ldots\right., \left.\beta^{m} \mu_{m}\right)^{T}, and

\mathbf{G}_{m}=\left(\begin{array}{cccc} i_{1} & i_{2} & \cdots & i_{m} \\ i_{1}\left(i_{1}+1\right) & i_{2}\left(i_{2}+1\right) & \cdots & i_{m}\left(i_{m}+1\right) \\ \vdots & \vdots & \ddots & \vdots \\ \prod_{i=0}^{m-1}\left(i_{1}+i\right) & \prod_{i=0}^{m-1}\left(i_{2}+i\right) & \cdots & \prod_{i=0}^{m-1}\left(i_{m}+i\right) \end{array}\right) .

It follows that \zeta_{m}=\mathbf{G}_{m}^{-1} \mathbf{M}_{\beta} under the constraint that \zeta_{m}^{T} \mathbf{e}=1, where \mathbf{e} is the vector of 1's. Note that \zeta_{m}^{T} \mathbf{e}=\left(\mathbf{G}_{m}^{-1} \mathbf{M}_{\beta}\right)^{T} \mathbf{e} is a polynomial of degree (at most) m in \beta. Thus, we only consider the real and positive solutions (in \beta) of \left(\mathbf{G}_{m}^{-1} \mathbf{M}_{\beta}\right)^{T} \mathbf{e}=1 and complete their mixed Erlang representation with the identification of the mixing weights \zeta_{m}=\mathbf{G}_{m}^{-1} \mathbf{M}_{\beta}. The procedure is systematically repeated for all \binom{l}{m} possible sets of m distinct elements in A_{l}.
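The identification step above (solve the polynomial \left(\mathbf{G}_{m}^{-1} \mathbf{M}_{\beta}\right)^{T} \mathbf{e}=1 in \beta, then recover the weights) can be sketched for a single candidate shape set as follows; the function name, tolerances, and the rejection rule for improper weights are our own choices:

```python
import math
import numpy as np

def candidate_mixtures(mu, indices, tol=1e-9):
    """Solve G_m zeta = M_beta together with sum(zeta) = 1 for one candidate
    shape set indices = (i_1 < ... < i_m); returns the admissible
    (beta, weights) pairs. A sketch of the Section 3.1 identification step."""
    m = len(mu)
    # G_m[j-1, k] = i_k (i_k + 1) ... (i_k + j - 1)
    G = np.array([[math.prod(range(ik, ik + j)) for ik in indices]
                  for j in range(1, m + 1)], dtype=float)
    Ginv = np.linalg.inv(G)
    # sum(zeta) = e^T Ginv M_beta = sum_j a_j beta^j; set this polynomial to 1.
    a = Ginv.sum(axis=0) * np.asarray(mu)                # a_1, ..., a_m
    roots = np.roots(np.concatenate((a[::-1], [-1.0])))  # a_m b^m + ... + a_1 b - 1 = 0
    out = []
    for beta in roots:
        if abs(beta.imag) > tol or beta.real <= tol:
            continue  # keep only real, strictly positive rates
        b = beta.real
        zeta = Ginv @ (np.asarray(mu) * b ** np.arange(1, m + 1))
        if np.all(zeta > -tol):                          # proper mixing weights only
            out.append((b, zeta))
    return out
```

A full search would repeat this over all combinations of m distinct shapes in A_{l} (e.g., via itertools.combinations) and retain, say, the KS-minimizing solution.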

Remark 4. Given that the above procedure is repeated \binom{l}{m} times, the computational efficiency of the proposed methodology is mostly driven by this number, and hence the parameters m and l should be chosen accordingly. For a given number of moments, m, we observe that: (a) larger values of l result in a more time-consuming numerical procedure; (b) however, l should be chosen large enough for the approximation class \mathcal{M} \mathcal{E}^{r e s}\left(\mu_{m}, A_{l}\right) to have a reasonable number of members (to legitimally produce a “good” approximation). From our numerical studies, we observe that the selection of l (for a given m ) can be problemspecific, and thus this tradeoff in the choice of l should be handled with care. However, as a rule of thumb, when m is relatively small (i.e., m \leq 6 ) which is traditionally in moment-matching exercises, a value of l between 50 and 100 leads to reasonable mixed Erlang approximations. We refer the reader to the numerical illustrations and subsequent remarks in Section 3.2 for a more detailed discussion on this topic.

Among all \mathcal{ME}^{\text{res}}(\mu_{m}, A_{l}) distributions, we propose to assess the quality of these approximations with the Kolmogorov-Smirnov (KS) distance, defined for two rv's S and W (with respective dfs F_{S} and F_{W}) as d_{KS}(S, W)=\sup_{x \geq 0}\left|F_{S}(x)-F_{W}(x)\right|. The KS distance is commonly used in the context of continuous distributions (e.g., Denuit et al. 2005). Therefore, the chosen mixed Erlang approximation within \mathcal{ME}^{\text{res}}(\mu_{m}, A_{l}) is the one minimizing the KS distance to the true df F_{S}. We denote by F_{W_{m, l}} this approximation, i.e.,

d_{K S}\left(S, W_{m, l}\right)=\inf_{F_{W} \in \mathcal{ME}^{\text{res}}\left(\mu_{m}, A_{l}\right)} \sup_{x \geq 0}\left|F_{S}(x)-F_{W}(x)\right|,

where W_{m, l} is a rv with df F_{W_{m, l}}. This requires the calculation of the KS distance for each mixed Erlang distribution in \mathcal{M} \mathcal{E}^{r e s}\left(\mu_{m}, A_{l}\right) to identify its minimizer W_{m, l}. In general, an explicit expression for this KS distance does not exist, and hence we propose to numerically find this value by evaluating the distance between the two dfs over all multiples (up to a given high value) of a small discretization span.
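The grid-based evaluation of the KS distance described above can be sketched as follows (the grid size and upper bound are our own choices, and both dfs are assumed vectorized):

```python
import numpy as np

def ks_distance(F1, F2, x_max, n=100_000):
    """Approximate sup_x |F1(x) - F2(x)| by evaluating both dfs on a fine
    grid over [0, x_max], mirroring the numerical evaluation described
    above (grid size and upper bound are our own choices)."""
    x = np.linspace(0.0, x_max, n)
    return float(np.max(np.abs(F1(x) - F2(x))))
```

For two exponential dfs with rates 1 and 2, for instance, the supremum 1/4 is attained at x = \ln 2, and the grid approximation recovers it to high accuracy.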

Note that other distances such as the stop-loss distance (e.g., Gerber 1979) could have been used to select our approximation distribution. Alternatively, one could have relied on another criterion to identify this approximation distribution (for instance, select the distribution in \mathcal{ME}^{\text{res}}(\mu_{m}, A_{l}) with the closest subsequent moment to the true distribution).

3.2. Numerical examples

We consider a few simple examples to illustrate the quality of the approximation. For comparative purposes, other approximation methods will also be discussed. Some concluding remarks on the mixed Erlang approximation method are later made based on the numerical experiment conducted next.

3.2.1. Lognormal distribution: Dufresne (2007, Example 5.4)

Let S=\exp(Z) where Z is a normal rv with mean 0 and variance 0.25. The first 5 moments of S are \mu_{5}=(1.1331, 1.6487, 3.0802, 7.3891, 22.7599). We consider the classes of mixed Erlang distributions \mathcal{ME}^{\text{res}}(\mu_{m}, A_{70}) (m=3,4,5), which have a total of 13,198, 89,294 and 290,422 distributions, respectively. The resulting mixed Erlang approximations F_{W_{m, 70}} (m=3,4,5) are

\begin{aligned} F_{W_{3,70}}(x)= & 0.8209 H(x ; 6,6.3219) \\ & +0.1727 H(x ; 12,6.3219) \\ & +0.0064 H(x ; 26,6.3219), \\ F_{W_{4,70}}(x)= & 0.6350 H(x ; 7,8.3334) \\ & +0.2950 H(x ; 12,8.3334) \\ & +0.0672 H(x ; 20,8.3334) \\ & +0.0029 H(x ; 40,8.3334), \end{aligned}

and

\begin{aligned} F_{W_{5,70}}(x)= & 0.6273 H(x ; 7,8.3608) \\ & +0.3063 H(x ; 12,8.3608) \\ & +0.0609 H(x ; 20,8.3608) \\ & +0.0055 H(x ; 34,8.3608) \\ & +0.0001 H(x ; 69,8.3608), \end{aligned}

for x \geq 0, with respective KS distances of d_{KS}(S, W_{3,70})=0.0040, d_{KS}(S, W_{4,70})=0.0018 and d_{KS}(S, W_{5,70})=0.0011. Note that the quality of the mixed Erlang approximation (as measured by the KS distance) increases with the number of moments matched. For comparative purposes, the three-moment approximation (5) of Johnson and Taaffe (1989) is given by

\begin{aligned} F_{W_{J T}}(x)= & 0.0087 H(x ; 4,1.2804) \\ & +0.9913 H(x ; 4,3.5855). \end{aligned}

In Figure 1, we compare the density functions of W_{m, 70} (m=3,4,5), W_{JT}, and S.

Figure 1. Density function: Lognormal vs. Approximations.

All three mixed Erlang approximations provide an overall good fit to the exact distribution. To further examine the tail fit, specific values of VaR and TVaR for the exact and approximated distributions are provided in Tables 1 and 2, respectively.

Table 1. Values of \operatorname{VaR}_{\kappa} for W_{JT}, W_{m, 70} (m=3,4,5) and S
\kappa \operatorname{VaR}_{\kappa}(W_{JT}) \operatorname{VaR}_{\kappa}(W_{3,70}) \operatorname{VaR}_{\kappa}(W_{4,70}) \operatorname{VaR}_{\kappa}(W_{5,70}) \operatorname{VaR}_{\kappa}(S)
0.9 1.8906 1.9129 1.9056 1.8936 1.8980
0.95 2.2119 2.2692 2.2918 2.2791 2.2760
0.99 2.9953 3.1223 3.0991 3.1812 3.2001
0.995 3.4239 3.6892 3.4811 3.6572 3.6252
0.999 5.0672 4.9237 5.0623 4.7241 4.6885
Table 2. Values of \operatorname{TVaR}_{\kappa} for W_{JT}, W_{m, 70} (m=3,4,5) and S
\kappa \operatorname{TVaR}_{\kappa}(W_{JT}) \operatorname{TVaR}_{\kappa}(W_{3,70}) \operatorname{TVaR}_{\kappa}(W_{4,70}) \operatorname{TVaR}_{\kappa}(W_{5,70}) \operatorname{TVaR}_{\kappa}(S)
0.9 2.3931 2.4540 2.4601 2.4616 2.4616
0.95 2.7528 2.8350 2.8431 2.8600 2.8586
0.99 3.7939 3.9007 3.8125 3.8579 3.8413
0.995 4.4115 4.4455 4.3654 4.3323 4.2957
0.999 6.2260 5.4245 5.6195 5.3151 5.4341

We observe that the VaR and TVaR values of the mixed Erlang approximations compare very well to their lognormal counterparts, especially for the 5-moment approximation. This is particularly noteworthy given that the lognormal distribution is known to have a heavier tail than the mixed Erlang distribution.

Note that the improvement need not be monotone in the number of moments matched: matching additional moments does not guarantee a uniformly better approximation at every quantile.

3.2.2. Mixture of two gamma distributions: S. C. Lee and Lin (2010, Section 5, Example 1)

Let S be a mixture of two gamma distributions with density

\begin{aligned} f_{s}(s)= & 0.2 \frac{(3.2 s)^{2.6} e^{-3.2 s}}{s \Gamma(2.6)} \\ & +0.8 \frac{(1.2 s)^{6.3} e^{-1.2 s}}{s \Gamma(6.3)}, \quad s \geq 0, \end{aligned}

and first 6 moments \mu_{6}=(4.3623, 25.7308, 176.9624, 1369.8272, 11754.2149, 110674.4154). We consider the class of mixed Erlang distributions \mathcal{ME}^{\text{res}}(\mu_{m}, A_{70}) for m=3,4,5,6, which are composed of 16,000, 83,797, 494,532 and 1,928,919 distributions, respectively. The resulting mixed Erlang approximations F_{W_{m, 70}} (m=3,4,5,6) are

\begin{aligned} F_{W_{3,70}}(x)= & 0.2140 H(x ; 2,2.0835) \\ & +0.5215 H(x ; 9,2.0835) \\ & +0.2645 H(x ; 15,2.0835), \\ F_{W_{4,70}}(x)= & 0.2266 H(x ; 2,2.0469) \\ & +0.4440 H(x ; 9,2.0469) \\ & +0.2963 H(x ; 13,2.0469) \\ & +0.0330 H(x ; 19,2.0469), \\ F_{W_{5,70}}(x)= & 0.2023 H(x ; 3,3.7271) \\ & +0.2091 H(x ; 12,3.7271) \\ & +0.3936 H(x ; 19,3.7271) \\ & +0.1805 H(x ; 28,3.7271) \\ & +0.0145 H(x ; 42,3.7271), \end{aligned}

and

\begin{aligned} F_{W_{6,70}}(x)&= 0.0768 H(x ; 2,3.0731) \\ &\quad +0.1325 H(x ; 3,3.0731) \\ &\quad +0.2739 H(x ; 11,3.0731) \\ &\quad +0.3928 H(x ; 17,3.0731) \\ &\quad +0.1188 H(x ; 25,3.0731) \\ &\quad +0.0052 H(x ; 37,3.0731), \end{aligned}

with respective KS distances of d_{K S}\left(S, W_{3,70}\right)=0.0148, d_{K S}\left(S, W_{4,70}\right)=0.0050, d_{K S}\left(S, W_{5,70}\right)=0.0035 and d_{K S}\left(S, W_{6,70}\right)=0.0024. S. C. Lee and Lin (2010) used the EM algorithm to fit a mixed Erlang distribution to the same distribution, which resulted in the following model:

\begin{aligned} F_{W_{E M}}(x)= & 0.2282 H(x ; 2,1.9603) \\ & +0.5430 H(x ; 9,1.9603) \\ & +0.2288 H(x ; 14,1.9603), \end{aligned}

with KS distance d_{K S}\left(S, W_{E M}\right)=0.0094. Note that the KS distance for the 3-moment approximation is greater than the KS distance obtained using the EM estimation. We recall that the EM algorithm finds maximum likelihood estimates using the approximated distribution as an input, while our method is based on only partial information on the approximated distribution (e.g., its first m moments). We observe that the fit improves and the KS distance decreases when more moments are included in the approximation. For illustrative purposes, we also provide in Tables 3 and 4 some values of VaR and TVaR for W_{m, 70}(m=3,4,5,6), W_{E M} and S.

Table 3. Values of \operatorname{VaR}_{\kappa} for W_{m, 70} (m=3,4,5,6), W_{EM}, and S
\kappa \operatorname{VaR}_\kappa\left(W_{E M}\right) \operatorname{VaR}_\kappa\left(W_{3,70}\right) \operatorname{VaR}_\kappa\left(W_{4,70}\right) \operatorname{VaR}_\kappa\left(W_{5,70}\right) \operatorname{VaR}_\kappa\left(W_{6,70}\right) \operatorname{VaR}_\kappa(S)
0.9 7.7069 7.8183 7.6726 7.6943 7.6969 7.6859
0.95 8.7494 8.8676 8.7595 8.7172 8.7825 8.7666
0.99 10.7708 10.8451 11.0385 11.0462 10.9514 11.0023
0.995 11.5349 11.5835 11.9366 12.0611 11.8566 11.8925
0.999 13.1604 13.1473 13.8488 13.9701 13.9651 13.8551
Table 4. Values of \operatorname{TVaR}_{\kappa} for W_{m, 70} (m=3,4,5,6), W_{EM} and S
\kappa \operatorname{TVaR}_{\kappa}(W_{EM}) \operatorname{TVaR}_{\kappa}(W_{3,70}) \operatorname{TVaR}_{\kappa}(W_{4,70}) \operatorname{TVaR}_{\kappa}(W_{5,70}) \operatorname{TVaR}_{\kappa}(W_{6,70}) \operatorname{TVaR}_{\kappa}(S)
0.9 9.0866 9.1909 9.1596 9.1404 9.1634 9.1598
0.95 9.9929 10.0845 10.1583 10.1234 10.1378 10.1469
0.99 11.8253 11.8620 12.2766 12.3665 12.2440 12.2528
0.995 12.5377 12.5481 13.1132 13.2302 13.1371 13.1065
0.999 14.0774 14.0266 14.9062 14.8770 15.0978 15.0069

3.2.3. Gompertz distribution

The Gompertz distribution with df F_{S}(x)=1- \exp \left\{-B\left(c^{x}-1\right) / \ln c\right\}(x>0) has been extensively applied in various life contingency contexts (e.g., Bowers et al. 1997). Lenart (2014) provides an expression for the j-th moment, namely

\mu_{j}=\frac{j!}{(\ln c)^{j}} e^{B / \ln c} E_{1}^{j-1}\left(\frac{B}{\ln c}\right), \tag{7}

where E_{s}^{j}(z)=\int_{1}^{\infty} \frac{(\ln x)^{j}}{j!} x^{-s} e^{-z x} \, dx is the generalized integro-exponential function (see Milgram 1985). Here, we consider an example of Melnikov and Romaniuk (2006) on the 1959-1999 USA mortality data of the Human Mortality Database, where the parameters B and c were estimated as B=6.148 \times 10^{-5} and c=1.09159. Using (7), the first five moments are \mu_{5}=(76.3437, 6037.202, 489676.3, 40524308, 3410245408). We consider the class of mixed Erlang distributions \mathcal{ME}^{\text{res}}(\mu_{m}, A_{90}). The resulting mixed Erlang approximations F_{W_{3,90}} and F_{W_{4,90}} are

\begin{aligned} F_{W_{3,90}}(x)= & 0.0154 H(x ; 22,1.0928) \\ & +0.2210 H(x ; 65,1.0928) \\ & +0.7637 H(x ; 90,1.0928) \end{aligned}

and

\begin{aligned} F_{W_{4,90}}(x)= & 0.0347 H(x ; 35,1.0972) \\ & +0.0834 H(x ; 66,1.0972) \\ & +0.1009 H(x ; 67,1.0972) \\ & +0.7810 H(x ; 90,1.0972), \end{aligned}

with KS distances d_{KS}(S, W_{3,90})=0.0188 and d_{KS}(S, W_{4,90})=0.0239. In Figure 2, we compare the fit of the 3- and 4-moment approximations to the exact distribution by plotting their densities (left) and dfs (right). Overall, we observe that the fit is quite reasonable.

Figure 2. Density (left) and df (right): 3- and 4-moment approximations vs. Gompertz distribution.

Note that the KS distance increases from the 3-moment to the 4-moment approximation. We notice that both F_{W_{3,90}} and F_{W_{4,90}} use the Erlang-90 df, where 90 is the largest element of A_{90}. As such, a mixed Erlang approximation with a smaller KS distance can likely be found in both cases by choosing a larger support A_{l} (i.e., l>90).
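As a cross-check of the moment inputs used above, the Gompertz moments can also be obtained by direct quadrature of the survival function, since \mu_{j}=\int_{0}^{\infty} j x^{j-1} \bar{F}_{S}(x) \, dx for a positive rv. A numerical stand-in for the closed form (7); the integration grid and upper bound are our own choices:

```python
import numpy as np

def gompertz_moments(B, c, m, x_max=130.0, n=200_001):
    """Raw moments mu_j = int_0^inf j x^{j-1} S(x) dx of a Gompertz rv with
    survival S(x) = exp(-B(c^x - 1)/ln c), via trapezoidal quadrature.
    A numerical cross-check of (7); x_max and n are our own grid choices."""
    x = np.linspace(0.0, x_max, n)
    h = x[1] - x[0]
    sf = np.exp(-B * (c ** x - 1.0) / np.log(c))
    mus = []
    for j in range(1, m + 1):
        y = j * x ** (j - 1) * sf
        mus.append(float(h * (y.sum() - 0.5 * (y[0] + y[-1]))))
    return mus
```

With the fitted parameters B=6.148 \times 10^{-5} and c=1.09159, this quadrature reproduces the first moments reported above.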

3.2.4. Some remarks

To provide insight on the quality of the moment-based mixed Erlang approximation proposed in this section, we briefly revisit the results of the above three examples. In the first two (lognormal and mixture of two gammas), the mixed Erlang approximation is easy to implement and provides a very satisfactory fit to the true distribution. As none of the resulting mixed Erlang approximations uses the Erlang-70 df (as l=70 in the first two examples), it is unlikely that a better approximation (from the viewpoint of KS distance) can be found by increasing the value of l. Given that the best KS-fit is found from a mixture of Erlang distributions with relatively small shape parameter (which corresponds to the parameter k in the Erlang df (2)), the proposed method seems particularly well suited for these two cases.

As for any approximation method, limitations can also be found, as evidenced by the Gompertz example. Indeed, for this example, both the 3-moment and 4-moment mixed Erlang approximations use the Erlang-90 df (recall l=90 for this example). As mentioned earlier, one can likely reduce the KS distance (if so desired) of the resulting mixed Erlang approximation by increasing the value of l (in light of the comments in Remark 4). This implies that the KS-optimal mixed Erlang approximation would likely involve Erlang distributions with large shape parameters (which have smaller variances for a given rate parameter \beta ).

In general, distributions with negative skewness and sharp density peak(s) may require the use of Erlang distributions with large shape parameters to provide a good approximation. Computational time of the proposed mixed Erlang methodology may become a non-negligible issue in these cases (especially as the number of moments matched increases). However, a slight adjustment to the proposed methodology may be considered to address this time-consuming issue. Indeed, one may replace the set A_{l}=\{1,2, \ldots, l\} in (1) by a set of the form \{a, 2a, \ldots, ja\} for positive integers a and j (note that the two sets coincide when a=1 and j=l). To illustrate this, we have reconsidered the Gompertz example by replacing the set A_{90} by the set \{5,10,15, \ldots, 200\}. The resulting mixed Erlang approximations, denoted by the rv W_{m, 5:200} when m moments are matched (m=3,4,5), are given by

\begin{aligned} F_{W_{3,5: 200}}(x)= & 0.0447 H(x ; 45,1.3185) \\ & +0.2573 H(x ; 85,1.3185) \\ & +0.6980 H(x ; 110,1.3185), \end{aligned}

\begin{aligned} F_{W_{4,5: 200}}(x)= & 0.0168 H(x ; 40,1.6422) \\ & +0.0971 H(x ; 85,1.6422) \\ & +0.3044 H(x ; 115,1.6422) \\ & +0.5818 H(x ; 140,1.6422), \end{aligned}

and

\begin{aligned} F_{W_{5,5: 200}}(x)= & 0.0038 H(x ; 20,1.9095) \\ & +0.0393 H(x ; 75,1.9095) \\ & +0.1262 H(x ; 110,1.9095) \\ & +0.3275 H(x ; 140,1.9095) \\ & +0.5032 H(x ; 165,1.9095), \end{aligned}

with KS distances d_{KS}\left(S, W_{3,5: 200}\right)=0.0171, d_{KS}\left(S, W_{4,5: 200}\right)=0.0086, and d_{KS}\left(S, W_{5,5: 200}\right)=0.0072. Note that the KS distance of the 4-moment approximation is considerably lower than d_{KS}\left(S, W_{4,90}\right)=0.0239. In Figure 3, we compare the fit of the 5-moment approximation W_{5,5: 200} to the Gompertz distribution by plotting their densities (left) and dfs (right). The fit is quite acceptable and of better quality than the two approximations displayed in Figure 2.
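As a sanity check on such fits, the df of a finite mixture of Erlangs is simply a weighted sum of gamma dfs, and the KS distance can be approximated on a grid. The sketch below evaluates W_{5,5:200} from the weights above; since the Gompertz parameters are not restated in this section, the target df uses hypothetical values of b and c.

```python
import numpy as np
from scipy.stats import gamma

# Mixing weights and shape parameters of W_{5,5:200}; common rate beta = 1.9095.
beta = 1.9095
mix = [(0.0038, 20), (0.0393, 75), (0.1262, 110), (0.3275, 140), (0.5032, 165)]

def mixed_erlang_cdf(x):
    # H(x; k, beta) is the Erlang-k df, i.e., a gamma df with shape k and rate beta.
    return sum(w * gamma.cdf(x, a=k, scale=1.0 / beta) for w, k in mix)

def d_ks(cdf_f, cdf_g, grid):
    # Grid approximation of the Kolmogorov-Smirnov distance between two dfs.
    return float(np.max(np.abs(cdf_f(grid) - cdf_g(grid))))

# Hypothetical Gompertz target, F(x) = 1 - exp(-(b/c)(e^{cx} - 1)); the
# parameter values used in the paper's example are not given in this excerpt.
b, c = 0.01, 0.06
gompertz_cdf = lambda x: 1.0 - np.exp(-(b / c) * np.expm1(c * x))

grid = np.linspace(0.0, 250.0, 2501)
print(d_ks(mixed_erlang_cdf, gompertz_cdf, grid))
```

With the actual Gompertz parameters, this reproduces d_{KS}(S, W_{5,5:200}); the grid spacing controls the accuracy of the sup-distance.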

Figure 3. Density (left) and df (right): 5-moment approximation vs Gompertz distribution.

4. Moment-based mixed Erlang approximation with known \beta

4.1. Basic definitions

We consider here a slightly different context than that of Section 3. Instead of approximating a general df F_{S} with known moments \mu_{m}, we assume that the df F_{S} is known to be of mixed Erlang form (1) with given rate parameter \beta>0 and first m moments \mu_{m}. However, the mixing weights \left\{\zeta_{k}\right\}_{k=1}^{l} are assumed unknown or difficult to obtain. Various applications in risk theory and credit risk fall into this context (e.g., Cossette, Gaillardetz, and Marceau 2002; Lindskog and McNeil 2003; McNeil, Frey, and Embrechts 2005; see also the application of Section 4.4).

For a given rate parameter \beta>0, let \mathcal{ME}\left(\mu_{m}, \beta\right) be the set of all mixed Erlang distributions with df (1) (as l \rightarrow \infty) and first m moments \mu_{m}. Also, define \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right) to be the subset of \mathcal{ME}\left(\mu_{m}, \beta\right) with df (1) for a given l \in \mathbb{N}^{+}, and let \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right) be the further subset of \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right) whose members have at most (m+1) non-zero mixing weights \left\{\zeta_{k}\right\}_{k=1}^{l}. Note that a distribution in \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right) can be expressed as a convex combination of distributions in \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right) (see, e.g., De Vylder 1996).

For a given function \phi, we consider two approaches to derive bounds and approximations for E[\phi(S)] when S \in \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right) (in cases where the expectation exists). The first approach is based on discrete s-convex extremal distributions, while the second is based on moment bounds on discrete expected stop-loss transforms.

Remark 5. Naturally, the set \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right) tends to \mathcal{ME}\left(\mu_{m}, \beta\right) as l \rightarrow \infty. As such, when S \in \mathcal{ME}\left(\mu_{m}, \beta\right), bounds for risk measures on \mathcal{ME}\left(\mu_{m}, \beta\right) can be approximated by their counterparts on \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right) for l reasonably large.

For a mixed Erlang rv S with df (1), its j-th moment is known to satisfy (6). Using the identity

\prod_{i=0}^{j-1}(k+i)=\sum_{n=1}^{j} s(j, n) k^{n},

where the s(j, n)'s are the (unsigned) Stirling numbers of the first kind (e.g., Abramowitz and Stegun 1972), (6) becomes

\beta^{j} \mu_{j}=\sum_{n=1}^{j} s(j, n) \kappa_{n},\tag{8}

for j=1, \ldots, m, where \kappa_{n}=\sum_{k=1}^{l} \zeta_{k} k^{n}. In matrix form, we have \mathbf{M}_{\beta}=\mathbf{s} \boldsymbol{\kappa}_{m}^{T}, where \mathbf{M}_{\beta}=\left(\beta \mu_{1}, \ldots, \beta^{m} \mu_{m}\right)^{T}, \mathbf{s}=\{s(j, n)\}_{j, n=1}^{m} (with s(j, n)=0 for n>j), and \boldsymbol{\kappa}_{m}=\left(\kappa_{1}, \ldots, \kappa_{m}\right). Isolating \boldsymbol{\kappa}_{m} yields

\boldsymbol{\kappa}_{m}=\left(\mathbf{s}^{-1} \mathbf{M}_{\beta}\right)^{T} .\tag{9}

From Comtet (1974, 213), we know that \mathbf{s}^{-1} \equiv \mathbf{c}=\left\{(-1)^{i+j} c(i, j)\right\}_{i, j=1}^{m}, where the c(i, j)'s are the Stirling numbers of the second kind, defined as c(i, j)=(j!)^{-1} \sum_{k=0}^{j}(-1)^{j-k}\binom{j}{k} k^{i} (e.g., Abramowitz and Stegun 1972).

Thus, for a given \beta>0, the class \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right) can be found through the identification of all discrete distributions with support A_{l} and first m moments given by the right-hand side of (9). Equivalently, the class \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right) can be identified by restricting the discrete distributions on A_{l} to have at most (m+1) non-zero mass points. This argument is formalized in the next section.
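Relations (8)–(9) are easy to verify numerically. The sketch below (with hypothetical mixing weights \zeta_{k} and rate \beta) builds the coefficient matrix \mathbf{s} directly by expanding the rising factorial \prod_{i=0}^{j-1}(k+i), computes the moments via (6), and recovers \kappa_{n}=\sum_{k} \zeta_{k} k^{n} via (9):

```python
import numpy as np

def stirling_matrix(m):
    # Coefficient matrix s of (8): row j holds s(j, n) with
    # prod_{i=0}^{j-1}(k + i) = sum_{n=1}^{j} s(j, n) k^n,
    # obtained by multiplying out the rising factorial term by term.
    C = np.zeros((m + 1, m + 1))
    C[0, 0] = 1.0
    for j in range(1, m + 1):
        C[j, 1:] += C[j - 1, :-1]          # multiply previous row by k
        C[j, :] += (j - 1) * C[j - 1, :]   # ... plus (j - 1) times it
    return C[1:, 1:]

# Hypothetical mixed Erlang with support in A_6: weights zeta_k, rate beta.
beta, zeta = 2.0, {1: 0.3, 3: 0.5, 6: 0.2}
m = 3

# Moments via (6): beta^j mu_j = sum_k zeta_k * k(k+1)...(k+j-1).
mu = [sum(z * np.prod(np.arange(k, k + j)) for k, z in zeta.items()) / beta**j
      for j in range(1, m + 1)]

# Recover kappa via (9): kappa_m = s^{-1} M_beta with M_beta = (beta^j mu_j)_j.
M_beta = np.array([beta**j * mu[j - 1] for j in range(1, m + 1)])
kappa = np.linalg.solve(stirling_matrix(m), M_beta)

# Matches the direct definition kappa_n = sum_k zeta_k k^n.
print(np.allclose(kappa, [sum(z * k**n for k, z in zeta.items())
                          for n in range(1, m + 1)]))  # True
```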

4.2. Discrete s-convex extremal distributions

Let \mathcal{D}\left(\boldsymbol{\alpha}_{m}, A_{l}\right) be the set of all discrete distributions with support A_{l} and first m moments \boldsymbol{\alpha}_{m}=\left(\alpha_{1}, \alpha_{2}, \ldots, \alpha_{m}\right). Also, denote by \mathcal{D}^{ext}\left(\boldsymbol{\alpha}_{m}, A_{l}\right) the subset of \mathcal{D}\left(\boldsymbol{\alpha}_{m}, A_{l}\right) consisting of distributions with at most (m+1) non-zero mass points. For a given \beta>0, each distribution in \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right) (and \mathcal{D}^{ext}\left(\boldsymbol{\kappa}_{m}, A_{l}\right)) with \boldsymbol{\kappa}_{m} as defined in (9) corresponds to a mixed Erlang distribution in \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right) (and \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right)). This correspondence is one-to-one (e.g., De Vylder 1996, part 2).

Remark 6. Because of this one-to-one correspondence between \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right) and \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right), conditions under which \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right) is not empty can be obtained from its discrete counterpart \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right). We refer the reader to, e.g., De Vylder (1996), Marceau (1996), or Courtois and Denuit (2009).

This allows us to make use of the theory developed in Prékopa (1990), Denuit and Lefèvre (1997), Denuit, Lefèvre, and Mesfioui (1999), and Courtois, Denuit, and Van Bellegem (2006) to derive bounds/approximations for E[\phi(S)] when S \in \mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right).

First, we briefly recall the definitions of s-convex function and s-convex order introduced by Denuit, Lefèvre, and Shaked (1998).

Definition 7. Denuit, Lefèvre, and Shaked (1998, 2000). Let \mathcal{C} be a subinterval of \mathbb{R} or a subset of \mathbb{N}, and \phi a function on \mathcal{C}. For a positive integer s and x_{0}<x_{1}<\cdots<x_{s} \in \mathcal{C}, the divided differences are defined recursively as

\begin{aligned} {\left[x_{0}, x_{1}, \ldots, x_{k}\right] \phi } & =\frac{\left[x_{1}, x_{2}, \ldots, x_{k}\right] \phi-\left[x_{0}, x_{1}, \ldots, x_{k-1}\right] \phi}{x_{k}-x_{0}} \\ & =\sum_{i=0}^{k} \frac{\phi\left(x_{i}\right)}{\prod_{j=0, j \neq i}^{k}\left(x_{i}-x_{j}\right)}, \quad k=1,2, \ldots, s, \end{aligned}

where \left[x_{k}\right] \phi=\phi\left(x_{k}\right) for k=0,1, \ldots, s. The function \phi is s-convex if \left[x_{0}, x_{1}, \ldots, x_{s}\right] \phi \geq 0 for all x_{0}<x_{1} <\cdots<x_{s} \in \mathcal{C}.

We mention that the definition of s-convex function, which refers to higher-convexity, should not be confused with the one for Schur-convex function, also known as S-convex function.

Definition 8. Denuit, Lefèvre, and Shaked (2000). For two rv’s X and Y defined on \mathcal{C}, X is smaller than Y in the s-convex sense, namely X \preceq_{s-c x}^{\mathcal{C}} Y, if E[\phi(X)] \leq E[\phi(Y)] for all s-convex functions \phi (provided the expectations exist).

We mention that the 1-convex order corresponds to the usual stochastic dominance order, and the 2-convex order is the usual convex order (see Müller and Stoyan (2002), Denuit et al. (2005), and Shaked and Shanthikumar (2007) for a review of stochastic orders). Also, as stated in Theorem 1.6.3 of Müller and Stoyan (2002), the s-convex order can only be used to compare rv’s with the same first (s-1) moments (which explains why s is chosen to be m+1 in what follows). Examples of s-convex functions are \phi(x)=x^{s+j} for j \in \mathbb{N} and \phi(x)=\exp (c x) for c \geq 0. For a general treatment of the s-convex order, see, e.g., Denuit, Lefèvre, and Mesfioui (1999), Denuit, Lefèvre, and Shaked (2000), and Section 1.6 of Müller and Stoyan (2002).
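Definition 7 and these examples can be checked numerically. The sketch below evaluates divided differences via the explicit sum and verifies that \phi(x)=\exp(cx) has nonnegative divided differences of every order on randomly chosen increasing points (an illustration of s-convexity for each s, not a proof):

```python
import numpy as np

def divided_difference(phi, xs):
    # [x_0, ..., x_k]phi via the explicit sum in Definition 7:
    # sum_i phi(x_i) / prod_{j != i} (x_i - x_j).
    xs = np.asarray(xs, dtype=float)
    return sum(phi(xi) / np.prod([xi - xj for j, xj in enumerate(xs) if j != i])
               for i, xi in enumerate(xs))

# phi(x) = exp(c x), c >= 0, is s-convex for every s: all order-s divided
# differences over increasing points are nonnegative.
rng = np.random.default_rng(1)
phi = lambda x: np.exp(0.5 * x)
checks = [divided_difference(phi, np.sort(rng.uniform(0, 10, s + 1)))
          for s in range(1, 6)]
print(all(d >= 0 for d in checks))  # True

# Sanity check: the order-2 divided difference of an affine function is 0.
print(abs(divided_difference(lambda x: 3 * x + 1, [1.0, 2.0, 4.0])) < 1e-12)  # True
```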

Let K_{(m+1)-\min } and K_{(m+1)-\max } be the (m+1)-extremum rv’s on \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right), i.e., those which satisfy

\begin{gathered} E\left[\phi\left(K_{(m+1)-\min }\right)\right] \leq E[\phi(K)] \\ \quad \leq E\left[\phi\left(K_{(m+1)-\max }\right)\right], \end{gathered}

for any (m+1)-convex function \phi and K \in \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right). The general distribution forms of K_{(m+1)-\min } and K_{(m+1)-\max } are given in Prékopa (1990) (see also Courtois, Denuit, and Van Bellegem 2006, Section 4) and are repeated here:

\scriptsize{ \begin{array}{|c|c|c|} \hline & m+1 \text { even } & m+1 \text { odd } \\ \hline \text { support of } K_{(m+1)-\min } & \left\{j_1, j_1+1, \ldots, j_{\frac{m+1}{2}}, j_{\frac{m+1}{2}}+1\right\} & \left\{1, j_1, j_1+1, \ldots, j_{\frac{m}{2}}, j_{\frac{m}{2}}+1\right\} \\ \hline \text { support of } K_{(m+1)-\max } & \left\{1, j_1, j_1+1, \ldots, j_{\frac{m-1}{2}}, j_{\frac{m-1}{2}}+1, l\right\} & \left\{j_1, j_1+1, \ldots, j_{\frac{m}{2}}, j_{\frac{m}{2}}+1, l\right\} \\ \hline \end{array} \tag{10}}

where 1<j_{1}<j_{1}+1<j_{2}<\cdots<l. From (10), it is clear that the support of K_{(m+1)-\min (\max )} has at most m+1 elements.

Let W_{K}=\sum_{j=1}^{K} C_{j} be a mixed Erlang rv, where \left\{C_{j}\right\}_{j \geq 1} is a sequence of iid exponential rv’s with mean 1 / \beta, independent of K. The following result of Denuit, Lefèvre, and Utev (1999, Property 5.7) establishes the stability of the s-convex order under compounding.

Lemma 9. If K \preceq_{s-c x}^{A_{l}} K^{\prime}, then W_{K} \preceq_{s-c x}^{\mathbb{R}_{+}} W_{K^{\prime}}.

We apply Lemma 9 to define the mixed Erlang rv’s W_{K_{(m+1)-\min }} and W_{K_{(m+1)-\max }}, which are the (m+1)-extremum rv’s on \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right). It is immediate that, for W \in \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right),

W_{K_{(m+1)-\min }} \preceq_{(m+1)-c x}^{\mathbb{R}_{+}} W \preceq_{(m+1)-c x}^{\mathbb{R}_{+}} W_{K_{(m+1)-\max }} . \tag{11}

For instance, using (11), the (m+1)-convex functions \phi(x)=x^{m+1+j}(j \in \mathbb{N}) and \phi(x)=\exp (c x)(c \geq 0) yield

E\left[W_{K_{(m+1)-\min }}^{m+1+j}\right] \leq E\left[W^{m+1+j}\right] \leq E\left[W_{K_{(m+1)-\max }}^{m+1+j}\right]

and

\begin{gathered} E\left[\exp \left(c W_{K_{(m+1)-\min }}\right)\right] \leq E[\exp (c W)] \\ \leq E\left[\exp \left(c W_{K_{(m+1)-\max }}\right)\right], \end{gathered}

respectively.

4.3. Moment bounds on discrete expected stop-loss transforms

Extrema in the (m+1)-convex order yield bounds for E[\phi(W)] when \phi is (m+1)-convex and W \in \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right). However, this approach is not appropriate to derive bounds on the TVaR and the stop-loss premium when the number of known moments is greater than 2. Indeed, it is well known that two rv’s with the same mean and variance cannot be compared under the convex order (unless they are equal in distribution).

Consequently, we use an approach inspired by Courtois and Denuit (2009) (see also Hürlimann 2002) to derive bounds on the TVaR and the stop-loss premium. We consider \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right) and determine lower and upper bounds for E\left[(K-k)_{+}\right] for all k \in A_{l}. From the lower bound, we define the corresponding rv K_{m-\text{down}} on A_{l} via the df

F_{K_{m-\text{down}}}(k)=\left\{\begin{array}{ll}1-\left(\inf _{K \in \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right)} E\left[(K-k)_{+}\right]-\inf _{K \in \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right)} E\left[(K-k-1)_{+}\right]\right), & k=1,2, \ldots, l-1, \\ 1, & k=l.\end{array}\right.\tag{12}

Similarly, K_{m-\text{up}} is defined as in (12) by replacing ‘inf’ with ‘sup’. Given that, for K \in \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right),

E\left[\left(K_{m-\text{down}}-k\right)_{+}\right] \leq E\left[(K-k)_{+}\right] \leq E\left[\left(K_{m-\text{up}}-k\right)_{+}\right],

for all k \in A_{l}, it follows that K_{m-\text{down}(\text{up})} is smaller (larger) than K under the increasing convex order, namely K_{m-\text{down}} \preceq_{i c x} K \preceq_{i c x} K_{m-\text{up}} (see, e.g., Courtois and Denuit 2009). Note that K_{m-\text{down}} and K_{m-\text{up}} do not, in general, belong to \mathcal{D}\left(\boldsymbol{\kappa}_{m}, A_{l}\right), but both have first moment \kappa_{1}. The increasing convex order is stable under compounding, and thus

W_{K_{m-\text{down}}} \preceq_{i c x} W_{K} \preceq_{i c x} W_{K_{m-\text{up}}} .\tag{13}

Then, from Denuit et al. (2005, Proposition 3.4.8), it follows that

\begin{gathered} \operatorname{TVaR}_{\kappa}\left(W_{K_{m-\text{down}}}\right) \leq \operatorname{TVaR}_{\kappa}\left(W_{K}\right) \\ \leq \operatorname{TVaR}_{\kappa}\left(W_{K_{m-\text{up}}}\right), \end{gathered}\tag{14}

for \kappa \in(0,1). Clearly, the rv’s W_{K_{m-\text{down}}} and W_{K_{m-\text{up}}} will most likely not belong to \mathcal{ME}\left(\mu_{m}, A_{l}, \beta\right).
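The infimum and supremum in (12) are linear programs in the probability masses p_{1}, \ldots, p_{l}, since both the moment constraints and the objective E[(K-k)_{+}] are linear in the pmf. A minimal sketch follows; the binomial-type reference pmf below is purely hypothetical and serves only to supply a feasible moment vector \boldsymbol{\kappa}_{m}.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import binom

l, m, d = 20, 3, 4                      # support A_l, moments matched, retention
support = np.arange(1, l + 1)

# Hypothetical reference pmf on A_l supplying a feasible moment vector kappa.
p_ref = binom.pmf(support - 1, l - 1, 0.2)       # K = 1 + Binomial(l - 1, 0.2)
kappa = [float(np.sum(p_ref * support.astype(float)**n)) for n in range(1, m + 1)]

# inf/sup of E[(K - d)_+] over all pmfs on A_l with first m moments kappa:
# a linear program in the masses p_1, ..., p_l.
A_eq = np.vstack([support.astype(float)**n for n in range(0, m + 1)])
b_eq = np.array([1.0] + kappa)                   # n = 0 row enforces sum p_k = 1
c = np.maximum(support - d, 0).astype(float)

lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
true_val = float(p_ref @ c)                      # stop-loss of the reference pmf
print(lo - 1e-8 <= true_val <= hi + 1e-8)        # True: reference lies in [lo, hi]
```

Repeating the optimization over k=1, \ldots, l-1 yields the stop-loss bounds that define K_{m-\text{down}} and K_{m-\text{up}} in (12).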

Remark 10. When neither of the two aforementioned approaches is applicable, we propose to derive approximate bounds for E[\phi(S)] with S \in \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right) by calculating E[\phi(W)] for all W \in \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right) and choosing

E[\phi(W)]_{\min }=\inf _{W \in \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right)} E[\phi(W)], \quad E[\phi(W)]_{\max }=\sup _{W \in \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right)} E[\phi(W)] .

Obviously, E[\phi(S)] does not necessarily lie between E[\phi(W)]_{\min } and E[\phi(W)]_{\max }. However, the ‘interval’ estimate \left[E[\phi(W)]_{\text {min }}, E[\phi(W)]_{\text {max }}\right] may give an idea of the variability of all solutions on \mathcal{M E}\left(\mu_{m}, A_{l}, \beta\right).

4.4. Example: Portfolio of dependent risks

We consider a portfolio of n dependent risks as described in the common mixture model of Cossette, Gaillardetz, and Marceau (2002). Let S=X_{1}+\cdots+X_{n} be the aggregate claim amount with X_{i}=B_{i} I_{i}. Conditional on a common mixture rv \Theta with pmf a_{j}=\mathbb{P}(\Theta=j) for j=1,2, \ldots, the \left\{I_{i}\right\}_{i=1}^{n} are assumed to form a sequence of independent Bernoulli rv’s with \mathbb{P}\left(I_{i}=1 \mid \Theta=j\right)=1-\left(r_{i}\right)^{j} for r_{i} \in(0,1). As for the B_{i}’s, they are assumed to form a sequence of iid exponential rv’s with mean 1, independent of \left\{I_{i}\right\}_{i=1}^{n} and \Theta.

In this context, it is clear that S is a two-point mixture of a degenerate rv at 0 and a mixed Erlang rv of the form (1) with l=n and \beta=1, i.e., its Laplace transform is given by

\begin{aligned} E\left[e^{-t S}\right] & \equiv 1-p+p E\left[e^{-t Y}\right] \\ & =\sum_{j=1}^{\infty} a_{j}\left\{\prod_{i=1}^{n}\left(\left(r_{i}\right)^{j}+\left(1-\left(r_{i}\right)^{j}\right) \frac{1}{1+t}\right)\right\}, \quad t \geq 0, \end{aligned}

where p=1-\sum_{j=1}^{\infty} a_{j} \prod_{i=1}^{n}\left(r_{i}\right)^{j}. We perform the moment-based approximation on the rv Y=(S \mid S>0) rather than S.

For illustrative purposes, we assume n=20 and a logarithmic pmf for \Theta, namely a_{j}=(0.5)^{j} /(j \ln 2) for j \geq 1. Also, the constants r_{i} are set such that the (unconditional) mean of I_{i} is q_{i}=1-E\left[\left(r_{i}\right)^{\Theta}\right] with q_{1}=\cdots=q_{10}=0.1 and q_{11}=\cdots=q_{20}=0.02. Under these assumptions, the first five moments of Y are \mu_{5}=(1.7999,6.2270,31.4785,208.1258,1693.7077).
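Under these assumptions, p and the first moment of Y have closed forms through the logarithmic pgf E\left[z^{\Theta}\right]=-\ln (1-0.5 z) / \ln 2: solving q_{i}=1-E\left[\left(r_{i}\right)^{\Theta}\right] gives r_{i}=2\left(1-2^{q_{i}-1}\right), and E[Y]=E[S] / p with E[S]=\sum_{i} q_{i}. A short verification sketch:

```python
from math import log, prod

q = [0.1] * 10 + [0.02] * 10                 # unconditional means of the I_i's
r = [2 * (1 - 2.0**(qi - 1)) for qi in q]    # from q_i = 1 - E[(r_i)^Theta]

# p = P(S > 0) = 1 - E[prod_i (r_i)^Theta] = 1 + ln(1 - 0.5 R)/ln 2, R = prod_i r_i.
R = prod(r)
p = 1 + log(1 - 0.5 * R) / log(2)

# First moment of Y = (S | S > 0): E[Y] = E[S]/p with E[S] = sum_i q_i.
mu1_Y = sum(q) / p
print(round(mu1_Y, 4))                       # matches the first entry of mu_5 (1.7999)
```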

Using the approach based on discrete (m+1)-convex extremal distributions, the dfs F_{W_{K_{(m+1)-\min }}} and F_{W_{K_{(m+1)-\max }}} for m=4,5 are:

\begin{aligned} F_{W_{K_{5-\min }}}(x)= & 0.5365 H(x ; 1,1) \\ & +0.2127 H(x ; 2,1) \\ & +0.2232 H(x ; 3,1) \\ & +0.0245 H(x ; 6,1) \\ & +0.0030 H(x ; 7,1), \\ F_{W_{K_{5-\max }}}(x)= & 0.4860 H(x ; 1,1) \\ & +0.3981 H(x ; 2,1) \\ & +0.0621 H(x ; 4,1) \\ & +0.0537 H(x ; 5,1) \\ & +0.0000 H(x ; 20,1), \\ F_{W_{K_{6-\min }}}(x)= & 0.5126 H(x ; 1,1) \\ & +0.3179 H(x ; 2,1) \\ & +0.0531 H(x ; 3,1) \\ & +0.1082 H(x ; 4,1) \\ & +0.0059 H(x ; 7,1) \\ & +0.0023 H(x ; 8,1), \end{aligned}

and

\begin{aligned} F_{W_{K_{6-\max }}}(x)= & 0.5322 H(x ; 1,1) \\ & +0.2264 H(x ; 2,1) \\ & +0.2110 H(x ; 3,1) \\ & +0.0004 H(x ; 5,1) \\ & +0.0299 H(x ; 6,1) \\ & +0.0000 H(x ; 20,1) . \end{aligned}

Let X_{K_{(m+1)-\min (\max )}} be a rv with df F_{X_{K_{(m+1)-\min (\max )}}}(x)=1-p+p F_{W_{K_{(m+1)-\min (\max )}}}(x) for x \geq 0. It follows from Section 4.2 that lower (upper) bounds for the higher-order moments E\left[S^{j}\right] (j=4,5, \ldots) and the exponential premium principle \varphi_{\eta}(S)=\left(\ln E\left[e^{\eta S}\right]\right) / \eta (\eta>0) can be found from their counterparts for X_{K_{(m+1)-\min (\max )}}. A few numerical values are provided in Tables 5 and 6, respectively.

As expected, the bounds get sharper as the number of moments involved increases.

Table 5. Higher-order moments of S, X_{K_{(m+1)-\min }} and X_{K_{(m+1)-\max }} (m=4,5)
j E\left[X_{K_{5-\min }}^{j}\right] E\left[X_{K_{6-\min }}^{j}\right] E\left[S^{j}\right] E\left[X_{K_{6-\max }}^{j}\right] E\left[X_{K_{5-\max }}^{j}\right]
4 138.7579 138.7579 138.7579 138.7579 138.7579
5 1125.9592 1129.1880 1129.1880 1129.1880 1149.9348
6 10748.5738 10873.8020 10881.2732 10922.7337 11993.6176
Table 6. Exponential premiums of S, X_{K_{(m+1)-\min }} and X_{K_{(m+1)-\max }} (m=4,5)
\eta \varphi_{\eta}\left(X_{K_{5-\min }}\right) \varphi_{\eta}\left(X_{K_{6-\min }}\right) \varphi_{\eta}(S) \varphi_{\eta}\left(X_{K_{6-\max }}\right) \varphi_{\eta}\left(X_{K_{5-\max }}\right)
0.2 1.5545 1.5546 1.5546 1.5548 1.5564
0.1 1.3536 1.3536 1.3536 1.3536 1.3536
0.01 1.2137 1.2137 1.2137 1.2137 1.2137
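The lower-bound column of Table 5 can be reproduced, approximately, from the printed mixing weights; exact agreement is not possible because the weights are rounded to four decimals (in particular, the H(x ; 20,1) weight in F_{W_{K_{5-\max }}} displays as 0.0000 but is not exactly zero, and the shape k=20 makes that tiny weight matter for higher moments). A sketch for X_{K_{5-\min }}:

```python
from math import log, prod

# P(S > 0) under the Section 4.4 assumptions (n = 20, logarithmic Theta).
q = [0.1] * 10 + [0.02] * 10
p = 1 + log(1 - 0.5 * prod(2 * (1 - 2.0**(qi - 1)) for qi in q)) / log(2)

def moment(mix, j):
    # E[X^j] = p E[W^j]; with beta = 1, E[W^j] = sum_k zeta_k k(k+1)...(k+j-1).
    return p * sum(z * prod(range(k, k + j)) for k, z in mix.items())

# Printed (rounded) weights of F_{W_{K_{5-min}}}.
K5_min = {1: 0.5365, 2: 0.2127, 3: 0.2232, 6: 0.0245, 7: 0.0030}

print(moment(K5_min, 4))               # close to 138.7579 (Table 5, j = 4)
print(moment(K5_min, 5) <= 1129.1880)  # True: lower bound for E[S^5]
```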

As for the second approach, based on moment bounds for discrete expected stop-loss transforms, Table 7 presents the values of TVaR for X_{K_{m-\text{down}(\text{up})}} (m=4,5) with df F_{X_{K_{m-\text{down}(\text{up})}}}(x)=1-p+p F_{W_{K_{m-\text{down}(\text{up})}}}(x) for x \geq 0.

Table 7. Values of TVaR for S, X_{K_{m-\text{down}}} and X_{K_{m-\text{up}}} (m=4,5)
\kappa Exact 4 moments 5 moments
\operatorname{TVaR}_{\kappa}(S) \operatorname{TVaR}_{\kappa}\left(X_{K_{4-\text{down}}}\right) \operatorname{TVaR}_{\kappa}\left(X_{K_{4-\text{up}}}\right) \operatorname{TVaR}_{\kappa}\left(X_{K_{5-\text{down}}}\right) \operatorname{TVaR}_{\kappa}\left(X_{K_{5-\text{up}}}\right)
0.9 5.0696 4.9222 5.2062 4.9800 5.1490
0.95 6.2214 5.9708 6.4548 6.0594 6.3642
0.99 8.8460 8.2899 9.3301 8.4655 9.1767
0.995 9.9589 9.2500 10.5629 9.4679 10.3775
0.999 12.5066 11.4122 13.4854 11.7323 13.1382

As expected, the inequality (14) is verified. Also, we observe that the interval estimate of \mathrm{TVaR}_{\kappa}(S) shrinks as the number of moments matched increases.

Finally, given that neither method is applicable for the VaR risk measure, we make use of the technique discussed in Remark 10. The resulting numerical values are provided in Table 8.

Table 8. Minimal/maximal values of VaR within \mathcal{ME}^{ext}\left(\boldsymbol{\mu}_{m}, A_{20}, 1\right) (m=4,5) vs \operatorname{VaR}_{\kappa}(S)
\kappa Exact \mathcal{ME}^{ext}\left(\boldsymbol{\mu}_{4}, A_{20}, 1\right) \mathcal{ME}^{ext}\left(\boldsymbol{\mu}_{5}, A_{20}, 1\right)
\operatorname{VaR}_{\kappa}(S) \operatorname{VaR}_{\kappa}(X)_{\min } \operatorname{VaR}_{\kappa}(X)_{\max } \operatorname{VaR}_{\kappa}(X)_{\min } \operatorname{VaR}_{\kappa}(X)_{\max }
0.9 3.3965 3.3896 3.4031 3.3897 3.4001
0.95 4.5704 4.5539 4.6030 4.5584 4.5730
0.99 7.2334 7.2182 7.2690 7.2251 7.2533
0.995 8.3604 8.3143 8.3946 8.3539 8.3925
0.999 10.9388 10.7191 10.9781 10.9226 10.9536

For this example, we observe that the exact values of \operatorname{VaR}_{\kappa}(S) lie within the minimal and maximal values of the corresponding risk measure among all members of \mathcal{ME}^{ext}\left(\boldsymbol{\mu}_{m}, A_{20}, 1\right) for m=4,5.

Also, the spread between the minimal and maximal values of VaR is reduced when we go from 4 to 5 moments. An identical exercise for the TVaR risk measure led to the same conclusions.

References

Abramowitz, M., and I. Stegun. 1972. Handbook of Mathematical Functions. 9th ed. Washington, DC: US Government Printing Office.
Altiok, T. 1985. “On the Phase-Type Approximations of General Distributions.” IIE Transactions 17 (2): 110–16. https://doi.org/10.1080/07408178508975280.
Bobbio, A., A. Horváth, and M. Telek. 2005. “Matching Three Moments with Minimal Acyclic Phase Type Distributions.” Stochastic Models 21:303–26. https://doi.org/10.1081/STM-200056210.
Bowers, N. L., H. U. Gerber, G. C. Hickman, D. A. Jones, and C. J. Nesbit. 1997. Actuarial Mathematics. Schaumburg, IL: Society of Actuaries.
Chaubey, Y. P., J. Garrido, and S. Trudeau. 1998. “On the Computation of Aggregate Claims Distributions: Some New Approximations.” Insurance: Mathematics and Economics 23 (3): 215–30. https://doi.org/10.1016/S0167-6687(98)00029-8.
Cheung, E. C. K., and J. K. Woo. 2016. “On the Discounted Aggregate Claim Costs until Ruin in Dependent Sparre Andersen Risk Processes.” Scandinavian Actuarial Journal 2016 (1): 63–91. https://doi.org/10.1080/03461238.2014.900519.
Comtet, L. 1974. Advanced Combinatorics. Dordrecht: D. Reidel. https://doi.org/10.1007/978-94-010-2196-8.
Cossette, H., P. Gaillardetz, and E. Marceau. 2002. “Common Mixtures in the Individual Risk Model.” Bulletin de l’Association suisse des actuaires, 131–57.
Cossette, H., M. Mailhot, and E. Marceau. 2012. “TVaR-Based Capital Allocation for Multivariate Compound Distributions.” Insurance: Mathematics and Economics 50 (2): 247–56. https://doi.org/10.1016/j.insmatheco.2011.11.006.
Courtois, C., and M. Denuit. 2007. “Bounds on Convex Reliability Functions with Known First Moments.” European Journal of Operational Research 177:365–77. https://doi.org/10.1016/j.ejor.2005.08.026.
———. 2009. “Moment Bounds on Discrete Expected Stop-Loss Transforms, with Applications.” Methodology and Computing in Applied Probability 11:307–38. https://doi.org/10.1007/s11009-007-9048-0.
Courtois, C., M. Denuit, and S. Van Bellegem. 2006. “Discrete S-Convex Extremal Distributions: Theory and Applications.” Applied Mathematics Letters 19:1367–77. https://doi.org/10.1016/j.aml.2006.02.006.
Daykin, C. D., T. Pentikäinen, and H. Pesonen. 1994. Practical Risk Theory for Actuaries. New York: Chapman & Hall. https://doi.org/10.1201/9781482289046.
De Vylder, F. E. 1996. Advanced Risk Theory: A Self-Contained Introduction. Brussels: Editions de l’Université Libre de Bruxelles—Swiss Association of Actuaries.
Denuit, M., J. Dhaene, M. J. Goovaerts, and R. Kaas. 2005. Actuarial Theory for Dependent Risks: Measures, Orders and Models. New York: Wiley. https://doi.org/10.1002/0470016450.
Denuit, M., C. Lefèvre, and M. Mesfioui. 1999. “On S-Convex Stochastic Extrema for Arithmetic Risks.” Insurance: Mathematics and Economics 25:143–55. https://doi.org/10.1016/S0167-6687(99)00030-X.
Denuit, M., C. Lefèvre, and M. Shaked. 1998. “The S-Convex Orders among Real Random Variables, with Applications.” Mathematical Inequalities and Applications 1:585–613. https://doi.org/10.7153/mia-01-56.
———. 2000. “Stochastic Convexity of the Poisson Mixture Model.” Methodology and Computing in Applied Probability 2 (3): 231–54. https://doi.org/10.1023/A:1010054211652.
Denuit, M., C. Lefèvre, and S. Utev. 1999. “Generalized Stochastic Convexity and Stochastic Orderings of Mixtures.” Probability in the Engineering and Informational Sciences 13:275–91. https://doi.org/10.1017/S0269964899133023.
Denuit, M., and C. Lefèvre. 1997. “Some New Classes of Stochastic Order Relations among Arithmetic Random Variables, with Applications in Actuarial Sciences.” Insurance: Mathematics and Economics 20:197–214. https://doi.org/10.1016/S0167-6687(97)00010-3.
Dufresne, D. 2007. “Fitting Combinations of Exponentials to Probability Distributions.” Applied Stochastic Models in Business and Industry 23:23–48. https://doi.org/10.1002/asmb.635.
Gerber, H. U. 1979. An Introduction to Mathematical Risk Theory. S. S. Huebner Foundation. Philadelphia: University of Pennsylvania.
Hürlimann, W. 2002. “Analytical Bounds for Two Value-at-Risk Functionals.” ASTIN Bulletin 32:235–65. https://doi.org/10.2143/AST.32.2.1028.
Johnson, M. A., and M. R. Taaffe. 1989. “Matching Moments to Phase Distributions: Mixtures of Erlang Distributions of Common Order.” Stochastic Models 5:711–43. https://doi.org/10.1080/15326348908807131.
Karlin, S., and W. J. Studden. 1966. Tchebycheff Systems: With Applications in Analysis and Statistics. New York: Wiley.
Landriault, D., and G. E. Willmot. 2009. “On the Joint Distributions of the Time to Ruin, the Surplus Prior to Ruin and the Deficit at Ruin in the Classical Risk Model.” North American Actuarial Journal 13 (2): 252–70. https://doi.org/10.1080/10920277.2009.10597550.
Lee, S. C., and X. S. Lin. 2010. “Modeling and Evaluating Insurance Losses via Mixtures of Erlang Distributions.” North American Actuarial Journal 14 (1): 107–30. https://doi.org/10.1080/10920277.2010.10597580.
Lee, Y. S., and T. K. Lin. 1992. “Higher-Order Cornish–Fisher Expansion.” Applied Statistics 41:233–40. https://doi.org/10.2307/2347649.
Lenart, A. 2014. “The Moments of the Gompertz Distribution and Maximum Likelihood Estimation of Its Parameters.” Scandinavian Actuarial Journal, 255–77. https://doi.org/10.1080/03461238.2012.687697.
Lindskog, F., and A. J. McNeil. 2003. “Common Poisson Shock Models: Applications to Insurance and Credit Risk Modelling.” ASTIN Bulletin 33 (2): 209–38. https://doi.org/10.1017/S0515036100013441.
Marceau, E. 1996. “Classical Risk Theory and Schmitter’s Problems.” PhD thesis, Université Catholique de Louvain, Louvain-la-Neuve.
McNeil, A. J., R. Frey, and P. Embrechts. 2005. Quantitative Risk Management. Princeton, NJ: Princeton University Press.
Melnikov, A., and Y. Romaniuk. 2006. “Evaluating the Performance of Gompertz, Makeham and Lee–Carter Mortality Models for Risk Management with Unit-Linked Contracts.” Insurance: Mathematics and Economics 39:310–29. https://doi.org/10.1016/j.insmatheco.2006.02.012.
Milgram, M. 1985. “The Generalized Integro-Exponential Function.” Mathematics of Computation 44 (170): 443–58. https://doi.org/10.1090/S0025-5718-1985-0777276-4.
Müller, A., and D. Stoyan. 2002. Comparison Methods for Stochastic Models and Risks. New York: Wiley.
Osogami, T., and M. Harchol-Balter. 2006. “Closed Form Solutions for Mapping General Distributions to Quasi-Minimal PH Distributions.” Performance Evaluation 63:524–52. https://doi.org/10.1016/j.peva.2005.06.002.
Prékopa, A. 1990. “The Discrete Moment Problem and Linear Programming.” Discrete Applied Mathematics 27:235–54. https://doi.org/10.1016/0166-218X(90)90068-N.
Ramsay, C. 1991. “A Note on the Normal Power Approximation.” ASTIN Bulletin 21:147–50. https://doi.org/10.2143/AST.21.1.2005407.
Seal, H. L. 1977. “Approximation to Risk Theory’s F(x,t) by Means of the Gamma Distribution.” ASTIN Bulletin 9:213–18. https://doi.org/10.1017/S0515036100011521.
Shaked, M., and J. G. Shanthikumar. 2007. Stochastic Orders and Their Applications. 2nd ed. New York: Springer-Verlag. https://doi.org/10.1007/978-0-387-34675-5.
Telek, M., and A. Heindl. 2002. “Matching Moments for Acyclic Discrete and Continuous Phase-Type Distributions of Second Order.” International Journal of Simulation Systems, Science and Technology 3 (3–4): 47–57.
Tijms, H. C. 1994. Stochastic Models: An Algorithmic Approach. Chichester: Wiley.
Vanden Bosch, P. M., D. C. Dietz, and E. A. Pohl. 2000. “Moment Matching Using a Family of Phase-Type Distributions.” Communications in Statistics—Stochastic Models 16 (3–4): 391–98. https://doi.org/10.1080/15326340008807595.
Venter, G. 1983. “Transformed Beta and Gamma Distributions and Aggregate Losses.” Proceedings of the Casualty Actuarial Society, 156–93.
Whitt, W. 1982. “Approximating a Point Process by a Renewal Process, I: Two Basic Methods.” Operations Research 30 (1): 125–47. https://doi.org/10.1287/opre.30.1.125.
Willmot, G. E., and X. S. Lin. 2011. “Risk Modelling with the Mixed Erlang Distribution.” Applied Stochastic Models in Business and Industry 27:2–16. https://doi.org/10.1002/asmb.838.
Willmot, G. E., and J. K. Woo. 2007. “On the Class of Erlang Mixtures with Risk Theoretic Applications.” North American Actuarial Journal 11 (2): 99–115. https://doi.org/10.1080/10920277.2007.10597450.
