1. Introduction
Mixed Erlang distributions are known to yield analytic solutions to many risk management problems of interest. This is primarily due to the tractable features of this distributional class. Among others, the class of mixed Erlang distributions is closed under various operations such as convolutions and Esscher transformations (e.g., Willmot and Woo 2007 and Willmot and Lin 2011). As such, risk aggregation and ruin problems can more easily be tackled under mixed Erlang assumptions (e.g., Cheung and Woo 2016; Cossette, Mailhot, and Marceau 2012, and Landriault and Willmot 2009). Also, Tijms (1994) showed that the class of mixed Erlang distributions is dense in the set of all continuous and positive distributions. Therefore, we consider a moment-based approximation method which capitalizes on the aforementioned properties of the mixed Erlang distribution. More precisely, we propose to approximate a distribution with known moments by a moment-matching mixed Erlang distribution. Moment-based approximations have been extensively developed in various research areas, including performance evaluation, queueing theory, and risk theory, to name a few.
Osogami and Harchol-Balter (2006) identify the following four criteria to evaluate moment-matching algorithms: (1) the number of moments matched; (2) the computational efficiency of the algorithm; (3) the generality of the solution; and (4) the minimality of the number of parameters (phases). It also seems desirable for the approximation to be in itself a distribution. This is not mentioned in Osogami and Harchol-Balter (2006) for the obvious reason that they consider phase-type distributions as their moment-based approximation class. There exists an extensive literature on the approximation of distributions by a specific subset of phase-type distributions using moment-based techniques. For instance, Whitt (1982) proposed a mixture of two exponential distributions or a generalized Erlang distribution as a moment-based approximation when the coefficient of variation (CV) is greater than or less than 1, respectively. Also, both Altiok (1985) and Vanden Bosch, Dietz, and Pohl (2000) proposed an alternative to the
moment-based approximation of Whitt (1982) when using a Coxian distribution. Alternatively, Johnson and Taaffe (1989) considered a mixture of Erlangs with a common shape (order) parameter as their moment-based approximation.
Most predominantly, there exists a substantial body of literature on the three-moment approximation within the phase-type class of distributions (e.g., Telek and Heindl 2002; Bobbio, Horváth, and Telek 2005, and references therein). Matching the first three moments is often viewed as effective to provide a reasonable approximation to the underlying system (e.g., Osogami and Harchol-Balter 2006 and references therein). However, as illustrated in this paper and many others, matching three moments does not always suffice, triggering the development of more flexible moment-based approximations. Among others, we mention the work of Johnson and Taaffe (1989) on mixed Erlang distributions of common order. Also, Dufresne (2007) proposes two approximation techniques based on Jacobi polynomial expansions and the logbeta distribution to fit combinations of exponential distributions. This paper is complementary to the aforementioned ones by considering the family of finite mixtures of Erlangs with common rate parameter to approximate a distribution on the positive half-line, as theoretically justified in the continuous case by Tijms (1994, Theorem 3.9.1). The reader is also referred to S. C. Lee and Lin (2010), where fitting of the same class of distributions is considered using the EM algorithm (which relies on the knowledge of the approximated distribution rather than only its moments).
It is worth pointing out that other non-phase-type approximation methods have been widely used in actuarial science. A good survey paper on this topic is Chaubey, Garrido, and Trudeau (1998). One such class consists of refinements to the normal approximation, such as the normal power and Cornish-Fisher approximations (e.g., Ramsay 1991; Daykin, Pentikäinen, and Pesonen 1994, and Y. S. Lee and Lin 1992). These approximations are based on the first few moments. However, the resulting approximation is often not a proper distribution. Other moment-based distributional approximations are the translated gamma distribution (e.g., Seal 1977), the translated inverse Gaussian distribution (e.g., Chaubey, Garrido, and Trudeau 1998) and the generalized Pareto distribution (e.g., Venter 1983). It should be noted that all these approximation methods are designed to fit a specific number of moments and thus lack the flexibility to match an arbitrary number of moments.
The rest of the paper is constructed as follows. In Section 2, a brief review on admissible moments, mixed Erlang distributions and the approximation method of Johnson and Taaffe (1989) is provided. Section 3 is devoted to our class of finite mixtures of Erlangs with common rate parameter. Theoretical and practical considerations related to the approximation method are drawn. Various examples are considered to examine the quality of the resulting approximation. In Section 4, we consider applications of our moment-based approximations of Section 3 when the underlying distribution is of mixed Erlang form with known rate parameter. A parallel is drawn with a discrete moment-matching problem and certain stochastic orderings, notably the s-convex stochastic order (e.g., Denuit, Lefèvre, and Shaked 1998). An application of Cossette, Gaillardetz, and Marceau (2002) will be examined in more detail.
2. Background
2.1. Admissible moments
Karlin and Studden (1966) provide the necessary and sufficient conditions for a set of (raw) moments to be from a probability distribution defined on [0, ∞). To state this result, define the matrices P_k and Q_k as
P_{k}=\begin{pmatrix} 1 & \mu_{1} & \cdots & \mu_{k} \\ \mu_{1} & \mu_{2} & \cdots & \mu_{k+1} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{k} & \mu_{k+1} & \cdots & \mu_{2k} \end{pmatrix}; \quad Q_{k}=\begin{pmatrix} \mu_{1} & \mu_{2} & \cdots & \mu_{k+1} \\ \mu_{2} & \mu_{3} & \cdots & \mu_{k+2} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{k+1} & \mu_{k+2} & \cdots & \mu_{2k+1} \end{pmatrix}.
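In practice, these determinant conditions can be verified numerically. The following Python sketch (the function name and conventions are ours, not from the paper) checks the non-negativity of det P_k and det Q_k for all orders computable from m given moments:

```python
import numpy as np

def is_admissible(mu, tol=1e-12):
    """Check whether mu = [mu_1, ..., mu_m] can be the raw moments of a
    distribution on [0, inf), via non-negativity of det P_k and det Q_k."""
    m = len(mu)
    mom = [1.0] + list(mu)                       # mom[j] = mu_j, with mu_0 = 1
    for k in range(1, m // 2 + 1):               # P_k uses moments up to mu_{2k}
        P = [[mom[i + j] for j in range(k + 1)] for i in range(k + 1)]
        if np.linalg.det(np.array(P)) < -tol:
            return False
    for k in range(1, (m - 1) // 2 + 1):         # Q_k uses moments up to mu_{2k+1}
        Q = [[mom[i + j + 1] for j in range(k + 1)] for i in range(k + 1)]
        if np.linalg.det(np.array(Q)) < -tol:
            return False
    return True
```

For instance, the exponential moments (j!) pass the check, while a pair of moments implying a negative variance fails it.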
As stated in Courtois and Denuit (2007), there exists a non-negative random variable (rv) with distribution function (df) F and first m moments (μ_1, …, μ_m) if and only if the following two conditions are satisfied:

- det P_k ≥ 0 for k = 1, …, ⌊m/2⌋;
- det Q_k ≥ 0 for k = 1, …, ⌊(m − 1)/2⌋;

where ⌊x⌋ stands for the integer part of x. In what follows, we silently assume the moment set is from a probability distribution on [0, ∞).

2.2. Mixed Erlang distribution
We now review some known properties of mixed Erlang distributions with common rate parameter. A more elaborate review of this class of distributions can be found in Willmot and Woo (2007), S. C. Lee and Lin (2010), and Willmot and Lin (2011).
Let W be a mixed Erlang rv with common rate parameter β > 0 and df

F_{W}(x)=\sum_{k \in A_{l}} \zeta_{k} H(x ; k, \beta),\tag{1}

where A_l = {1, 2, …, l} and {ζ_k}_{k∈A_l} is the probability mass function (pmf) of a discrete rv K with support A_l for a given l. The Erlang df is defined as

H(x ; k, \beta) \equiv 1-\bar{H}(x ; k, \beta)=1-e^{-\beta x} \sum_{i=0}^{k-1} \frac{(\beta x)^{i}}{i !}, \quad x \geq 0,\tag{2}

where the parameters k and β of the Erlang df are known as the shape and rate parameters, respectively. An alternative and useful representation of the mixed Erlang rv is W = C_1 + ⋯ + C_K, where the C_i's are iid exponential rv's with mean 1/β, independent of K; i.e., the rv W follows a compound distribution.

Remark 1. As in, e.g., Willmot and Woo (2007), we consider the class of mixed Erlang dfs (1) rather than the more general class of combinations of Erlangs, where some ζ_k's are possibly negative. For the latter class, additional constraints on the ζ_k's exist to ensure that the right-hand side of (1) is a non-decreasing function in x. This presents additional challenges in the subsequent moment-matching application, challenges which do not arise in the mixed Erlang case.
It is well known that the j-th moment of W is given by E[W^j] = Σ_{k∈A_l} ζ_k Π_{i=0}^{j−1}(k+i)/β^j. Of particular importance in actuarial science and quantitative risk management (see, e.g., McNeil, Frey, and Embrechts 2005 and references therein) are the VaR and TVaR risk measures. For the mixed Erlang rv W, there is in general no closed-form expression for VaR_κ(W), where κ ∈ (0, 1), but its value can be obtained using a routine numerical procedure. As for its TVaR, S. C. Lee and Lin (2010) showed that

\operatorname{TVaR}_{\kappa}(W) \equiv \frac{1}{1-\kappa} \int_{\kappa}^{1} \operatorname{VaR}_{u}(W) d u=\frac{1}{1-\kappa} \sum_{k=1}^{\infty} \zeta_{k} \frac{k}{\beta} \bar{H}\left(\operatorname{VaR}_{\kappa}(W) ; k+1, \beta\right).
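These expressions are easy to exercise numerically. Here is a small self-contained Python sketch (helper names are ours): the Erlang survival function, VaR by bisection on the df, the TVaR closed form above, and the stop-loss premium E[(W − b)_+] discussed next.

```python
import math

def erlang_sf(x, k, beta):
    """Survival function Hbar(x; k, beta) of an Erlang(k, beta) rv."""
    term, total = 1.0, 1.0
    for i in range(1, k):                 # accumulate (beta x)^i / i! iteratively
        term *= beta * x / i
        total += term
    return math.exp(-beta * x) * total

def mixed_erlang_cdf(x, zetas, beta):
    """df of a mixed Erlang rv; zetas maps shape k to weight zeta_k."""
    return sum(z * (1.0 - erlang_sf(x, k, beta)) for k, z in zetas.items())

def var_level(zetas, beta, kappa, hi=1e3, tol=1e-10):
    """VaR_kappa by bisection (no closed form in general)."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixed_erlang_cdf(mid, zetas, beta) < kappa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tvar(zetas, beta, kappa):
    """TVaR_kappa via the Lee-Lin closed form stated above."""
    v = var_level(zetas, beta, kappa)
    return sum(z * (k / beta) * erlang_sf(v, k + 1, beta)
               for k, z in zetas.items()) / (1.0 - kappa)

def stop_loss(zetas, beta, b):
    """Stop-loss premium E[(W - b)_+] of a mixed Erlang rv."""
    return sum(z * ((k / beta) * erlang_sf(b, k + 1, beta)
                    - b * erlang_sf(b, k, beta))
               for k, z in zetas.items())
```

For the special case of a single exponential (zetas = {1: 1.0}, beta = 1), these reduce to the textbook values VaR_κ = −ln(1 − κ) and TVaR_κ = VaR_κ + 1, which provides a quick sanity check.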
Another quantity of interest is the stop-loss premium defined as π_W(b) = E[(W − b)_+] with b ≥ 0. For the mixed Erlang df (1), we have

\pi_{W}(b)=\sum_{k=1}^{\infty} \zeta_{k}\left(\frac{k}{\beta} \bar{H}(b ; k+1, \beta)-b \bar{H}(b ; k, \beta)\right), \quad b \geq 0

(see also Willmot and Woo (2007, Eq. 3.6) for the higher-order stop-loss moments). Tijms (1994) showed that this class of distributions can approximate any continuous positive distribution with an arbitrary level of accuracy. For completeness, the theoretical foundation of this result is given next.
Theorem 2 (Tijms 1994, Theorem 3.9.1). Let F be the df of a positive rv. For any given h > 0, define

F_{h}(x)=\sum_{k=1}^{\infty}\left(F(kh)-F((k-1)h)\right) H\left(x ; k, \frac{1}{h}\right).\tag{4}

Then, lim_{h→0} F_h(x) = F(x) for any continuity point x of F.

Note that F_h in (4) is a mixed Erlang df of the form (1) with ζ_k = F(kh) − F((k−1)h) and rate parameter β = 1/h.
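Theorem 2 suggests a direct construction: discretize F on a span h and use the increments as mixing weights. A minimal Python sketch (function names are ours; an exponential target df is used purely for illustration):

```python
import math

def erlang_cdf(x, k, beta):
    """H(x; k, beta) = 1 - exp(-beta x) * sum_{i<k} (beta x)^i / i!."""
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= beta * x / i
        total += term
    return 1.0 - math.exp(-beta * x) * total

def tijms_approx(F, h, x, kmax=5000, eps=1e-12):
    """F_h(x) of Theorem 2: mixing weights F(kh) - F((k-1)h), rate 1/h."""
    total = 0.0
    for k in range(1, kmax + 1):
        total += (F(k * h) - F((k - 1) * h)) * erlang_cdf(x, k, 1.0 / h)
        if 1.0 - F(k * h) < eps:      # remaining weights are negligible
            break
    return total

# Illustration: the approximation error at x = 1 shrinks with the span h.
F = lambda t: 1.0 - math.exp(-t) if t > 0 else 0.0
err = lambda h: abs(tijms_approx(F, h, 1.0) - F(1.0))
```

Evaluating err(0.5) and err(0.1) shows the error decreasing with h, in line with the theorem.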
Several approximation methods motivated by Tijms' theorem were proposed over the years (see Section 1 for more details). In general, these moment-based approximations propose to work with a specific subclass of all finite and infinite mixed Erlang distributions. Among them, we recall the method of Johnson and Taaffe (1989), which will be used later for comparative purposes.
2.3. Method of Johnson and Taaffe (1989)
Johnson and Taaffe (1989) investigated the use of mixtures of Erlang distributions with common shape parameter for moment-matching purposes. More precisely, mixtures of n (or fewer) Erlangs with a common shape parameter are used to match the first 2n − 1 moments (whenever the set of moments is within the feasible set). For the three-moment matching problem, Johnson and Taaffe (1989) generalized the approximations of Whitt (1982) and Altiok (1985) by enlarging the set of feasible moments when the CV is greater than 1. Their method is also valid for some combinations of the first three moments when the CV is less than 1.
Their three-moment approximation is a mixture of two Erlangs with common shape parameter r (see Theorem 3 of Johnson and Taaffe 1989), i.e.,

F(x)=p H\left(x ; r, \beta_{1}\right)+(1-p) H\left(x ; r, \beta_{2}\right),\tag{5}

where β_1, β_2 and p are obtained from the solutions of Ax² + Bx + C = 0, with

\begin{aligned} A & =r(r+2) \mu_{1}\left(\mu_{2}-\frac{r+1}{r} \mu_{1}^{2}\right), \\ B & =-\left(r\left(\mu_{1} \mu_{3}-\frac{r+1}{r+2} \mu_{2}^{2}\right)+\frac{r(r+2)}{r+1}\left(\mu_{2}-\frac{r+1}{r} \mu_{1}^{2}\right)^{2}+(r+2) \mu_{1}^{2}\left(\mu_{2}-\frac{r+1}{r} \mu_{1}^{2}\right)\right), \\ C & =\mu_{1}\left(\mu_{1} \mu_{3}-\frac{r+1}{r+2} \mu_{2}^{2}\right). \end{aligned}
The choice of the shape parameter r is discussed in Johnson and Taaffe (1989, Proposition 4).

3. Moment-based approximation with mixed Erlang distribution
In this section, we propose to use a different subclass of mixed Erlang distributions to examine moment-based approximation techniques.
3.1. Description of the approach
For a given m, let ME(μ_m, A_l) be the set of all finite mixtures of Erlangs with df (1) and first m moments μ_m = (μ_1, …, μ_m). From Section 2.2, this consists in the identification of all solutions to the problem

\sum_{k=1}^{l} \zeta_{k} \frac{\prod_{i=0}^{j-1}(k+i)}{\beta^{j}}=\mu_{j}, \quad j=1, \ldots, m,\tag{6}

under the constraints that β > 0 and {ζ_k}_{k=1}^{l} is a probability measure on A_l.

Remark 3. For a rv W with df (1) and first m moments μ_m, we indifferently write W ∈ ME(μ_m, A_l) or F_W ∈ ME(μ_m, A_l). This will also apply to the other distributional classes.

Also, let ME^(m)(μ_m, A_l) be the (restricted) subset of ME(μ_m, A_l) with at most m non-zero mixing probabilities ζ_k. Given that ME^(m)(μ_m, A_l) has a finite number of solutions, we propose to use it as our approximation class. It is clear that ME^(m)(μ_m, A_l) ⊆ ME(μ_m, A_l) for any l.

Note that, for a continuous positive distribution with first m moments μ_m, we know from Theorem 2 that there exists a large enough l such that ME(μ_m, A_l) is not empty. Even though no formal conclusion can be reached for the restricted class ME^(m)(μ_m, A_l), all our numerical studies have shown that this set has a large number of distributions (see, for instance, the examples of Subsections 3.2.1 and 3.2.2) for a given m when l is chosen large enough.

Distributions in the ME^(m)(μ_m, A_l) class are identified as follows: for a given set {i_1, …, i_m} ⊆ A_l with i_1 < i_2 < ⋯ < i_m, (6) can be rewritten in matrix form as

\mathbf{G}_{m} \boldsymbol{\zeta}_{m}=\mathbf{M}_{\beta},

where ζ_m = (ζ_{i_1}, …, ζ_{i_m})^T, M_β = (μ_1 β, μ_2 β², …, μ_m β^m)^T, and

\mathbf{G}_{m}=\begin{pmatrix} i_{1} & i_{2} & \cdots & i_{m} \\ i_{1}(i_{1}+1) & i_{2}(i_{2}+1) & \cdots & i_{m}(i_{m}+1) \\ \vdots & \vdots & \ddots & \vdots \\ \prod_{i=0}^{m-1}(i_{1}+i) & \prod_{i=0}^{m-1}(i_{2}+i) & \cdots & \prod_{i=0}^{m-1}(i_{m}+i) \end{pmatrix}.
It follows that ζ_m = G_m^{-1} M_β, under the constraints that ζ_m ≥ 0 (componentwise) and 1^T ζ_m = 1, where 1 is a vector of 1's. Note that 1^T G_m^{-1} M_β − 1 is a polynomial of degree (at most) m in β. Thus, we only consider the real and positive solutions (in β) of 1^T G_m^{-1} M_β − 1 = 0 and complete their mixed Erlang representation with the identification of the mixing weights ζ_m = G_m^{-1} M_β. The procedure is systematically repeated for all possible sets of m distinct elements in A_l.

Remark 4. Given that the above procedure is repeated for each of the l!/(m!(l − m)!) sets of m distinct elements of A_l, the computational efficiency of the proposed methodology is mostly driven by this number, and hence the parameters m and l should be chosen accordingly. For a given number of moments m, we observe that: (a) larger values of l result in a more time-consuming numerical procedure; (b) however, l should be chosen large enough for the approximation class to have a reasonable number of members (to legitimately produce a "good" approximation). From our numerical studies, we observe that the selection of l (for a given m) can be problem-specific, and thus this tradeoff in the choice of l should be handled with care. However, as a rule of thumb, when m is relatively small (as is traditionally the case in moment-matching exercises), a value of l between 50 and 100 leads to reasonable mixed Erlang approximations. We refer the reader to the numerical illustrations and subsequent remarks in Section 3.2 for a more detailed discussion on this topic.

Among all distributions in ME^(m)(μ_m, A_l), we select the approximation via the Kolmogorov-Smirnov (KS) distance. The KS distance is commonly used in the context of continuous distributions (e.g., Denuit et al. 2005). Therefore, the chosen mixed Erlang approximation within ME^(m)(μ_m, A_l) is the one minimizing the KS distance with the true df F_S. We denote by W_{m,l} this approximation, i.e.,
d_{KS}\left(S, W_{m, l}\right)=\inf _{F_{W} \in \mathcal{ME}^{(m)}\left(\mu_{m}, A_{l}\right)} \sup _{x \geq 0}\left|F_{S}(x)-F_{W}(x)\right|,
where S is a rv with df F_S. This requires the calculation of the KS distance for each mixed Erlang distribution in ME^(m)(μ_m, A_l) to identify its minimizer W_{m,l}. In general, an explicit expression for this KS distance does not exist, and hence we propose to numerically find this value by evaluating the distance between the two dfs over all multiples (up to a given high value) of a small discretization span.

Note that other distances, such as the stop-loss distance (e.g., Gerber 1979), could have been used to select our approximation distribution. Alternatively, one could have relied on another criterion to identify this approximation distribution (for instance, select the distribution in ME^(m)(μ_m, A_l) with the closest subsequent moment to the true distribution).
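The identification step described above can be sketched in a few lines of Python (numpy assumed; function names are ours, and this is a simplified sketch rather than the paper's implementation). For each m-element support we solve the linear system for the weights and the degree-m polynomial constraint in β:

```python
import itertools
import numpy as np

def erlang_mixtures(mu, l):
    """Identify m-atom mixed Erlang candidates matching moments mu on {1,...,l}:
    for each support {i_1 < ... < i_m}, solve G_m zeta = M_beta together with
    sum(zeta) = 1, a polynomial of degree (at most) m in beta."""
    mu = np.asarray(mu, dtype=float)
    m = len(mu)
    sols = []
    for supp in itertools.combinations(range(1, l + 1), m):
        # G[j-1, c] = prod_{i=0}^{j-1} (i_c + i): rising factorials of the support
        G = np.array([[np.prod(np.arange(i, i + j)) for i in supp]
                      for j in range(1, m + 1)], dtype=float)
        Ginv = np.linalg.inv(G)
        # sum(zeta) = sum_j (column sums of G^{-1})_j * mu_j * beta^j = 1
        poly_asc = np.concatenate(([-1.0], Ginv.sum(axis=0) * mu))
        for root in np.roots(poly_asc[::-1]):     # np.roots wants descending coeffs
            if abs(root.imag) < 1e-7 and root.real > 0:
                beta = root.real
                zeta = Ginv @ (mu * beta ** np.arange(1, m + 1))
                if np.all(zeta > -1e-6):          # keep proper (non-negative) mixtures
                    sols.append((supp, beta, zeta))
    return sols
```

Each candidate in `sols` would then be ranked by its KS distance to the target df, evaluated on a fine grid, to select W_{m,l}.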
3.2. Numerical examples
We consider a few simple examples to illustrate the quality of the approximation. For comparative purposes, other approximation methods will also be discussed. Some concluding remarks on the mixed Erlang approximation method are later made based on the numerical experiment conducted next.
3.2.1. Lognormal distribution: Dufresne (2007, Example 5.4)
Let S = e^Z, where Z is a normal rv with mean 0 and variance 0.25. The first 5 moments of S are μ_j = e^{j²/8}, j = 1, …, 5. We consider the classes of mixed Erlang distributions ME^(m)(μ_m, A_70) for m = 3, 4, 5, which have a total of 13198, 89294 and 290422 distributions, respectively. The resulting mixed Erlang approximations are

\begin{aligned} F_{W_{3,70}}(x)= & 0.8209 H(x ; 6,6.3219) \\ & +0.1727 H(x ; 12,6.3219) \\ & +0.0064 H(x ; 26,6.3219), \\ F_{W_{4,70}}(x)= & 0.6350 H(x ; 7,8.3334) \\ & +0.2950 H(x ; 12,8.3334) \\ & +0.0672 H(x ; 20,8.3334) \\ & +0.0029 H(x ; 40,8.3334), \end{aligned}
and
\begin{aligned} F_{W_{5,70}}(x)= & 0.6273 H(x ; 7,8.3608) \\ & +0.3063 H(x ; 12,8.3608) \\ & +0.0609 H(x ; 20,8.3608) \\ & +0.0055 H(x ; 34,8.3608) \\ & +0.0001 H(x ; 69,8.3608), \end{aligned}
for m = 3, 4, 5, with respective KS distances decreasing to 0.0011 for the 5-moment approximation. Note that the quality of the mixed Erlang approximation (as measured by the KS distance) increases with the number of moments matched. For comparative purposes, the three-moment approximation (5) of Johnson and Taaffe (1989) is given by
with respective KS distances of and 0.0011 . Note that the quality of the mixed Erlang approximation (as measured by the KS distance) increases with the number of moments matched. For comparative purposes, the three-moment approximation (5) of\begin{aligned} F_{W_{J T}}(x)= & 0.0087 H(x ; 4,1.2804) \\ & +0.9913 H(x ; 4,3.5855), \end{aligned}
In Figure 1, we compare the density functions of S, its mixed Erlang approximations, and W_{JT}. All three mixed Erlang approximations provide an overall good fit to the exact distribution. To further examine the tail fit, specific values of VaR and TVaR for the exact and approximated distributions are provided in Tables 1 and 2, respectively.
We observe that the VaR and TVaR values of the mixed Erlang approximations compare very well to their lognormal counterparts, especially for the 5-moment approximation. This is particularly true given that the lognormal distribution is known to have a heavier tail than the mixed Erlang distribution.
Note that the improvement is indeed not monotone with the number of moments matched, as increasing this number does not necessarily lead to a higher quality approximation in moment-matching techniques.
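For reference, the lognormal moments used in this example follow from the standard identity E[S^j] = E[e^{jZ}] = e^{j²σ²/2} with σ² = 0.25; a quick check:

```python
import math

# Raw moments of S = e^Z with Z ~ Normal(0, 0.25): mu_j = exp(j^2 * 0.25 / 2)
mu = [math.exp(j ** 2 * 0.25 / 2.0) for j in range(1, 6)]
# mu_1 ~ 1.1331 up to mu_5 ~ 22.76: heavier-tailed than any single Erlang,
# but all moments finite
```

The rapid growth of these moments reflects the heavier lognormal tail that the mixed Erlang approximations must capture.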
3.2.2. Mixture of two gamma distributions: S. C. Lee and Lin (2010, Section 5, Example 1)
Let S be a mixture of two gamma distributions with density

\begin{aligned} f_{S}(s)= & 0.2 \frac{(3.2 s)^{2.6} e^{-3.2 s}}{s \Gamma(2.6)} \\ & +0.8 \frac{(1.2 s)^{6.3} e^{-1.2 s}}{s \Gamma(6.3)}, \quad s \geq 0, \end{aligned}

and first 6 moments, the last three being 1369.8272, 11754.2149 and 110674.4154. We consider the classes of mixed Erlang distributions ME^(m)(μ_m, A_70) for m = 3, …, 6, which are composed of 16000, 83797, 494532 and 1928919 distributions, respectively. The resulting mixed Erlang approximations are

\begin{aligned} F_{W_{3,70}}(x)= & 0.2140 H(x ; 2,2.0835) \\ & +0.5215 H(x ; 9,2.0835) \\ & +0.2645 H(x ; 15,2.0835), \\ F_{W_{4,70}}(x)= & 0.2266 H(x ; 2,2.0469) \\ & +0.4440 H(x ; 9,2.0469) \\ & +0.2963 H(x ; 13,2.0469) \\ & +0.0330 H(x ; 19,2.0469), \\ F_{W_{5,70}}(x)= & 0.2023 H(x ; 3,3.7271) \\ & +0.2091 H(x ; 12,3.7271) \\ & +0.3936 H(x ; 19,3.7271) \\ & +0.1805 H(x ; 28,3.7271) \\ & +0.0145 H(x ; 42,3.7271), \end{aligned}
and
\begin{aligned} F_{W_{6,70}}(x)&= 0.0768 H(x ; 2,3.0731) \\ &\quad +0.1325 H(x ; 3,3.0731) \\ &\quad +0.2739 H(x ; 11,3.0731) \\ &\quad +0.3928 H(x ; 17,3.0731) \\ &\quad +0.1188 H(x ; 25,3.0731) \\ &\quad +0.0052 H(x ; 37,3.0731), \end{aligned}
with respective KS distances decreasing in the number of matched moments. S. C. Lee and Lin (2010) used the EM algorithm to fit a mixed Erlang distribution to the same distribution, which resulted in the following model:
\begin{aligned} F_{W_{EM}}(x)= & 0.2282 H(x ; 2,1.9603) \\ & +0.5430 H(x ; 9,1.9603) \\ & +0.2288 H(x ; 14,1.9603), \end{aligned}
Note that the KS distance for the 3-moment approximation is greater than the KS distance obtained using the EM estimation. We recall that the EM algorithm finds maximum likelihood estimates using the approximated distribution as an input, while our method is based on only partial information on the approximated distribution (e.g., its first moments). We observe that the fit improves and the KS distance decreases when more moments are included in the approximation. For illustrative purposes, we also provide in Tables 3 and 4 some values of VaR and TVaR for S, W_{m,70} and W_{EM}.

3.2.3. Gompertz distribution
The Gompertz distribution, with df F(x) = 1 − exp(−(B/ln c)(c^x − 1)), x ≥ 0, has been extensively applied in various life contingency contexts (e.g., Bowers et al. 1997). Lenart (2014) provides an expression for the j-th moment, namely

\mu_{j}=\frac{j !}{(\ln c)^{j}}\, e^{\frac{B}{\ln c}}\, E_{1}^{j-1}\left(\frac{B}{\ln c}\right),\tag{7}

where E_{1}^{j-1} is the generalized integro-exponential function (see Milgram 1985). Here, we consider an example of Melnikov and Romaniuk (2006) on the 1959-1999 USA mortality data of the human mortality database, where the parameters B and c were estimated. Using (7), the first five moments are found, the last four being 6037.202, 489676.3, 40524308 and 3410245408. We consider the class of mixed Erlang distributions ME^(m)(μ_m, A_90). The resulting mixed Erlang approximations W_{3,90} and W_{4,90} are

\begin{aligned} F_{W_{3,90}}(x)= & 0.0154 H(x ; 22,1.0928) \\ & +0.2210 H(x ; 65,1.0928) \\ & +0.7637 H(x ; 90,1.0928) \end{aligned}
and
\begin{aligned} F_{W_{4,90}}(x)= & 0.0347 H(x ; 35,1.0972) \\ & +0.0834 H(x ; 66,1.0972) \\ & +0.1009 H(x ; 67,1.0972) \\ & +0.7810 H(x ; 90,1.0972), \end{aligned}
In Figure 2, we compare the fit of the 3- and 4-moment approximations to the exact distribution by plotting their densities (left) and dfs (right). Overall, we observe that the fit is quite reasonable.

Note that the KS distance increases from the 3-moment to the 4-moment approximation. We notice that both W_{3,90} and W_{4,90} use the Erlang-90 df, where 90 is the largest element of A_90. As such, a mixed Erlang approximation with a smaller KS distance can likely be found in both cases by choosing a larger support (i.e., a larger l).

3.2.4. Some remarks
To provide insight on the quality of the moment-based mixed Erlang approximation proposed in this section, we briefly revisit the results of the above three examples. In the first two (lognormal and mixture of two gammas), the mixed Erlang approximation is easy to implement and provides a very satisfactory fit to the true distribution. As none of the resulting mixed Erlang approximations uses the Erlang-70 df (as l = 70 in the first two examples), it is unlikely that a better approximation (from the viewpoint of KS distance) can be found by increasing the value of l. Given that the best KS-fit is found from a mixture of Erlang distributions with relatively small shape parameters (which correspond to the parameter k in the Erlang df (2)), the proposed method seems particularly well suited for these two cases.

As for any approximation method, limitations can also be found, as evidenced by the Gompertz example. Indeed, for this example, both the 3-moment and 4-moment mixed Erlang approximations use the Erlang-90 df (recall l = 90 for this example). As mentioned earlier, one can likely reduce the KS distance (if so desired) of the resulting mixed Erlang approximation by increasing the value of l (in light of the comments in Remark 4). This implies that the KS-optimal mixed Erlang approximation would likely involve Erlang distributions with large shape parameters (which have smaller variances for a given rate parameter β).

In general, distributions with negative skewness and sharp density peak(s) may require the use of Erlang distributions with large shape parameters to provide a good approximation. Computational time of the proposed mixed Erlang methodology may become a non-negligible issue in these cases (especially as the number of moments matched increases). However, a slight adjustment to the proposed methodology may be considered to address this time-consuming issue. Indeed, one may replace the set A_l = {1, 2, …, l} in (1) by a set of the form {c, 2c, …, lc} for positive integers c and l (note that the two sets coincide when c = 1). To illustrate this, we have reconsidered the Gompertz example by replacing the set A_90 by the set {5, 10, …, 200}. The resulting mixed Erlang approximations, denoted by the rv W_{m,5:200} when m moments are matched (m = 3, 4, 5), are given by

\begin{aligned} F_{W_{3,5:200}}(x)= & 0.0447 H(x ; 45,1.3185) \\ & +0.2573 H(x ; 85,1.3185) \\ & +0.6980 H(x ; 110,1.3185), \end{aligned}
\begin{aligned} F_{W_{4,5: 200}}(x)= & 0.0168 H(x ; 40,1.6422) \\ & +0.0971 H(x ; 85,1.6422) \\ & +0.3044 H(x ; 115,1.6422) \\ & +0.5818 H(x ; 140,1.6422), \end{aligned}
and
\begin{aligned} F_{W_{5,5: 200}}(x)= & 0.0038 H(x ; 20,1.9095) \\ & +0.0393 H(x ; 75,1.9095) \\ & +0.1262 H(x ; 110,1.9095) \\ & +0.3275 H(x ; 140,1.9095) \\ & +0.5032 H(x ; 165,1.9095), \end{aligned}
Note that the KS distance of the 4-moment approximation is considerably lower than that of W_{4,90}. In Figure 3, we compare the fit of the 5-moment approximation to the Gompertz distribution by plotting their densities (left) and dfs (right). We can see that the fit is quite acceptable and of a better quality than the two approximations displayed in Figure 2.

4. Moment-based mixed Erlang approximation with known β
4.1. Basic definitions
We consider here a slightly different context than the one of Section 3. Instead of approximating a general df with known moments, we assume that the df is known to be of mixed Erlang form (1) with given rate parameter β and first m moments μ_m. However, the mixing weights are assumed unknown or difficult to obtain. Various applications in risk theory and credit risk fall into this context (e.g., Cossette, Gaillardetz, and Marceau 2002; Lindskog and McNeil 2003, and McNeil, Frey, and Embrechts 2005; see also the application of Section 4.4).

For a given rate parameter β, let ME(μ_m, β) be the set of all mixed Erlang distributions with df (1) (as l → ∞) and first m moments μ_m. Also, define ME(μ_m, A_l, β) to be the subset of ME(μ_m, β) with df (1) for a given l, and let ME^ext(μ_m, A_l, β) be a further subset of ME(μ_m, A_l, β) such that at most m + 1 of the mixing weights are non-zero. Note that a distribution in ME(μ_m, A_l, β) can be expressed as a convex combination of distributions in ME^ext(μ_m, A_l, β) (see, e.g., De Vylder 1996).

For a given function φ, we consider two approaches to derive bounds and approximations for E[φ(W)] when W ∈ ME(μ_m, A_l, β) (in cases when the expectation exists). The first approach is based on discrete s-convex extremal distributions, while the second is based on moment bounds on discrete expected stop-loss transforms.

Remark 5. Naturally, the set ME(μ_m, A_l, β) tends to ME(μ_m, β) as l → ∞. As such, bounds for risk measures on ME(μ_m, β) can be approximated by their counterparts in ME(μ_m, A_l, β) for l reasonably large.

For a mixed Erlang rv W with df (1), its j-th moment is known to satisfy (6). Using the identity

\prod_{i=0}^{j-1}(k+i)=\sum_{n=1}^{j} s(j, n) k^{n},

where the s(j, n)'s are the (unsigned) Stirling numbers of the first kind (e.g., Abramowitz and Stegun 1972), (6) becomes
's are the (signed) Stirling numbers of the first kind (e.g.,\beta^{j} \mu_{j}=\sum_{n=1}^{j} s(j, n) \kappa_{n},\tag{8}
for j = 1, …, m, where κ_n = Σ_{k∈A_l} ζ_k k^n (n = 1, …, m) are the power moments of K. In matrix form, we have M_β = s κ_m^T, where s = (s(j, n))_{j,n=1}^{m} (with s(j, n) = 0 for n > j), M_β = (μ_1 β, …, μ_m β^m)^T and κ_m = (κ_1, …, κ_m). Isolating κ_m yields

\boldsymbol{\kappa}_{m}=\left(\mathbf{s}^{-1} \mathbf{M}_{\beta}\right)^{T}.\tag{9}
From Comtet (1974, 213), we know that the entries of s^{-1} can be expressed in terms of the Stirling numbers of the second kind (e.g., Abramowitz and Stegun 1972).
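Relation (8) is easy to exercise numerically: build the lower-triangular matrix of rising-factorial coefficients and solve for the power moments κ_n of K. A sketch (numpy assumed; helper names are ours):

```python
import numpy as np

def rising_coeffs(j):
    """Descending coefficients of prod_{i=0}^{j-1}(k + i) as a polynomial in k."""
    p = np.array([1.0])
    for i in range(j):
        p = np.convolve(p, [1.0, float(i)])   # multiply by (k + i)
    return p                                  # powers k^j, ..., k^0

def power_moments_of_K(mu, beta):
    """Solve (8): beta^j mu_j = sum_{n<=j} s(j, n) kappa_n for kappa_n = E[K^n]."""
    m = len(mu)
    S = np.zeros((m, m))
    for j in range(1, m + 1):
        p = rising_coeffs(j)
        for n in range(1, j + 1):
            S[j - 1, n - 1] = p[j - n]        # coefficient of k^n
    rhs = np.array([beta ** j * mu[j - 1] for j in range(1, m + 1)])
    return np.linalg.solve(S, rhs)
```

For W exponential(1) (so K ≡ 1), all κ_n equal 1; for W Erlang(2, 1) (K ≡ 2), κ_n = 2^n; both are reproduced by the sketch.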
Thus, for a given β, the class ME(μ_m, A_l, β) can be found through the identification of all discrete distributions with support A_l and first m power moments κ_m given by the right-hand side of (9). Equivalently, the class ME^ext(μ_m, A_l, β) can be identified by restricting the discrete distributions on A_l to have at most m + 1 non-zero mass points. This argument is formalized in the next section.

4.2. Discrete s-convex extremal distributions
Let D(κ_m, A_l) be the set of all discrete distributions with support A_l and first m power moments κ_m, and let D^ext(κ_m, A_l) be its subset of distributions with at most m + 1 non-zero mass points. A discrete rv K in D(κ_m, A_l) (and D^ext(κ_m, A_l)) with κ_m as defined in (9) corresponds to a mixed Erlang distribution in ME(μ_m, A_l, β) (and ME^ext(μ_m, A_l, β)). This is a one-to-one correspondence (e.g., De Vylder 1996, part 2).
Remark 6. Because of this one-to-one correspondence between D(κ_m, A_l) and ME(μ_m, A_l, β), conditions under which ME(μ_m, A_l, β) is not empty can be found from its discrete counterpart. We refer the reader to, e.g., De Vylder (1996), Marceau (1996), or Courtois and Denuit (2009).

This allows us to make use of the theory developed in Prékopa (1990), Denuit and Lefèvre (1997), Denuit, Lefèvre, and Mesfioui (1999), and Courtois, Denuit, and Van Bellegem (2006) to derive bounds/approximations for E[φ(W)] when W ∈ ME(μ_m, A_l, β). First, we briefly recall the definitions of s-convex function and s-convex order introduced by Denuit, Lefèvre, and Shaked (1998).

Definition 7 (Denuit, Lefèvre, and Shaked 1998, 2000). Let S be a subinterval of ℝ or a subset of ℕ, and let φ be a function on S. For a positive integer s and distinct x_0, x_1, …, x_k ∈ S, we recursively define the divided differences as
\begin{aligned} {\left[x_{0}, x_{1}, \ldots, x_{k}\right] \phi } & =\frac{\left[x_{1}, x_{2}, \ldots, x_{k}\right] \phi-\left[x_{0}, x_{1}, \ldots, x_{k-1}\right] \phi}{x_{k}-x_{0}} \\ & =\sum_{i=0}^{k} \frac{\phi\left(x_{i}\right)}{\prod_{j=0, j \neq i}^{k}\left(x_{i}-x_{j}\right)}, \quad k=1,2, \ldots, s, \end{aligned}
where [x_i]φ = φ(x_i) for i = 0, 1, …, k. The function φ is s-convex on S if [x_0, x_1, …, x_s]φ ≥ 0 for all distinct x_0 < x_1 < ⋯ < x_s in S.

We mention that the definition of s-convex function, which refers to higher-order convexity, should not be confused with the one for Schur-convex function, also known as S-convex function.

Definition 8 (Denuit, Lefèvre, and Shaked 2000). For two rv's X and Y defined on S, X is smaller than Y in the s-convex sense, namely X ≼_{s-cx} Y, if E[φ(X)] ≤ E[φ(Y)] for all s-convex functions φ (provided the expectations exist).

We mention that the 1-convex order corresponds to the usual stochastic dominance order, and the 2-convex order is the usual convex order (see Müller and Stoyan (2002), Denuit et al. (2005), and Shaked and Shanthikumar (2007) for a review of stochastic orders). Also, as stated in Theorem 1.6.3 of Müller and Stoyan (2002), the s-convex order can only be used to compare rv's with the same first s − 1 moments (which explains why s is chosen to be m + 1 in what follows). Examples of s-convex functions are φ(x) = x^{s+j} for j = 0, 1, … and φ(x) = e^{cx} for c > 0. For a general treatment of the s-convex order, see, e.g., Denuit, Lefèvre, and Mesfioui (1999), Denuit, Lefèvre, and Shaked (2000), and Section 1.6 of Müller and Stoyan (2002).
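Divided differences (and hence numerical spot checks of s-convexity on a finite grid) are straightforward to implement; a minimal sketch (function names are ours):

```python
import itertools

def divided_difference(xs, phi):
    """[x_0, ..., x_k]phi via the recursive definition above."""
    if len(xs) == 1:
        return phi(xs[0])
    return (divided_difference(xs[1:], phi)
            - divided_difference(xs[:-1], phi)) / (xs[-1] - xs[0])

def is_s_convex_on(phi, points, s):
    """Spot-check s-convexity of phi on a grid: order-s divided differences
    over all increasing (s+1)-tuples must be non-negative."""
    return all(divided_difference(list(t), phi) >= -1e-12
               for t in itertools.combinations(sorted(points), s + 1))
```

For instance, x ↦ x³ passes the 3-convexity check (its order-3 divided differences all equal the leading coefficient 1), while x ↦ −x² fails the 2-convexity check.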
Let K_{(m+1)-min} and K_{(m+1)-max} be the (m+1)-extremum rv's on D(κ_m, A_l), i.e., those which satisfy

\begin{gathered} E\left[\phi\left(K_{(m+1)-\min }\right)\right] \leq E[\phi(K)] \leq E\left[\phi\left(K_{(m+1)-\max }\right)\right], \end{gathered}

for any (m+1)-convex function φ and any K ∈ D(κ_m, A_l). The general distribution forms of K_{(m+1)-min} and K_{(m+1)-max} are given in Prékopa (1990) (see also Courtois, Denuit, and Van Bellegem (2006, Section 4)) and are repeated here:

\scriptsize{ \begin{array}{|c|c|c|} \hline & m+1 \text { even } & m+1 \text { odd } \\ \hline \text { support of } K_{(m+1)-\min } & \left\{j_1, j_1+1, \ldots, j_{\frac{m+1}{2}}, j_{\frac{m+1}{2}}+1\right\} & \left\{1, j_1, j_1+1, \ldots, j_{\frac{m}{2}}, j_{\frac{m}{2}}+1\right\} \\ \hline \text { support of } K_{(m+1)-\max } & \left\{1, j_1, j_1+1, \ldots, j_{\frac{m-1}{2}}, j_{\frac{m-1}{2}}+1, l\right\} & \left\{j_1, j_1+1, \ldots, j_{\frac{m}{2}}, j_{\frac{m}{2}}+1, l\right\} \\ \hline \end{array} \tag{10}}
where the j_i's are integers in A_l. From (10), it is clear that the support of each extremum rv has at most m + 1 elements.

Let W_K = C_1 + ⋯ + C_K be a mixed Erlang rv, i.e., the C_i's are a sequence of iid exponential rv's with mean 1/β, independent of K. The following result of Denuit, Lefèvre, and Utev (1999, Property 5.7) relates to the stability of the s-convex order under compounding.

Lemma 9. If K_1 ≼_{s-cx} K_2, then W_{K_1} ≼_{s-cx} W_{K_2}.

We apply Lemma 9 to define the mixed Erlang rv's W_{K_{(m+1)-min}} and W_{K_{(m+1)-max}}, which are the extremum rv's on ME(μ_m, A_l, β). It is immediate that, for W ∈ ME(μ_m, A_l, β),

W_{K_{(m+1)-\min }} \preceq_{(m+1)-cx}^{\mathbb{R}_{+}} W \preceq_{(m+1)-cx}^{\mathbb{R}_{+}} W_{K_{(m+1)-\max }}. \tag{11}
For instance, using (11), the (m+1)-convex functions φ(x) = x^{m+1+j} (j = 0, 1, …) and φ(x) = e^{cx} (c > 0) yield
-convex functions and yieldE\left[W_{K_{(m+1)-\min }}^{m+1+j}\right] \leq E\left[W^{m+1+j}\right] \leq E\left[W_{K_{(m+1)-\max }}^{m+1+j}\right]
and
\begin{gathered} E\left[\exp \left(c W_{K_{(m+1)-\min }}\right)\right] \leq E[\exp (c W)] \\ \leq E\left[\exp \left(c W_{K_{(m+1)-\max }}\right)\right], \end{gathered}
respectively.
4.3. Moment bounds on discrete expected stop-loss transforms
Extrema on the (m+1)-convex order yield bounds for E[φ(W)] when φ is (m+1)-convex. However, this approach is not appropriate to derive bounds on TVaR and the stop-loss premium when the number of known moments is greater than 2. It is well known that two rv's with the same mean and variance cannot be compared under the convex order.

Consequently, we use an approach inspired from Courtois and Denuit (2009) (see also Hürlimann 2002) to derive bounds on TVaR and the stop-loss premium. We consider K ∈ D(κ_m, A_l) and determine lower and upper bounds for E[(K − k)_+] for all k ∈ A_l. From the lower bound, we define the corresponding rv K_{m-down} on A_l via the df
F_{K_{m\text{-down}}}(k)=\left\{\begin{array}{ll}1-\left(\inf _{K \in \mathcal{D}\left(\kappa_{m}, A_{l}\right)} E\left[(K-k)_{+}\right]-\inf _{K \in \mathcal{D}\left(\kappa_{m}, A_{l}\right)} E\left[(K-k-1)_{+}\right]\right), & k=1,2, \ldots, l-1, \\ 1, & k=l,\end{array}\right.\tag{12}
for k = 1, 2, …, l − 1. Similarly, K_{m-up} is defined as in (12) by replacing 'inf' by 'sup' in the definition. Given that, for K ∈ D(κ_m, A_l),

E\left[\left(K_{m\text{-down}}-k\right)_{+}\right] \leq E\left[(K-k)_{+}\right] \leq E\left[\left(K_{m\text{-up}}-k\right)_{+}\right],

for all k ∈ A_l, it implies that K is larger (smaller) than K_{m-down} (K_{m-up}) under the increasing convex order (see, e.g., Courtois and Denuit 2009). Note that K_{m-down} and K_{m-up} do not necessarily belong to D(κ_m, A_l), but both have first moment κ_1. The increasing convex order is stable under compounding, and thus

W_{K_{m\text{-down}}} \preceq_{icx} W_{K} \preceq_{icx} W_{K_{m\text{-up}}}.\tag{13}
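Note that the df construction in (12) is simply the recovery of a df from first differences of a stop-loss transform, since E[(K − k)_+] − E[(K − k − 1)_+] = P(K > k) for an integer-valued rv K. A minimal sketch of this step (function names are ours):

```python
def df_from_stop_loss(pi, l):
    """Recover a df on {1, ..., l} from pi(k) = E[(K - k)_+], as in (12):
    F(k) = 1 - (pi(k) - pi(k + 1)), with F(l) = 1."""
    F = {k: 1.0 - (pi(k) - pi(k + 1)) for k in range(1, l)}
    F[l] = 1.0
    return F

def stop_loss_of(pmf):
    """Stop-loss transform of a discrete pmf {support point: probability}."""
    return lambda k: sum(max(j - k, 0) * p for j, p in pmf.items())

# Round-trip check on a small pmf: the recovery is exact.
pmf = {1: 0.3, 2: 0.5, 3: 0.2}
F = df_from_stop_loss(stop_loss_of(pmf), 3)
```

In (12), the same first differences are applied to the pointwise infimum (or supremum) of the stop-loss values over D(κ_m, A_l) rather than to a single stop-loss transform.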
Then, from Denuit et al. (2005, Proposition 3.4.8), it follows that
\begin{gathered} \operatorname{TVaR}_{\kappa}\left(W_{K_{m\text{-down}}}\right) \leq \operatorname{TVaR}_{\kappa}\left(W_{K}\right) \leq \operatorname{TVaR}_{\kappa}\left(W_{K_{m\text{-up}}}\right), \end{gathered}\tag{14}
for κ ∈ (0, 1). Clearly, the rv's W_{K_{m-down}} and W_{K_{m-up}} will most likely not belong to ME(μ_m, A_l, β).

Remark 10. Note that, when neither of the two aforementioned approaches is applicable, we propose to derive approximate bounds for E[φ(W)] with W ∈ ME(μ_m, A_l, β) by calculating E[φ(W)] for all W ∈ ME^ext(μ_m, A_l, β) and choosing

E[\phi(W)]_{\min (\max )}=\inf _{W \in \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right)}\left(\sup _{W \in \mathcal{ME}^{ext}\left(\mu_{m}, A_{l}, \beta\right)}\right) E[\phi(W)].
Obviously, the true value of E[φ(W)] does not necessarily lie between E[φ(W)]_min and E[φ(W)]_max. However, the 'interval' estimate may give an idea of the variability of all solutions on ME(μ_m, A_l, β).

4.4. Example: Portfolio of dependent risks
We consider a portfolio of n dependent risks as described in the common mixture model of Cossette, Gaillardetz, and Marceau (2002). Let S = Σ_{i=1}^{n} I_i X_i be the aggregate claim amount. Conditional on a common mixture rv Θ with pmf {a_j}_{j≥1}, the I_i's are assumed to form a sequence of independent Bernoulli rv's with P(I_i = 1 | Θ = j) = 1 − (r_i)^j. As for the X_i's, they are assumed to form a sequence of iid exponential rv's of mean 1, independent of Θ and the I_i's.

In this context, it is clear that S is a two-point mixture of a degenerate rv at 0 and a mixed Erlang rv Y of the form (1) with β = 1, i.e., its Laplace transform is given by

\begin{aligned} E\left[e^{-t S}\right] & \equiv 1-p+p E\left[e^{-t Y}\right] \\ & =\sum_{j=1}^{\infty} a_{j}\left\{\prod_{i=1}^{n}\left(\left(r_{i}\right)^{j}+\left(1-\left(r_{i}\right)^{j}\right) \frac{1}{1+t}\right)\right\}, \quad t \geq 0, \end{aligned}
where p = P(S > 0) = 1 − Σ_{j≥1} a_j Π_{i=1}^{n} (r_i)^j. We perform the moment-based approximation on the rv Y rather than S.

For illustrative purposes, we assume a logarithmic pmf for Θ. Also, the constants r_i are set such that the (unconditional) mean of each I_i matches a prescribed value. Under the above assumptions, the first five moments of Y can be computed.

Using the approach on discrete s-convex extremal distributions, the dfs of the extremum rv's W_{K_{(m+1)-min}} and W_{K_{(m+1)-max}} for m = 4, 5 are:

\begin{aligned} F_{W_{K_{5-\min }}}(x)= & 0.5365 H(x ; 1,1) \\ & +0.2127 H(x ; 2,1) \\ & +0.2232 H(x ; 3,1) \\ & +0.0245 H(x ; 6,1) \\ & +0.0030 H(x ; 7,1), \\ F_{W_{K_{5-\max }}}(x)= & 0.4860 H(x ; 1,1) \\ & +0.3981 H(x ; 2,1) \\ & +0.0621 H(x ; 4,1) \\ & +0.0537 H(x ; 5,1) \\ & +0.0000 H(x ; 20,1), \\ F_{W_{K_{6-\min }}}(x)= & 0.5126 H(x ; 1,1) \\ & +0.3179 H(x ; 2,1) \\ & +0.0531 H(x ; 3,1) \\ & +0.1082 H(x ; 4,1) \\ & +0.0059 H(x ; 7,1) \\ & +0.0023 H(x ; 8,1), \end{aligned}
and
\begin{aligned} F_{W_{K_{6-\max }}}(x)= & 0.5322 H(x ; 1,1) \\ & +0.2264 H(x ; 2,1) \\ & +0.2110 H(x ; 3,1) \\ & +0.0004 H(x ; 5,1) \\ & +0.0299 H(x ; 6,1) \\ & +0.0000 H(x ; 20,1) . \end{aligned}
Let W_K be a rv with df (1) for the corresponding mixing pmf. It follows from Section 4.2 that lower (upper) bounds for the higher-order moments E[W_K^{m+1+j}] and the exponential premium principle, defined as (1/c) ln E[e^{cW_K}] (c > 0), can be found from their counterparts for W_{K_{(m+1)-min}} (W_{K_{(m+1)-max}}). A few numerical values are provided in Tables 5 and 6. As expected, the bounds get sharper as the number of moments involved increases.
As for the second approach, based on moment bounds with discrete expected stop-loss transforms, Table 7 presents the values of TVaR for W_{K_{m-down}} and W_{K_{m-up}} with dfs as in (12), for m = 4, 5.
As expected, the inequality (14) is verified. Also, we observe that the interval estimate of TVaR shrinks as the number of moments matched increases.

Finally, given that neither method is applicable for the VaR risk measure, we make use of the technique discussed in Remark 10. We find the numerical values provided in Table 8.
For this example, we observe that the exact values of VaR are within the minimal and maximal values of their corresponding risk measures among all members of ME^ext(μ_m, A_l, β), for m = 4, 5. Also, the spread between the minimal and maximal values of VaR is reduced when we go from 4 to 5 moments. An identical exercise for the TVaR risk measure resulted in the same conclusions.