# 1. Introduction

In collective risk theory, the total amount of losses an insurance company incurs during a time period is modeled by a compound random variable

$$S=\sum_{i=1}^{M}X_i, \tag{1.1}$$

where $M$ is a discrete random variable representing the number of claims and $X_1, X_2, \dots$ are nonnegative independent and identically distributed (i.i.d.) claim size random variables independent of $M$. The tail probability,

$$\theta=P[S>c]$$

for some specified value $c$, and the tail mean (mean excess loss), defined by

$$\tau=E[(S-c)_+],$$

where

$$(S-c)_+=\begin{cases}0, & S\le c,\\ S-c, & S>c,\end{cases}$$

are important risk measures of $S$ because they are closely related to insurance/reinsurance pricing and capital requirements.

Evaluation of the tail probability and the tail mean is not easy, even when the distributions of $M$ and $X$ are known. One approach is to resort to recursive formulas, such as those proposed by Panjer (1981) for the case when the distribution of $M$ belongs to the $(a,b,0)$ class. There is extensive literature on further developments related to Panjer's recursive formula. For details, see the comprehensive book by Sundt and Vernic (2009).

Transform-based techniques, such as the fast Fourier transform (FFT), are also widely used in calculating the distribution of aggregated claims. For an introduction, see the papers by Robertson (1992) and Wang (1998) and the references therein. Embrechts and Frei (2008) provided an excellent comparison of the recursive and FFT methods.

Simulation methods are flexible and can be handy in estimating the tail probability/moments of compound distributions. However, they are subject to sampling errors. That is, different runs of the same simulation method will give different results.

For example, the one-sample crude estimator for $\theta$ is

$$\hat{\theta}_0=I(S>c),$$

where $I(\cdot)$ is an indicator function that takes the value of one if the argument is true and zero otherwise. $\hat{\theta}_0$ is an unbiased estimator of $\theta$ because

$$E[\hat{\theta}_0]=\theta.$$

The variance of $\hat{\theta}_0$ is

$$\operatorname{Var}[\hat{\theta}_0]=\theta(1-\theta),$$

and the coefficient of variation (CoV) is

$$\operatorname{CoV}(\hat{\theta}_0)=\frac{\sqrt{\theta(1-\theta)}}{\theta}=\sqrt{\frac{1-\theta}{\theta}}.$$

Quite often, we are interested in a tail probability of $S$, so that $\theta$ is close to zero. Then the coefficient of variation is huge, which makes the crude method inefficient. Notice that when the simulation is conducted $n$ times, the estimator for $\theta$ is

$$\hat{\theta}_{0,n}=\frac{1}{n}\sum_{j=1}^{n}I\left(S^{(j)}>c\right), \tag{1.2}$$

which has variance $\theta(1-\theta)/n$ and CoV

$$\operatorname{CoV}(\hat{\theta}_{0,n})=\frac{\sqrt{\theta(1-\theta)/n}}{\theta}=\frac{1}{\sqrt{n}}\cdot\sqrt{\frac{1-\theta}{\theta}}\simeq\frac{1}{\sqrt{n\hat{\theta}_{0,n}}}=\left(\sum_{j=1}^{n}I\left(S^{(j)}>c\right)\right)^{-1/2}. \tag{1.3}$$

When $n$ is large, the distribution of $\hat{\theta}_{0,n}$ is approximately normal by the central limit theorem, based on which the confidence interval of $\theta$ is given by

$$\left(\hat{\theta}_{0,n}-z_{1-\alpha/2}\sqrt{\operatorname{Var}[\hat{\theta}_{0,n}]},\ \hat{\theta}_{0,n}+z_{1-\alpha/2}\sqrt{\operatorname{Var}[\hat{\theta}_{0,n}]}\right).$$

In other words,

$$P\left(\frac{|\hat{\theta}_{0,n}-\theta|}{\theta}\le z_{1-\alpha/2}\operatorname{CoV}(\hat{\theta}_{0,n})\right)=1-\alpha.$$

Taking $\alpha$ to be a small value, this means that with probability $1-\alpha$, the relative error of $\hat{\theta}_{0,n}$ is less than $z_{1-\alpha/2}\operatorname{CoV}(\hat{\theta}_{0,n})$. Therefore, $\operatorname{CoV}(\hat{\theta}_{0,n})$ can be a measure for relative error.

In fact, for any estimator $\hat{\psi}$ of some parameter $\psi$, similar reasoning applies. As a result, $\operatorname{CoV}(\hat{\psi})$ is a measure of relative error for $\hat{\psi}$. Therefore, we say that an estimator (simulation method) with a smaller CoV is more efficient.

For our case, the relative error of $\hat{\theta}_{0,n}$ is given in equation (1.3). When $\theta$ is close to zero, $I(S>c)$ rarely equals one, so $\sum_{j=1}^{n}I(S^{(j)}>c)$ is small, resulting in a large CoV; for a moderate sample size, the relative error of the crude estimator is therefore not very satisfactory. Recalling that $\operatorname{CoV}(\hat{\theta}_{0,n})$ is proportional to $1/\sqrt{n}$, we see that in order to decrease the relative error (CoV) of an estimator tenfold, the sample size has to increase a hundredfold.

**Remark 1.1.** *This discussion may remind actuaries of the situation in limited fluctuation credibility theory, in which the sample size of loss data is determined such that the sample mean will provide a credible (low relative error) estimate of the population mean. The idea there is similar and the same square rule applies. In limited fluctuation credibility, to double the credibility factor, the sample size has to be quadrupled.*
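To make the discussion concrete, the following sketch implements the crude estimator (1.2) and the empirical CoV from (1.3). The model is a hypothetical compound Poisson–exponential sum ($\lambda=5$, exponential claims with rate $1$); these assumptions and all parameter values are illustrative only and are not the paper's numerical example.

```python
import math
import random

def poisson_sample(rng, lam):
    # Knuth's multiplicative method; adequate for the small means used here.
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def simulate_S(rng, lam, beta):
    # One draw of S = X_1 + ... + X_M, M ~ Poisson(lam), X_i ~ Exponential(beta).
    m = poisson_sample(rng, lam)
    return sum(rng.expovariate(beta) for _ in range(m))

def crude_tail_estimate(n, c, lam=5.0, beta=1.0, seed=1):
    rng = random.Random(seed)
    hits = sum(simulate_S(rng, lam, beta) > c for _ in range(n))
    theta_hat = hits / n
    # Empirical CoV from (1.3): (sum of indicators)^(-1/2).
    cov_hat = hits ** (-0.5) if hits > 0 else float("inf")
    return theta_hat, cov_hat
```

For a far tail threshold, most runs produce zero hits, so the estimated CoV blows up, illustrating why the crude method is inefficient for small $\theta$.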

In this paper, we study simulation methods (estimators) of the tail probability $\theta$ of compound random variables. As shown, increasing the sample size of the crude estimator is one way to reduce relative error. However, because of the square rule, it is costly, inefficient, and sometimes not feasible. Therefore, instead of increasing the sample size, we propose several more efficient estimators of $\theta$ that, given the same sample size, are still unbiased yet have a lower CoV (relative error).

Reducing the CoV requires reducing the variance of the estimator. In the simulation literature, various variance reduction methods exist. Commonly used techniques include importance sampling, stratified sampling, the conditioning method, and the control variates method, which we review briefly in the next section. For detailed introductions to variance reduction methods, see, for example, the books by Ross (2012) and Asmussen and Glynn (2007).

Needless to say, simulation methods for evaluating the tail probability and tail mean of compound variables are very important for actuaries. However, quite surprisingly, to our knowledge, the literature in this area is very thin. In addition to some examples from Ross (2012), one of the most relevant references is Peköz and Ross (2004), in which the conditioning method was introduced specifically for compound variables. Blanchet and Li (2011) introduced an efficient importance sampling algorithm for estimating the tail distribution of heavy-tailed compound sums. Glasserman, Heidelberger, and Shahabuddin (2000) proposed a method for simulating tail probability (value at risk) of the returns of investment portfolios by combining the importance sampling and stratified sampling methods.

In this paper, we first briefly review some commonly used general variance reduction methods for simulation, then we propose several novel combinations of variance reduction methods specifically for compound distributions. This includes the combination of importance sampling and the conditioning method and the combination of importance sampling and stratified sampling.

Secondly, we extend our methods to simulate the tail probability and tail mean of bivariate compound variables, defined by

$$(S_1,S_2)=\left(\sum_{i=1}^{M}X_i,\ \sum_{j=1}^{N}Y_j\right), \tag{1.4}$$

where $(M,N)$ is a vector of (dependent) random variables representing the numbers of claims in two lines of business. The claim size random variables $X_i$ and $Y_j$ are mutually independent and are independent of the claim numbers $M$ and $N$.

The remaining parts of the paper are organized as follows. Section 2 reviews commonly used variance reduction methods. Section 3 applies them in estimation of the tail probability. Section 4 studies the simulation methods for the tail mean. Sections 5 and 6 extend the results to bivariate compound variables. Section 7 concludes.

# 2. Review of variance reduction methods

We start by briefly reviewing several variance reduction methods that will be used in this paper. For more comprehensive introductions to the topic, see books such as those by Ross (2012) and Asmussen and Glynn (2007).

## 2.1. Importance sampling

Let $\mathbf{Z}=(Z_1,\dots,Z_n)$ denote a vector of random variables having a joint density function $f$, and suppose that we want to estimate

$$\theta=E[g(\mathbf{Z})]=\int g(\mathbf{z})f(\mathbf{z})\,d\mathbf{z},$$

where the integral is $n$-dimensional and over the support of $f$.

Importance sampling finds a probability density function (PDF) $f^*$ such that $f^*(\mathbf{z})>0$ whenever $g(\mathbf{z})f(\mathbf{z})\neq 0$. Since

$$\theta=\int g(\mathbf{z})\frac{f(\mathbf{z})}{f^*(\mathbf{z})}f^*(\mathbf{z})\,d\mathbf{z}=E\left[g(\mathbf{Z}^*)\frac{f(\mathbf{Z}^*)}{f^*(\mathbf{Z}^*)}\right],$$

where $\mathbf{Z}^*$ has density $f^*$,

$$\hat{\theta}_I=g(\mathbf{Z}^*)\frac{f(\mathbf{Z}^*)}{f^*(\mathbf{Z}^*)}$$

is an unbiased estimator for $\theta$.

The importance sampling approach aims to choose an appropriate $f^*$ so that $\hat{\theta}_I$ has a smaller variance compared with the crude estimator. Since

$$\operatorname{Var}\left[g(\mathbf{Z}^*)\frac{f(\mathbf{Z}^*)}{f^*(\mathbf{Z}^*)}\right]=E\left[\left(g(\mathbf{Z}^*)\frac{f(\mathbf{Z}^*)}{f^*(\mathbf{Z}^*)}-\theta\right)^2\right]=\int\frac{\left(g(\mathbf{z})f(\mathbf{z})-\theta f^*(\mathbf{z})\right)^2}{f^*(\mathbf{z})}\,d\mathbf{z},$$

in order to achieve a smaller variance, $f^*$ should be chosen such that the numerator in the integrand is close to zero. That is, $f^*(\mathbf{z})$ should be approximately proportional to $g(\mathbf{z})f(\mathbf{z})$.

If $\mathbf{Z}$ has a finite moment generating function (MGF)

$$M_{\mathbf{Z}}(\mathbf{t})=E[e^{\mathbf{Z}\cdot\mathbf{t}}]=E[e^{Z_1t_1+\cdots+Z_nt_n}]$$

for some $\mathbf{t}\neq\mathbf{0}$, then it is usually handy to choose $f^*$ to be the Esscher transform (exponential tilting) of $f$. That is, we let $\mathbf{Z}^*$ have PDF

$$f^*(\mathbf{z})=\frac{e^{\mathbf{h}\cdot\mathbf{z}}f(\mathbf{z})}{M_{\mathbf{Z}}(\mathbf{h})}$$

for some tilting parameter $\mathbf{h}$. In addition, the MGF of $\mathbf{Z}^*$ is given by

$$M_{\mathbf{Z}^*}(\mathbf{t})=E[e^{\mathbf{t}\cdot\mathbf{Z}^*}]=\frac{E[e^{(\mathbf{t}+\mathbf{h})\cdot\mathbf{Z}}]}{E[e^{\mathbf{h}\cdot\mathbf{Z}}]}=\frac{M_{\mathbf{Z}}(\mathbf{t}+\mathbf{h})}{M_{\mathbf{Z}}(\mathbf{h})}.$$

Notice that if $\mathbf{Z}$ is a discrete random variable, we interpret $f$ and $f^*$ as probability mass functions (PMFs).

For presentation simplicity, in the sequel we assume that the goal is to estimate the mean of some random variable $Z$,

$$\theta=E[Z],$$

instead of $E[g(\mathbf{Z})]$ for some function $g$. Doing so does not lead to loss of generality because $Z$ can have any form, such as $Z=g(Z_1,\dots,Z_n)$.

## 2.2. The conditioning method

Suppose that $W$ is a random variable that is correlated with $Z$, that can be simulated efficiently, and for which $E[Z\mid W]$ can be evaluated. Then

$$\hat{\theta}_{CD}=E[Z\mid W]$$

is a more efficient estimator of $\theta$ than the crude estimator because it is unbiased; that is,

$$E[\hat{\theta}_{CD}]=E[E[Z\mid W]]=E[Z],$$

and it has a smaller variance, as shown in the following line:

$$\operatorname{Var}[Z]=\operatorname{Var}[E[Z\mid W]]+E[\operatorname{Var}[Z\mid W]]\ge\operatorname{Var}[E[Z\mid W]].$$
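As a toy illustration (not from the paper), the sketch below conditions on $W=X$ when estimating $P[X+Y>c]$ for independent rate-one exponentials, a case where $E[Z\mid W]$ is available in closed form; it returns the sample mean and variance of both the crude and the conditional estimators.

```python
import math
import random

def conditioning_demo(n=10000, c=4.0, seed=1):
    # Target: theta = P[X + Y > c] with X, Y independent Exponential(1).
    # Crude estimator: Z = I(X + Y > c).  Conditioning on W = X gives
    # E[Z | W] = P[Y > c - X] = exp(-max(c - X, 0)), a computable function,
    # so the randomness in Y is integrated out exactly.
    rng = random.Random(seed)
    crude, cond = [], []
    for _ in range(n):
        x, y = rng.expovariate(1.0), rng.expovariate(1.0)
        crude.append(1.0 if x + y > c else 0.0)
        cond.append(math.exp(-max(c - x, 0.0)))

    def mean_var(v):
        m = sum(v) / len(v)
        return m, sum((t - m) ** 2 for t in v) / len(v)

    return mean_var(crude), mean_var(cond)
```

In line with the variance decomposition above, the conditional estimator's sample variance comes out markedly below the crude one.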

## 2.3. Stratified sampling

Stratified sampling resembles the conditioning method in the sense that a random variable $W$ can help simulate the mean of a random variable $Z$. Suppose that $W$ takes values in strata $\mathcal{W}_1,\dots,\mathcal{W}_k$ with probabilities $p_i=P[W\in\mathcal{W}_i]$ for $i=1,\dots,k$. Then

$$E[Z]=\sum_{i=1}^{k}E[Z\mid W\in\mathcal{W}_i]p_i.$$

Suppose that $n$ is the total number of samples and $n_i$ is the number of samples from stratum $i$. Then the stratified estimate of $\theta$ is

$$\hat{\theta}_{S,n}=\sum_{i=1}^{k}\bar{Z}_ip_i,$$

where $\bar{Z}_i$, for $i=1,\dots,k$, is the sample average of the $Z$s conditional on $W\in\mathcal{W}_i$. Let $\sigma_i^2=\operatorname{Var}[Z\mid W\in\mathcal{W}_i]$; then the variance of $\hat{\theta}_{S,n}$ is

$$\operatorname{Var}[\hat{\theta}_{S,n}]=\sum_{i=1}^{k}\frac{p_i^2\sigma_i^2}{n_i}.$$

If we choose $n_i=np_i$, then $\hat{\theta}_{S,n}$ must have a smaller variance than the crude simulation estimator because

$$\operatorname{Var}[\hat{\theta}_{S,n}]=\frac{1}{n}\sum_{i=1}^{k}p_i\sigma_i^2=\frac{1}{n}E[\operatorname{Var}[Z\mid W]]\le\frac{1}{n}\operatorname{Var}[Z]=\operatorname{Var}[\hat{\theta}_{0,n}].$$

It is worthwhile to note that $n_i=np_i$ is not necessarily the optimal number of simulations in stratum $i$. Particularly, if $n_i$ is chosen to be proportional to $p_i\sigma_i$ (Fishman 1995), then $\operatorname{Var}[\hat{\theta}_{S,n}]$ is minimized.

## 2.4. The control variates method

When using a control variable to reduce the variance of the estimator of $\theta=E[Z]$, we select a control variate $W$ that is strongly positively or negatively correlated with $Z$. Then an unbiased estimator for $\theta$ is

$$\hat{\theta}_{CV}=Z-\gamma(W-E[W])$$

for some constant $\gamma$. Then we have

$$\operatorname{Var}[\hat{\theta}_{CV}]=\operatorname{Var}[Z]+\gamma^2\operatorname{Var}[W]-2\gamma\operatorname{Cov}(Z,W),$$

which is minimized if $\gamma=\operatorname{Cov}(Z,W)/\operatorname{Var}[W]$. Note that the parameter $\gamma$ can be estimated using the simulated values of $Z$ and $W$. In fact, it is the least squares estimate of the slope of the simple linear regression

$$Z=\gamma_0+\gamma_1W+\epsilon.$$

Obviously, there are many other general-purpose simulation variance reduction methods in the literature. We have only introduced the four that we use for simulating the compound random variables in this paper. In the following sections, we illustrate how these methods can be used in combination to reduce the simulation variance even further.

# 3. Simulating tail probability

In this section, we apply the four variance reduction methods to the estimation of the tail probability $\theta=P[S>c]$, where the compound random variable $S$ is defined in (1.1). The one-sample crude estimator is simply $\hat{\theta}_0=I(S>c)$; the $n$-sample estimator is $\hat{\theta}_{0,n}$ defined in (1.2).

The problem with the crude method is that when $c$ is large, the sample size needs to be very large in order for $S>c$ to occur. Therefore, as shown in equation (1.3), the crude estimator is not efficient.

## 3.1. Importance sampling

We assume that $S$ has a finite MGF $M_S(t)=E[e^{tS}]$ for $t$ in an interval containing the tilting parameter.^{[1]}

In importance sampling, instead of sampling $S$, we sample $S^*$, where $S^*$ has PDF $f^*$. Since we assume that $S$ has a finite MGF, we may choose $f^*$ to be the Esscher transform (exponentially tilted transform) of the PDF $f$ of $S$. That is, we let $S^*$ have PDF

$$f^*(x)=\frac{e^{hx}f(x)}{M_S(h)}$$

for some tilting parameter $h$. Then we have

$$\theta=E[I(S>c)]=E\left[I(S^*>c)\frac{M_S(h)}{e^{hS^*}}\right]. \tag{3.1}$$

Hence, the importance sampling estimator of $\theta$ is given by

$$\hat{\theta}_I=I(S^*>c)M_S(h)\exp(-hS^*).$$

If the sample size is $n$, then we have

$$\hat{\theta}_{I,n}=\frac{M_S(h)}{n}\sum_{j=1}^{n}I\left(S^{*(j)}>c\right)\exp\left(-hS^{*(j)}\right),$$

where $S^{*(j)}$ is the $j$th simulated value of $S^*$.

In order to apply the importance sampling method with exponential tilting, we need to (1) select an appropriate value for the tilting parameter $h$ and (2) simulate $S^*$ with the tilted distribution.

As mentioned in Section 1, the problem with the crude simulation method is that when $\theta$ is small, $I(S>c)$ is zero for most rounds of simulation, which results in a large value of the CoV of $\hat{\theta}_{0,n}$. In importance sampling, we choose $h$ such that $S^*>c$ is more likely to occur than $S>c$. An intuitively natural choice is the value of $h$ for which $E[S^*]=c$. This choice is indeed optimal, as the following note shows.

**Note 3.1.** *As pointed out by Ross (2012),*

$$\hat{\theta}_I=I(S^*>c)M_S(h)e^{-hS^*}\le M_S(h)e^{-hc}.$$

*Thus, one can stabilize the results of every round of simulation by choosing the value of $h$ to minimize the upper bound $M_S(h)e^{-hc}$. Setting the derivative of it to zero, we get*

$$M_S'(h)-cM_S(h)=0.$$

*Thus, the optimal $h$ should satisfy*

$$c=\frac{M_S'(h)}{M_S(h)}=E\left[\frac{Se^{hS}}{M_S(h)}\right]=E[S^*].$$
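For a concrete case, the equation $E[S^*]=c$ can be solved numerically. The sketch below assumes, hypothetically, a compound Poisson–exponential model, for which $M_S'(h)/M_S(h)=\lambda\beta/(\beta-h)^2$ is increasing in $h$, and uses bisection; it requires $c>E[S]=\lambda/\beta$.

```python
def tilt_parameter(c, lam=5.0, beta=1.0, tol=1e-10):
    # Compound Poisson(lam) with Exponential(beta) claims (hypothetical):
    # M_S(t) = exp(lam * (M_X(t) - 1)) with M_X(t) = beta / (beta - t), so
    # E[S*] = M_S'(h) / M_S(h) = lam * beta / (beta - h)^2, increasing in h.
    # Solve E[S*] = c for h in (0, beta) by bisection.
    mean_tilted = lambda h: lam * beta / (beta - h) ** 2
    lo, hi = 0.0, beta - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_tilted(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For these parameters the root is also available in closed form, $h=\beta-\sqrt{\lambda\beta/c}$, which the bisection reproduces; for models without a closed form, the same bisection (or Newton's method) applies.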

We next study the distribution of $S^*$ (in order to simulate it). The result is stated in the following note.

**Note 3.2.** *The Esscher transform of $S$ with parameter $h$ is*

$$S^*=\sum_{i=1}^{M^*}X_i^*,$$

*where $M^*$ is the Esscher transform of $M$ with parameter $\ln M_X(h)$ and $X_i^*$ is the Esscher transform of $X_i$ with parameter $h$.*

*Proof.* Firstly, the MGF of $S^*$ is

$$M_{S^*}(t)=\frac{M_S(t+h)}{M_S(h)}=\frac{P_M(M_X(t+h))}{P_M(M_X(h))}=\frac{P_M(M_X(h)M_{X^*}(t))}{P_M(M_X(h))}, \tag{3.2}$$

where $P_M$ represents the probability generating function (PGF) of $M$ and $X^*$ is the Esscher transform of $X$ with parameter $h$, so that $M_{X^*}(t)=M_X(t+h)/M_X(h)$.

Let $c(h)=\ln M_X(h)$, and let $M^*$ be the Esscher transform of $M$ with parameter $c(h)$. Then the PGF of $M^*$ is

$$P_{M^*}(z)=\frac{E[e^{c(h)M}z^M]}{E[e^{c(h)M}]}=\frac{P_M(e^{c(h)}z)}{P_M(e^{c(h)})}=\frac{P_M(M_X(h)z)}{P_M(M_X(h))}. \tag{3.3}$$

Comparing (3.2) and (3.3) leads to

$$M_{S^*}(t)=P_{M^*}(M_{X^*}(t)),$$

which completes the proof. $\square$

Note 3.2 suggests that $S^*$ is a compound sum of $M^*$ and $X^*$. Therefore, it can be easily simulated if the distributions of the latter two are known. This is actually the case for many commonly used distributions; the following are Esscher transforms (with parameter $h$) of some commonly used distributions.

- If $M$ is binomial with the PMF $p_M(k)=\binom{n}{k}p^k(1-p)^{n-k}$, $k=0,1,2,\dots,n$, then $M^*$ is binomial with the PMF $p_{M^*}(k)=\binom{n}{k}(p^*)^k(1-p^*)^{n-k}$, where $p^*=\dfrac{pe^h}{1-p+pe^h}$.
- If $M$ is negative binomial with the PMF $p_M(k)=\binom{k+r-1}{k}(1-p)^rp^k$, $k=0,1,2,\dots$, then $M^*$ is negative binomial with the PMF $p_{M^*}(k)=\binom{k+r-1}{k}(1-pe^h)^r(pe^h)^k$, provided $pe^h<1$.
- If $M$ is Poisson with the PMF $p_M(k)=\dfrac{\lambda^k}{k!}e^{-\lambda}$, $k=0,1,2,\dots$, then $M^*$ is Poisson with the PMF $p_{M^*}(k)=\dfrac{(\lambda e^h)^k}{k!}e^{-\lambda e^h}$.
- If $X$ is gamma with the PDF $f_X(x)=\beta^\alpha x^{\alpha-1}e^{-\beta x}/\Gamma(\alpha)$, $x>0$, then $X^*$ is gamma with the PDF $f_{X^*}(x)=(\beta-h)^\alpha x^{\alpha-1}e^{-(\beta-h)x}/\Gamma(\alpha)$, provided $h<\beta$.

**Example 3.1.** *Assume that $M$ is Poisson with mean $\lambda$ and the $X_i$ have a common gamma distribution with parameters $\alpha$ and $\beta$. Then*

$$S^*=\sum_{i=1}^{M^*}X_i^*,$$

*where $M^*$ is Poisson with mean $\lambda M_X(h)$ and the $X_i^*$ are gamma with parameters $\alpha$ and $\beta-h$.*
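The following sketch implements the resulting importance sampling estimator $\hat{\theta}_I$ for a hypothetical special case with $\alpha=1$ (exponential claim sizes), where the $h$ solving $E[S^*]=c$ has the closed form $\beta-\sqrt{\lambda\beta/c}$; the parameter values are illustrative only.

```python
import math
import random

def poisson_sample(rng, lam):
    # Knuth's multiplicative method; adequate for the means used here.
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def importance_tail_estimate(n, c, lam=5.0, beta=1.0, seed=1):
    # M ~ Poisson(lam), X ~ Exponential(beta).  With tilting parameter h,
    # M* ~ Poisson(lam * M_X(h)) and X* ~ Exponential(beta - h), and each
    # run contributes I(S* > c) * M_S(h) * exp(-h * S*).
    h = beta - math.sqrt(lam * beta / c)    # solves E[S*] = c; needs c > lam/beta
    mx_h = beta / (beta - h)                # M_X(h)
    ms_h = math.exp(lam * (mx_h - 1.0))     # M_S(h) for a compound Poisson sum
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        m_star = poisson_sample(rng, lam * mx_h)
        s_star = sum(rng.expovariate(beta - h) for _ in range(m_star))
        if s_star > c:
            total += ms_h * math.exp(-h * s_star)
    return total / n
```

Because each nonzero contribution is bounded by $M_S(h)e^{-hc}$, the estimator's CoV stays small even deep in the tail, in contrast with the crude estimator.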

## 3.2. Combining importance sampling and stratified sampling

A method for applying stratified sampling in simulating quantities related to compound distributions was introduced by Ross (2012, Section 9.5.3). This is done by treating the claim number $M$ as a stratifying variable. Specifically, in order to simulate

$$\theta=E[g_M(X_1,\dots,X_M)],$$

we choose a number $m$ such that $P[M>m]$ is small and make use of the fact that

$$\theta=\sum_{n=0}^{m}E[g_n(X_1,\dots,X_n)]p_n+E[g_M(X_1,\dots,X_M)\mid M>m]\left(1-\sum_{n=0}^{m}p_n\right),$$

where $p_n=P[M=n]$.

This method can be applied to estimate the tail probability $\theta=P[S>c]$ if we let, for $m=0,1,2,\dots$,

$$g_m(X_1,\dots,X_m)=I(X_1+\cdots+X_m>c).$$

However, directly applying the method is not efficient because when $c$ is large and $\theta$ is small, the function $g_m$ is likely to be zero on most strata, and more so on strata with small values of $m$.

Therefore, we propose here a method to combine importance sampling and stratified sampling. For this purpose, we let $M^*$ be the Esscher transform of $M$ with parameter $\ln M_X(h)$ and $X_i^*$ the Esscher transform of $X_i$ with parameter $h$, as in Note 3.2. Define

$$g_m^*(x_1,\dots,x_m)=M_S(h)I\left(\sum_{i=1}^{m}x_i>c\right)\exp\left(-h\sum_{i=1}^{m}x_i\right) \tag{3.4}$$

and

$$p_m^*=P[M^*=m],\quad m=1,2,\dots.$$

Then we have, for a given value of $m_l$ such that $P[M^*>m_l]$ is small,

$$\begin{aligned}
\theta&=E[I(S>c)]=E\left[I(S^*>c)\frac{M_S(h)}{e^{hS^*}}\right]=E\left[g^*_{M^*}(X^*_1,\dots,X^*_{M^*})\right]\\
&=\sum_{m=0}^{m_l}E\left[g^*_{M^*}(X^*_1,\dots,X^*_{M^*})\mid M^*=m\right]p^*_m+E\left[g^*_{M^*}(X^*_1,\dots,X^*_{M^*})\mid M^*>m_l\right]P[M^*>m_l]\\
&=\sum_{m=0}^{m_l}E\left[g^*_m(X^*_1,\dots,X^*_m)\right]p^*_m+E\left[g^*_{M^*}(X^*_1,\dots,X^*_{M^*})\mid M^*>m_l\right]\left(1-\sum_{m=0}^{m_l}p^*_m\right).
\end{aligned}$$

With this, we can follow the procedure from Ross (2012) to simulate $\theta$. Firstly, we generate a value of $M^*$ conditional on it exceeding $m_l$; suppose the generated value is $m'$. Then we generate $m'$ independent random variables $X^*_1,\dots,X^*_{m'}$. The estimator of $\theta$ from this run is

$$E=\sum_{m=1}^{m_l}g^*_m(X^*_1,\dots,X^*_m)p^*_m+g^*_{m'}(X^*_1,\dots,X^*_{m'})\left(1-\sum_{m=0}^{m_l}p^*_m\right). \tag{3.5}$$

As pointed out by Ross (2012), since it is easy to compute the functions $g^*_m$, we can use the data in reverse order to obtain a second estimator and then average the two estimators. That is, we let

$$E'=\sum_{m=1}^{m_l}g^*_m(X^*_{m'},\dots,X^*_{m'-m+1})p^*_m+g^*_{m'}(X^*_{m'},\dots,X^*_1)\left(1-\sum_{m=0}^{m_l}p^*_m\right),$$

then use

$$\hat{\theta}_{I+S}=\frac{1}{2}(E+E')$$

as the estimator for $\theta$.

**Remark 3.1.** *By equation (3.4), the second term in equation (3.5) satisfies*

$$g^*_{m'}(X^*_1,\dots,X^*_{m'})\left(1-\sum_{m=0}^{m_l}p^*_m\right)\le M_S(h)P[M^*>m_l].$$

*Thus, if we select a value for $m_l$ sufficiently large that $P[M^*>m_l]$ is negligible, this term can be omitted.*
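One possible implementation of the combined I+S estimator is sketched below, again for a hypothetical compound Poisson–exponential model; following Remark 3.1, $m_l$ is taken large enough that the second term of the estimator can be dropped.

```python
import math
import random

def strat_is_tail_estimate(n, c, lam=5.0, beta=1.0, m_l=60, seed=1):
    # Hypothetical model: M ~ Poisson(lam), X ~ Exponential(beta), with the
    # same exponential tilting as in Section 3.1.  With m_l large,
    # P[M* > m_l] is negligible, so only the stratified sum over
    # m = 1..m_l is kept; one path of X*'s is reused across all strata.
    h = beta - math.sqrt(lam * beta / c)
    mx_h = beta / (beta - h)
    ms_h = math.exp(lam * (mx_h - 1.0))
    lam_star = lam * mx_h                    # M* ~ Poisson(lam_star)
    p_star = [math.exp(-lam_star) * lam_star ** m / math.factorial(m)
              for m in range(m_l + 1)]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s, run = 0.0, 0.0
        for m in range(1, m_l + 1):
            s += rng.expovariate(beta - h)   # partial sum X*_1 + ... + X*_m
            if s > c:                        # otherwise g*_m is zero
                run += ms_h * math.exp(-h * s) * p_star[m]
        total += run
    return total / n
```

Averaging the strata over one shared path is what makes the per-run cost modest; the reverse-order averaging of $E$ and $E'$ described above would further reduce the variance.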

## 3.3. The conditioning method

Peköz and Ross (2004) introduced a very effective way of simulating the tail probability of compound distributions based on the conditioning method. Let

$$T(c)=\min\left(m:\sum_{i=1}^{m}X_i>c\right). \tag{3.6}$$

Then, we have

$$S>c\iff M\ge T(c). \tag{3.7}$$

Equation (3.7) agrees with proposition 1.1 of Lin (1996).

Therefore,

$$E[I(S>c)]=E[E[I(S>c)\mid T(c)]]=E[E[I(M\ge T(c))\mid T(c)]].$$

Thus, an estimator for $\theta$ applying the conditioning method is given by

$$\hat{\theta}_{CD}=E[I(M\ge T(c))\mid T(c)]=P[M\ge T(c)\mid T(c)]=P[M\ge T(c)],$$

where the last equality holds because $M$ and the $X_i$ (and hence $T(c)$) are independent.

To implement the method, we generate samples of $X_i$ in sequence until the sum of the generated values exceeds $c$. If the generated value of $T(c)$ is $t$, then $P[M\ge t]$ is the estimate of $\theta$ for this run of simulation.

## 3.4. Combining the conditioning method and the control variates method

As shown by Peköz and Ross (2004), the estimator $\hat{\theta}_{CD}$ can be further improved by using a control variate. The idea is to select a control variate that is strongly positively or negatively correlated with $\hat{\theta}_{CD}$. Peköz and Ross (2004) suggested that one such choice is

$$W=\sum_{i=1}^{T(c)}(X_i-E[X]),$$

which has a mean of zero.

$W$ is positively correlated with $\hat{\theta}_{CD}$ because when $T(c)$ is large, (1) $\hat{\theta}_{CD}=P[M\ge T(c)]$ is small, and (2) the $X_i$s are likely to be small, so that $W$ will be small.

With this choice, when combining the conditioning method with the control variates method, the estimator for $\theta$ is

$$\hat{\theta}_{CD+CV}=\hat{\theta}_{CD}-\gamma W,$$

where $\gamma=\operatorname{Cov}(\hat{\theta}_{CD},W)/\operatorname{Var}[W]$, which may be estimated using the simulated values of $\hat{\theta}_{CD}$ and $W$.

## 3.5. Combining importance sampling and the conditioning method

In the conditioning method, we need to simulate $T(c)$, defined in equation (3.6), which may be time consuming if $c$ is large relative to the $X_i$s. In this section, we introduce a method to improve this. The key is to replace $T(c)$ with

$$T^*(c)=\min\left(m:\sum_{i=1}^{m}X_i^*>c\right),$$

where $X_i^*$ is the Esscher transform of $X_i$ with tilting parameter $h$. This way, we combine the conditioning and importance sampling methods. The following estimator of $\theta$ is obtained:

$$\hat{\theta}_{I+CD}=P[M\ge T^*(c)]\,\frac{M_X(h)^{T^*(c)}}{e^{h\sum_{i=1}^{T^*(c)}X_i^*}}. \tag{3.8}$$

The unbiasedness of this estimator is shown in Section A.1 of the Appendix.

To carry out the simulation, we generate samples of $X_i^*$ in sequence until the sum of the generated values exceeds $c$. The values of $T^*(c)$ and $\sum_{i=1}^{T^*(c)}X_i^*$ are stored and used in (3.8).

**Remark 3.2.** *A control variate*

$$W^*=\sum_{i=1}^{T^*(c)}(X_i^*-E[X^*])$$

*can be used to improve the estimator $\hat{\theta}_{I+CD}$. This results in*

$$\hat{\theta}_{I+CD+CV}=\hat{\theta}_{I+CD}-\gamma W^*,$$

*where $\gamma=\operatorname{Cov}(\hat{\theta}_{I+CD},W^*)/\operatorname{Var}[W^*]$.*
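A sketch of the combined I+CD estimator of Section 3.5, for a hypothetical compound Poisson–exponential model (exponential claim sizes keep the tilted claim distribution exponential as well):

```python
import math
import random

def is_cond_tail_estimate(n, c, lam=5.0, beta=1.0, seed=1):
    # Hypothetical model: M ~ Poisson(lam), X ~ Exponential(beta), so
    # X* ~ Exponential(beta - h).  Each run draws X*_i until the sum
    # exceeds c, then contributes
    #   P[M >= T*] * M_X(h)^(T*) * exp(-h * sum of the X*_i),
    # as in (3.8).
    h = beta - math.sqrt(lam * beta / c)   # E[S*] = c, as in Section 3.1
    mx_h = beta / (beta - h)               # M_X(h)
    rng = random.Random(seed)

    def pois_tail(t):
        # P[M >= t] for M ~ Poisson(lam)
        return 1.0 - sum(math.exp(-lam) * lam ** j / math.factorial(j)
                         for j in range(t))

    total = 0.0
    for _ in range(n):
        s, t = 0.0, 0
        while s <= c:
            s += rng.expovariate(beta - h)
            t += 1
        total += pois_tail(t) * mx_h ** t * math.exp(-h * s)
    return total / n
```

Because the tilted claims are larger on average, each run reaches the threshold $c$ after far fewer draws than the plain conditioning method would need.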

## 3.6. Numerical experiments

In this section, we compare the different estimators of $\theta$ introduced in Section 3 through a numerical example with specified distributions for $M$ and the $X_i$. Note that the distributional assumptions are entirely hypothetical and only for illustration purposes. Naturally, in actual applications, the aggregate loss model and the parameters need to be estimated from data.

For each method, the tail probabilities at several values of $c$ are estimated using 1000 simulated samples (one round). The standard deviation (SD) of each estimator is calculated based on 100 rounds of simulations. We will follow this convention in all the numerical examples in the rest of the paper. In addition, for estimator 3, in which stratified sampling is applied, we set the stratification bound $m_l$ accordingly.

The simulation results are summarized in Table 3.1. A visual comparison of the CoV of different estimators is given in Figure 3.1.

To provide a reference point for comparing the estimators, we calculate the analytical result of the tail probability by

$$P(S>c)=\sum_{m=1}^{\infty}P(M=m)P(X_1+\cdots+X_m>c).$$

This formula is calculable for this simple example because the distribution of $X_1+\cdots+X_m$ is available in closed form, so the probabilities $P(X_1+\cdots+X_m>c)$ can be computed easily for any $m$. For the infinite sum, we can truncate it at some value $m_u$ such that $P[M>m_u]$ is small enough; in the example, $m_u$ is selected so that the truncation error is negligible relative to the reported decimal points.

In addition, for comparison, we provide the normal approximation of the tail probability, which is calculated by

$$P(S>c)\simeq 1-\Phi\left(\frac{c-E[S]}{\sqrt{\operatorname{Var}(S)}}\right),$$

where $\Phi$ stands for the standard normal cumulative distribution function. We can see that the normal approximation is not very accurate even for this simple example, especially when $c$ is large.

As explained previously, estimators with smaller CoVs have smaller relative errors and, therefore, are better. Table 3.1 indicates that the combinations of importance sampling and stratified sampling (I+S); the conditioning method and the control variates method (CD+CV); importance sampling and the conditioning method (I+CD); and importance sampling, the conditioning method, and the control variates method (I+CD+CV) performed well. Methods involving importance sampling tend to have a small CoV when $c$ is large. Therefore, they are recommended for simulating small tail probabilities.
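For a compound Poisson model with exponential claim sizes (a hypothetical stand-in for the example's model), both the truncated analytical sum and the normal approximation can be computed as follows; the closed-form Erlang survival function supplies $P(X_1+\cdots+X_m>c)$.

```python
import math

def compound_poisson_exp_tail(c, lam=5.0, beta=1.0, m_max=150):
    # P(S > c) = sum_m P(M = m) P(X_1 + ... + X_m > c).  For Exponential(beta)
    # claims, X_1 + ... + X_m ~ Erlang(m, beta), whose survival function is
    # exp(-beta*c) * sum_{j=0}^{m-1} (beta*c)^j / j!.  Both the Poisson PMF
    # and the Erlang terms are updated iteratively to avoid huge factorials.
    bc = beta * c
    erlang_term = math.exp(-bc)   # j = 0 term of the Erlang survival sum
    erlang_tail = 0.0
    pm = math.exp(-lam)           # P[M = 0]
    theta = 0.0
    for m in range(1, m_max + 1):
        erlang_tail += erlang_term        # now equals P(Erlang(m, beta) > c)
        erlang_term *= bc / m
        pm *= lam / m                     # now equals P[M = m]
        theta += pm * erlang_tail
    return theta

def normal_approx_tail(c, lam=5.0, beta=1.0):
    # E[S] = lam/beta and Var(S) = lam * E[X^2] = 2 * lam / beta^2.
    mean, sd = lam / beta, math.sqrt(2.0 * lam) / beta
    return 0.5 * math.erfc((c - mean) / (sd * math.sqrt(2.0)))
```

Comparing the two deep in the tail shows the normal approximation falling well below the true value, consistent with the inaccuracy noted above.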

# 4. Simulating mean excess loss

In this section, we introduce variance reduction methods for simulating the mean excess loss

$$\tau=E[(S-c)_+].$$

## 4.1. Combining importance sampling and stratified sampling

The importance sampling method for simulating $\tau$ is similar to that for $\theta$: we simply replace $I(S^*>c)$ with $(S^*-c)_+$ in equation (3.1). This yields

$$\hat{\tau}_I=(S^*-c)_+M_S(h)\exp(-hS^*).$$

To combine importance and stratified sampling, we replace the function $g_m^*$ in (3.4) with

$$g_m^*(x_1,\dots,x_m)=\left(\sum_{i=1}^{m}x_i-c\right)_+M_S(h)e^{-h\sum_{i=1}^{m}x_i}.$$

## 4.2. The conditioning method

The following method for simulating the mean excess loss using the conditioning method was discussed by Peköz and Ross (2004). Define

$$A=\sum_{i=1}^{T(c)}X_i-c.$$

Then the conditional expectation estimator is constructed as

$$\begin{aligned}
\hat{\tau}_{CD}&=E[(S-c)_+\mid T(c),A]\\
&=\sum_{i\ge T(c)}\left(A+(i-T(c))E[X]\right)P[M=i]\\
&=(A-T(c)E[X])P[M\ge T(c)]+E[X]E[MI(M\ge T(c))]\\
&=(A-T(c)E[X])P[M\ge T(c)]+E[X]\left(E[M]-\sum_{i<T(c)}iP[M=i]\right).
\end{aligned}$$
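The estimator above can be sketched as follows for a hypothetical compound Poisson–exponential model; each run draws claims until the running sum exceeds $c$ and then evaluates the conditional expectation exactly.

```python
import math
import random

def cond_mean_excess_estimate(n, c, lam=5.0, beta=1.0, seed=1):
    # Hypothetical model: M ~ Poisson(lam), X ~ Exponential(beta), so
    # E[X] = 1/beta and E[M] = lam.  Per run: draw X_i until the sum
    # exceeds c, record T(c) and A, then evaluate the last line of the
    # conditional expectation above.
    rng = random.Random(seed)
    ex, em = 1.0 / beta, lam
    pmf = lambda i: math.exp(-lam) * lam ** i / math.factorial(i)
    total = 0.0
    for _ in range(n):
        s, t = 0.0, 0
        while s <= c:
            s += rng.expovariate(beta)
            t += 1
        a = s - c
        p_ge = 1.0 - sum(pmf(i) for i in range(t))      # P[M >= T(c)]
        partial = sum(i * pmf(i) for i in range(t))     # sum_{i<T(c)} i P[M=i]
        total += (a - t * ex) * p_ge + ex * (em - partial)
    return total / n
```

A quick sanity check: at $c=0$ every run has $T(c)=1$ and the estimator's mean reduces to $E[S]=\lambda/\beta$.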

The steps for implementing this simulation method are similar to those in Section 3.3; the only difference is that the value of $A$ must be recorded in addition to $T(c)$.

## 4.3. Combining the conditioning method and the control variates method

The following two control variates can be used to enhance the performance of the conditioning method without increasing computational efforts:

$$W_1=\sum_{i=1}^{T(c)}(X_i-E[X])$$

and

$$W_2=A-E[A].$$

This yields the estimator

$$\hat{\tau}_{CD+CV}=\hat{\tau}_{CD}-\gamma_1W_1-\gamma_2W_2,$$

where $\gamma_1$ and $\gamma_2$ are chosen to minimize the variance of $\hat{\tau}_{CD+CV}$. This can be achieved by setting the values of $\gamma_1$ and $\gamma_2$ to the least squares estimates of the corresponding coefficients of the linear regression

$$\hat{\tau}_{CD}=\gamma_0+\gamma_1W_1+\gamma_2W_2+\epsilon,$$

where the values of $\hat{\tau}_{CD}$, $W_1$, and $W_2$ are generated in simulation using the conditioning method.

For general discussions on least squares or regression-based methods for using control variates, see the papers by Lavenberg and Welch (1981) and Davidson and MacKinnon (1992).

## 4.4. Combining importance sampling and the conditioning method

Similar to Section 3.5, importance sampling and the conditioning method can be combined. This results in the estimator

$$\hat{\tau}_{I+CD}=\Big[(A^*-T^*(c)E[X])P[M\ge T^*(c)]+E[X]E[MI(M\ge T^*(c))]\Big]\frac{M_X(h)^{T^*(c)}}{e^{h\sum_{i=1}^{T^*(c)}X_i^*}},$$

where

$$A^*=\sum_{i=1}^{T^*(c)}X_i^*-c.$$

The proof of the unbiasedness of this estimator is given in Section A.2 of the Appendix.

**Remark 4.1.** *The quantity $E[MI(M\ge T^*(c))]$ is related to the size-biased transform (see Denuit 2020). We have*

$$E[MI(M\ge T^*(c))]=E[M]P[\tilde{M}\ge T^*(c)], \tag{4.1}$$

*where $\tilde{M}$ is the size-biased transform of $M$, with distribution function*

$$P[\tilde{M}=k]=\frac{kP[M=k]}{E[M]},\quad k=0,1,\dots.$$

*Equation (4.1) can be calculated efficiently because the distributions of $M$ and $\tilde{M}$ are often closely related. For example, as shown by Ren (2021), if $M$ belongs to a familiar counting distribution class, then $\tilde{M}$ is in a related class with transformed parameters. Particularly, if $M$ follows the Poisson distribution with mean $\lambda$, then $\tilde{M}-1$ also follows the Poisson distribution with mean $\lambda$. Therefore, for this case, (4.1) becomes*

$$E[MI(M\ge T^*(c))]=\lambda P[M\ge T^*(c)-1],$$

*which is straightforward to evaluate.*
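The Poisson case of the identity can be checked numerically (a deterministic verification, not a simulation):

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def size_biased_sides(lam=5.0, t=8, n_max=120):
    # For M ~ Poisson(lam), the size-biased transform satisfies
    # P[M_tilde = k] = k P[M = k] / lam = P[M = k - 1], hence
    # E[M I(M >= t)] = lam * P[M >= t - 1].  Compute both sides directly,
    # truncating the sums at n_max, where the terms are negligible.
    lhs = sum(k * poisson_pmf(lam, k) for k in range(t, n_max))
    rhs = lam * sum(poisson_pmf(lam, k) for k in range(t - 1, n_max))
    return lhs, rhs
```

The two sides agree to machine precision, confirming that the size-biased tail probability in (4.1) costs no more than an ordinary Poisson tail.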

**Remark 4.2.** *Control variates can be utilized to improve the results further. For example, let*

$$W_1^*=\sum_{i=1}^{T^*(c)}(X_i^*-E[X^*])$$

*and*

$$W_2^*=A^*-E[A^*].$$

*Then the estimator is*

$$\hat{\tau}_{I+CD+CV}=\hat{\tau}_{I+CD}-\gamma_1W_1^*-\gamma_2W_2^*,$$

*where the values of $\gamma_1$ and $\gamma_2$ can be estimated by running the linear regression*

$$\hat{\tau}_{I+CD}=\gamma_0+\gamma_1W_1^*+\gamma_2W_2^*+\epsilon.$$

## 4.5. Numerical experiments

In this section, we compare the different methods for simulating $\tau$ with the example set forth in Section 3.6. The results are shown in Table 4.1 and Figure 4.1.

In order to provide a reference point for the comparison, we list in the table values of $\tau$ estimated using the importance sampling (I) method, but with a huge sample size. These values are proxies for the "analytical value" and are labeled as "Target" in the table.

For this example, we did not provide the results for the normal approximation of $\tau$ because the approximation would not be accurate. The reason is that the normal approximation results for the tail probabilities are already inaccurate.

Table 4.1 and Figure 4.1 show that the combinations I+S, I+CD, and I+CD+CV have small CoVs and thus are efficient for estimating mean excess losses. In particular, the I+CD+CV method has by far the smallest CoV.

**Remark 4.3.** *When carrying out the importance sampling methods, we have set the value of the tilting parameter to be the same as that for estimating the tail probability; that is, $h$ is such that $E[S^*]=c$. However, this may not be the optimal choice.*

*To explore the optimal value of $h$, we experimented with one case from the previous example. Figure 4.2 plots the SD of the estimator (based on repeating each method 100 times) against the values of $h$. The figure shows that, compared with the tail probability case, it may be preferable to set $h$ to a greater value when estimating mean excess losses. Determining theoretical results for the optimal value of the tilting parameter would be a great topic for future research.*

# 5. Simulating tail probability: The two-dimensional case

In this section, we study methods for simulating the tail probability

$$\theta=P[S_1>c,S_2>d]$$

for the two-dimensional compound variable $(S_1,S_2)$ defined in equation (1.4). The crude estimator is simply

$$\hat{\theta}_0=I(S_1>c,S_2>d).$$

As in the one-dimensional case, this crude estimator has a large CoV and is inefficient. Therefore, we introduce several variance reduction methods to improve it.

## 5.1. Importance sampling

When using importance sampling to simulate $\theta$, instead of sampling $(S_1,S_2)$, we sample $(S_1^*,S_2^*)$ and use the estimator

$$\hat{\theta}_I=I(S_1^*>c,S_2^*>d)\frac{f(S_1^*,S_2^*)}{f^*(S_1^*,S_2^*)}, \tag{5.1}$$

where $(S_1^*,S_2^*)$ has the joint PDF $f^*$. We assume that $(S_1,S_2)$ has a finite joint moment generating function and choose $f^*$ to be the Esscher transform of $f$. That is,

$$f^*(x_1,x_2)=\frac{f(x_1,x_2)e^{h_1x_1+h_2x_2}}{M_{S_1,S_2}(h_1,h_2)},$$

where $M_{S_1,S_2}$ is the joint moment generating function of $(S_1,S_2)$. With this, (5.1) becomes

$$\hat{\theta}_I=I(S_1^*>c,S_2^*>d)\frac{M_{S_1,S_2}(h_1,h_2)}{e^{h_1S_1^*+h_2S_2^*}}.$$

As in Section 3.1, we need to determine the values of the tilting parameters $h_1$ and $h_2$ and a method to simulate $(S_1^*,S_2^*)$ in order to apply importance sampling.

**Note 5.1.** *Similar to the one-dimensional case, we have*

$$\hat{\theta}_I=I(S_1^*>c,S_2^*>d)\frac{M_{S_1,S_2}(h_1,h_2)}{e^{h_1S_1^*+h_2S_2^*}}\le M_{S_1,S_2}(h_1,h_2)e^{-h_1c-h_2d}.$$

*Then, a good choice for $(h_1,h_2)$ is the one that minimizes the upper bound. To this end, taking the derivatives of the upper bound on the right-hand side of the above equation with respect to $h_1$ and $h_2$ and setting them to zero, we have*

$$c=\frac{\frac{\partial}{\partial h_1}M_{S_1,S_2}(h_1,h_2)}{M_{S_1,S_2}(h_1,h_2)}=\frac{E[S_1e^{h_1S_1+h_2S_2}]}{E[e^{h_1S_1+h_2S_2}]}=E[S_1^*] \tag{5.3}$$

*and*

$$d=E[S_2^*]. \tag{5.4}$$

*The values of $h_1$ and $h_2$ can be determined from (5.3) and (5.4).*

The next note provides a representation of the distribution of $(S_1^*,S_2^*)$. The proof of this statement is given in Section A.3 of the Appendix.

**Note 5.2.** *The Esscher transform of $(S_1,S_2)$ with parameter $(h_1,h_2)$ is*

$$(S_1^*,S_2^*)=\left(\sum_{i=1}^{M^*}X_i^*,\ \sum_{j=1}^{N^*}Y_j^*\right),$$

*where $X_i^*$ and $Y_j^*$ are the Esscher transforms of $X_i$ and $Y_j$ with parameters $h_1$ and $h_2$, respectively, and $(M^*,N^*)$ is the Esscher transform of $(M,N)$ with parameter $(\ln M_X(h_1),\ln M_Y(h_2))$.*

In many important special cases, the distributions of $(S_1^*,S_2^*)$ and $(S_1,S_2)$ have similar structures, as the following examples show.

**Example 5.1.** *Let $\Lambda$ be a gamma-distributed common mixture variable with parameters $\alpha$ and $\beta$. Conditional on $\Lambda$, assume claim frequencies $M\sim\text{Poisson}(\lambda_1\Lambda)$ and $N\sim\text{Poisson}(\lambda_2\Lambda)$. Claim sizes $X_i$ are i.i.d. and follow a gamma distribution, and the $Y_j$ are i.i.d. and follow a gamma distribution. With this setup, we have*

$$\begin{aligned}
M_{S_1,S_2}(t_1,t_2)&=E[e^{t_1S_1+t_2S_2}]=E_\Lambda\left[E[e^{t_1S_1+t_2S_2}\mid\Lambda]\right]\\
&=E_\Lambda\left[P_{M\mid\Lambda}(M_X(t_1))\cdot P_{N\mid\Lambda}(M_Y(t_2))\right]\\
&=E_\Lambda\left[e^{\lambda_1\Lambda(M_X(t_1)-1)+\lambda_2\Lambda(M_Y(t_2)-1)}\right]\\
&=M_\Lambda\left(\lambda_1(M_X(t_1)-1)+\lambda_2(M_Y(t_2)-1)\right)\\
&=\left(1-\frac{\lambda_1M_X(t_1)+\lambda_2M_Y(t_2)-\lambda_1-\lambda_2}{\beta}\right)^{-\alpha}.
\end{aligned}$$

*Therefore,*

$$\begin{aligned}
M_{S_1^*,S_2^*}(t_1,t_2)&=\frac{M_{S_1,S_2}(t_1+h_1,t_2+h_2)}{M_{S_1,S_2}(h_1,h_2)}\\
&=\left(1-\frac{\lambda_1M_X(t_1+h_1)+\lambda_2M_Y(t_2+h_2)-\lambda_1M_X(h_1)-\lambda_2M_Y(h_2)}{\beta+\lambda_1+\lambda_2-\lambda_1M_X(h_1)-\lambda_2M_Y(h_2)}\right)^{-\alpha}\\
&=\left(1-\frac{\lambda_1M_X(h_1)\left(\frac{M_X(t_1+h_1)}{M_X(h_1)}-1\right)+\lambda_2M_Y(h_2)\left(\frac{M_Y(t_2+h_2)}{M_Y(h_2)}-1\right)}{\beta+\lambda_1+\lambda_2-\lambda_1M_X(h_1)-\lambda_2M_Y(h_2)}\right)^{-\alpha},
\end{aligned}$$

*which implies that $(S_1^*,S_2^*)$ has the representation*

$$(S_1^*,S_2^*)=\left(\sum_{i=1}^{M^*}X_i^*,\ \sum_{j=1}^{N^*}Y_j^*\right),$$

*where $\Lambda^*$ is gamma with parameters $\alpha$ and $\beta+\lambda_1+\lambda_2-\lambda_1M_X(h_1)-\lambda_2M_Y(h_2)$ and, conditional on $\Lambda^*$, $M^*\sim\text{Poisson}(\lambda_1M_X(h_1)\Lambda^*)$ and $N^*\sim\text{Poisson}(\lambda_2M_Y(h_2)\Lambda^*)$. For the claim sizes, $X_i^*$ and $Y_j^*$ are the Esscher transforms of $X_i$ and $Y_j$ with parameters $h_1$ and $h_2$, respectively.*

*This result means that $(S_1^*,S_2^*)$ is still a bivariate compound Poisson variable with a common gamma mixture and can easily be simulated.*

**Example 5.2.** *Suppose that the claim frequencies $M$ and $N$ have an additive common shock. That is, $M=M_1+M_0$ and $N=M_2+M_0$, where $M_0$, $M_1$, and $M_2$ are independent. Let $a=\ln M_X(h_1)$ and $b=\ln M_Y(h_2)$; then*

$$(S_1^*,S_2^*)=\left(\sum_{i=1}^{M^*}X_i^*,\ \sum_{j=1}^{N^*}Y_j^*\right),$$

*where $M^*=M_1^*+M_0^*$ and $N^*=M_2^*+M_0^*$, and where $M_1^*$ and $M_2^*$ are the Esscher transforms of $M_1$ and $M_2$ with parameters $a$ and $b$, respectively. In addition, $M_0^*$ is the Esscher transform of $M_0$ with parameter $a+b$, and $X_i^*$ and $Y_j^*$ are the Esscher transforms of $X_i$ and $Y_j$ with parameters $h_1$ and $h_2$.*

*This statement can be proved as follows.*

*Firstly,*

$$M_{S_1^*,S_2^*}(t_1,t_2)=\frac{P_{M,N}(e^aM_{X^*}(t_1),e^bM_{Y^*}(t_2))}{P_{M,N}(e^a,e^b)}=\frac{P_{M_1}(e^aM_{X^*}(t_1))}{P_{M_1}(e^a)}\cdot\frac{P_{M_2}(e^bM_{Y^*}(t_2))}{P_{M_2}(e^b)}\cdot\frac{P_{M_0}(e^{a+b}M_{X^*}(t_1)M_{Y^*}(t_2))}{P_{M_0}(e^{a+b})}.$$