# 1. Introduction

Parametric statistical models for insurance claims severity are continuous, right-skewed, and frequently heavy-tailed (see Klugman, Panjer, and Willmot 2012). The data sets to which such models are usually fitted contain outliers that are difficult to identify and separate from genuine data. Moreover, due to commonly used loss mitigation techniques, the random variables we observe and wish to model are affected by data truncation (due to deductibles), censoring (due to policy limits), and scaling (due to coinsurance). In the current practice, statistical inference for loss models is almost exclusively maximum likelihood estimation (MLE)–based, which typically results in nonrobust parameter estimators, pricing models, and risk measurements.

Construction of robust actuarial models includes many ideas from the mainstream robust statistics literature (see, e.g., Huber and Ronchetti 2009), but there are additional nuances that need to be addressed. Namely, actuaries have to deal with heavy-tailed and skewed distributions, data truncation and censoring, identification and recycling of outliers, and aggregate loss, just to name a few. The actuarial literature is home to a number of specialized studies addressing some of these issues; see, e.g., Künsch (1992), Gisler and Reinhard (1993), Brazauskas and Serfling (2003), Garrido and Pitselis (2000), Marceau and Rioux (2001), Serfling (2002), and Dornheim and Brazauskas (2007). Further, those and other actuarial studies motivated the development of two broad classes of robust estimators: the methods of *trimmed moments* (see, e.g., Brazauskas 2009; Brazauskas, Jones, and Zitikis 2009) and *winsorized moments* (see, e.g., Zhao, Brazauskas, and Ghorai 2018a, 2018b). Those two approaches, called *T*- and *W*-estimators for short, are sufficiently general and flexible for fitting continuous parametric models based on completely observed ground-up loss data. In Figure 1.1, we illustrate how *T* and *W* methods act on data and control the influence of extremes. First of all, notice that typical loss mitigation techniques employed in insurance practice (e.g., deductibles and policy limits) are closely related to data winsorizing or its variants. Second, we see that in order to taper the effects of rare but high-severity claims on parameter estimates, data should be “preprocessed” using trimming or winsorizing. Then, *T* and *W* estimates can be found by applying the classical *method of moments*. Note that these initial modifications of data have to be taken into account when deriving corresponding theoretical moments. This yields an additional benefit.
Specifically, unlike the parameter estimators based on the standard method of moments, which may not exist for heavy-tailed models (due to the nonexistence of finite moments), theoretical *T* and *W* moments are always finite. Finally, for trimmed or winsorized data, estimation of parameters via the method of moments is not the only option. Indeed, one might choose to apply another estimation procedure (e.g., properly constructed MLE) and gain similar robustness properties. In this paper, however, we focus on rigorous treatment of moment-type estimators.

*T*-estimators have been discussed in the operational risk literature by Opdyke and Cavallo (2012), used in credibility studies by Kim and Jeon (2013), and further tested in risk measurement exercises by Abu Bakar and Nadarajah (2018). Also, the idea of trimming has been gaining popularity in modeling extremes (see Bhattacharya, Kallitsis, and Stoev 2019; Bladt, Albrecher, and Beirlant 2020). Thus we anticipate the methodology developed in this paper will be useful and transferable to all these and other areas of research.

Moreover, besides the typical nonrobustness of MLE-based inference, the implementation of such procedures on real data is also technically challenging (see discussions by Frees 2017; Lee 2017). This issue is especially evident when one tries to fit complicated multiparameter models such as finite mixtures of distributions (see Verbelen et al. 2015; Miljkovic and Grün 2016; Reynkens et al. 2017). Thus, the primary objective of this paper is to go beyond the complete data scenario and develop *T*- and *W*-estimators for insurance data affected by the above-mentioned transformations. We show that, when properly redesigned, *T*- and *W*-estimators can be a robust and computationally efficient alternative to MLE-based inference for claim severity models that are affected by deductibles, policy limits, and coinsurance. In particular, we provide the definitions of *T*- and *W*-estimators and derive their asymptotic properties such as normality and consistency. Specific formulas or estimating equations for a single-parameter Pareto (Pareto I) model are provided. Finally, we illustrate the practical performance of the estimators by fitting Pareto I to the well-known Norwegian fire claims data. We use MLE and several *T*- and *W*-estimators, validate the fits, and apply the fitted models to price an insurance contract.

The remainder of the paper is organized as follows. In Section 2, we describe a series of loss variable (data) transformations, starting with complete data, continuing with truncated and censored data, and finishing with two types of insurance payments. Section 3 uses the data scenarios and models of the previous section and derives *T*- and *W*-estimators for the parameters of those models. Then asymptotic properties of those estimators are established. In Section 4, we develop specific formulas of the estimators when the underlying loss distribution is Pareto I, and we compare the asymptotic relative efficiency of *T*- and *W*-estimators with respect to MLE. Section 5 is devoted to practical applications of the Pareto I model; the effects of model fitting on insurance contract pricing are then investigated. Finally, concluding remarks are offered in Section 6.

# 2. Data and models

In this section, we review typical transformations of continuous random variables that one might encounter in modeling claim severity. For each type of variable transformation, the resulting probability density function (PDF), cumulative distribution function (CDF), and quantile function (QF) are specified.

## 2.1. Complete data

Let us start with the complete data scenario. Suppose the observable random variables
$$X_1, X_2, \ldots, X_n \tag{2.1}$$
are independent and identically distributed (i.i.d.) and have the PDF $f(x)$, CDF $F(x)$, and QF $F^{-1}(v)$. Because loss random variables are nonnegative, the support of $f$ is the set $[0, \infty)$ or a subset of it.

The complete data scenario is not common when claim severities are recorded, but it represents what are known as “ground-up” losses and thus is important to consider. Statistical properties of the ground-up variable are of great interest in risk analysis, in product design (for specifying insurance contract parameters), in risk transfer considerations, and for other business decisions.

## 2.2. Truncated data

Data truncation occurs when sample observations are restricted to some interval (not necessarily finite), say $(t_1, t_2)$ with $t_1 < t_2$. Measurements, and even a count of observations, outside the interval are completely unknown. To formalize this discussion, we will say that we observe the i.i.d. data
$$X_1^*, X_2^*, \ldots, X_n^*, \tag{2.2}$$
where each $X^*$ is equal to the ground-up loss variable $X$ if $X$ falls between $t_1$ and $t_2$ and is undefined otherwise. That is, $X^*$ satisfies the following conditional event relationship: $X^* \overset{d}{=} X \mid t_1 < X < t_2$, where $\overset{d}{=}$ denotes “equal in distribution.” Due to that relationship, the CDF $F_*$, PDF $f_*$, and QF $F_*^{-1}$ of variables $X^*$ are related to $F$, $f$, and $F^{-1}$ (see Section 2.1) and are given by

$$F_*(x; t_1, t_2) = \mathbf{P}\left[X \le x \mid t_1 < X < t_2\right] = \begin{cases} 0, & x \le t_1; \\[4pt] \dfrac{F(x) - F(t_1)}{F(t_2) - F(t_1)}, & t_1 < x < t_2; \\[4pt] 1, & x \ge t_2, \end{cases} \tag{2.3}$$

$$f_*(x; t_1, t_2) = \frac{d}{dx}\left[F_*(x; t_1, t_2)\right] = \begin{cases} \dfrac{f(x)}{F(t_2) - F(t_1)}, & t_1 < x < t_2; \\[4pt] \text{undefined}, & x = t_1,\ x = t_2; \\[4pt] 0, & \text{elsewhere}, \end{cases} \tag{2.4}$$

$$F_*^{-1}(v; t_1, t_2) = F^{-1}\big(v F(t_2) + (1 - v) F(t_1)\big), \quad \text{for } 0 \le v \le 1. \tag{2.5}$$
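As an illustration, formulas (2.3) and (2.5) translate directly into code. The sketch below assumes an exponential ground-up model with mean 10; this distribution and the threshold values are hypothetical choices for illustration only.

```python
import math

# Hypothetical ground-up model for illustration: exponential with mean 10.
F = lambda x: 1 - math.exp(-x / 10)          # CDF F(x)
F_inv = lambda v: -10 * math.log(1 - v)      # QF F^{-1}(v)

def truncated_cdf(x, t1, t2):
    """CDF (2.3) of the truncated variable X*."""
    if x <= t1:
        return 0.0
    if x >= t2:
        return 1.0
    return (F(x) - F(t1)) / (F(t2) - F(t1))

def truncated_qf(v, t1, t2):
    """QF (2.5): F*^{-1}(v; t1, t2) = F^{-1}(v F(t2) + (1 - v) F(t1))."""
    return F_inv(v * F(t2) + (1 - v) * F(t1))
```

By construction, `truncated_qf(0, t1, t2)` returns `t1`, `truncated_qf(1, t1, t2)` returns `t2`, and the two functions invert each other on $(t_1, t_2)$.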

In industrywide databases such as ORX Loss Data (`managingrisktogether.orx.org`), only losses above some prespecified threshold, say $t_1$, are collected, which results in *left-truncated* data at $t_1$. Thus, the observations available to the end user can be viewed as a realization of random variables (2.2) with $t_1 > 0$ and $t_2 = \infty$. The latter condition slightly simplifies formulas (2.3)–(2.5); one just needs to replace $F(t_2)$ with 1.

## 2.3. Censored data

Several versions of data censoring occur in statistical modeling: interval censoring (which includes left and right censoring, depending on which endpoint of the interval is infinite), type I censoring, type II censoring, and random censoring. For actuarial work, the most relevant type is *interval censoring*. It occurs when complete observations are available within some interval, say $(t_1, t_2)$ with $t_1 < t_2$, but data outside the interval are only partially known. That is, counts are available but actual values are not. To formalize this discussion, we will say that we observe the i.i.d. data
$$X_1^{**}, X_2^{**}, \ldots, X_n^{**}, \tag{2.6}$$
where each $X^{**}$ is equal to the ground-up variable $X$ if $X$ falls between $t_1$ and $t_2$ and is equal to the corresponding endpoint of the interval if $X$ is beyond that point. That is, $X^{**}$ is given by
$$X^{**} = \min\{\max(t_1, X),\, t_2\} = \begin{cases} t_1, & X \le t_1; \\ X, & t_1 < X < t_2; \\ t_2, & X \ge t_2. \end{cases}$$

Due to this relationship, the CDF $F_{**}$, PDF $f_{**}$, and QF $F_{**}^{-1}$ of variables $X^{**}$ are related to $F$, $f$, and $F^{-1}$ and have the following expressions:

$$F_{**}(x; t_1, t_2) = \mathbf{P}\left[\min\{\max(t_1, X),\, t_2\} \le x\right] = \mathbf{P}[X \le x]\, \mathbf{1}\{t_1 \le x < t_2\} + \mathbf{1}\{t_2 \le x\} = \begin{cases} 0, & x < t_1; \\ F(x), & t_1 \le x < t_2; \\ 1, & x \ge t_2, \end{cases} \tag{2.7}$$

where $\mathbf{1}\{\cdot\}$ denotes the indicator function. Further,

$$F_{**}^{-1}(v; t_1, t_2) = \begin{cases} t_1, & v < F(t_1); \\ F^{-1}(v), & F(t_1) \le v < F(t_2); \\ t_2, & v \ge F(t_2). \end{cases} \tag{2.8}$$

Note that CDF (2.7) is a mixture of the continuous CDF $F$ and discrete probability mass at $t_1$ (with probability $F(t_1)$) and $t_2$ (with probability $1 - F(t_2^-)$). This results in a mixed PDF/probability mass function:
$$f_{**}(x; t_1, t_2) = \begin{cases} F(t_1), & x = t_1; \\ f(x), & t_1 < x < t_2; \\ 1 - F(t_2^-), & x = t_2; \\ 0, & \text{elsewhere}. \end{cases} \tag{2.9}$$
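In computational terms, the censoring mechanism (2.6) is simply a clamp of the ground-up loss to the interval $[t_1, t_2]$; a minimal sketch (the loss values below are hypothetical):

```python
def interval_censor(x, t1, t2):
    """Interval censoring (2.6): X** = min{max(t1, X), t2}."""
    return min(max(t1, x), t2)

# Ground-up losses below t1 are pushed up to t1; those above t2 are capped at t2.
losses = [0.5, 3.0, 7.5, 12.0]
censored = [interval_censor(x, t1=1.0, t2=10.0) for x in losses]
```

Applying the clamp to a sample directly produces the point masses at $t_1$ and $t_2$ described by (2.9).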

## 2.4. Insurance payments

Insurance contracts have coverage modifications that need to be taken into account when modeling the underlying loss variable. Usually coverage modifications such as deductibles, policy limits, and coinsurance are introduced as loss control mechanisms so that unfavorable policyholder behavioral effects (e.g., adverse selection) can be minimized. There are also situations when certain features of the contract emerge naturally (e.g., the value of insured property in general insurance is a natural upper policy limit). Here we describe two common transformations of the loss variable along with the corresponding CDFs, PDFs, and QFs.

Suppose the insurance contract has ordinary deductible $d$, upper policy limit $u$, and coinsurance rate $c$. These coverage parameters imply that when a loss is reported, the insurance company is responsible for a proportion $c$ of the loss exceeding $d$, but no more than $c(u - d)$. Next, if losses below the deductible are not reported at all, the loss severity is *left-truncated*, *right-censored*, and *linearly transformed* (called the *payment-per-payment* variable):

$$Y \overset{d}{=} c\big(\min\{X, u\} - d\big) \,\Big|\, X > d \;=\; \begin{cases} \text{undefined}, & X \le d; \\ c(X - d), & d < X < u; \\ c(u - d), & u \le X. \end{cases} \tag{2.10}$$

We can see that the payment variable $Y$ is a linear transformation of a composition of variables $X^*$ and $X^{**}$ (see Sections 2.2 and 2.3). Thus, similar to variables $X^*$ and $X^{**}$, its CDF $G_Y$, PDF $g_Y$, and QF $G_Y^{-1}$ are also related to $F$, $f$, and $F^{-1}$ and are given by

$$G_Y(y; c, d, u) = \mathbf{P}\left[c\big(\min\{X, u\} - d\big) \le y \mid X > d\right] = \begin{cases} 0, & y \le 0; \\[4pt] \dfrac{F(y/c + d) - F(d)}{1 - F(d)}, & 0 < y < c(u - d); \\[4pt] 1, & y \ge c(u - d), \end{cases} \tag{2.11}$$

$$g_Y(y; c, d, u) = \begin{cases} \dfrac{f(y/c + d)}{c\,[1 - F(d)]}, & 0 < y < c(u - d); \\[4pt] \dfrac{1 - F(u^-)}{1 - F(d)}, & y = c(u - d); \\[4pt] 0, & \text{elsewhere}, \end{cases} \tag{2.12}$$

and

$$G_Y^{-1}(v; c, d, u) = \begin{cases} c\left[F^{-1}\big(v + (1 - v)F(d)\big) - d\right], & 0 \le v < \dfrac{F(u) - F(d)}{1 - F(d)}; \\[4pt] c(u - d), & \dfrac{F(u) - F(d)}{1 - F(d)} \le v \le 1. \end{cases} \tag{2.13}$$
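In simulation terms, (2.10) says a payment is generated only when the loss exceeds the deductible; a minimal sketch (returning `None` for unreported losses is a coding convenience, not part of the model, and the parameter values in the test are hypothetical):

```python
def payment_per_payment(x, c, d, u):
    """Payment-per-payment variable (2.10): undefined (here None) when X <= d,
    c(X - d) for d < X < u, and the censored value c(u - d) when X >= u."""
    if x <= d:
        return None  # loss below deductible: no payment is recorded (left truncation)
    return c * (min(x, u) - d)
```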

The scenario that no information is available about losses below the deductible corresponds to the variable $Y$. If, on the other hand, such losses are recorded (each corresponding to a payment of 0), then the loss severity is *interval-censored* and *linearly transformed* (called the *payment-per-loss* variable):

$$Z = c\big(\min\{X, u\} - \min\{X, d\}\big) = \begin{cases} 0, & X \le d; \\ c(X - d), & d < X < u; \\ c(u - d), & u \le X. \end{cases} \tag{2.14}$$

Again, its CDF $G_Z$, PDF $g_Z$, and QF $G_Z^{-1}$ are related to $F$, $f$, and $F^{-1}$ and given by

$$G_Z(z; c, d, u) = \mathbf{P}\left[c\big(\min\{X, u\} - \min\{X, d\}\big) \le z\right] = \begin{cases} 0, & z < 0; \\ F(z/c + d), & 0 \le z < c(u - d); \\ 1, & z \ge c(u - d), \end{cases} \tag{2.15}$$

$$g_Z(z; c, d, u) = \begin{cases} F(d), & z = 0; \\ f(z/c + d)/c, & 0 < z < c(u - d); \\ 1 - F(u^-), & z = c(u - d); \\ 0, & \text{elsewhere}, \end{cases} \tag{2.16}$$

and

$$G_Z^{-1}(v; c, d, u) = \begin{cases} 0, & 0 \le v \le F(d); \\ c\big(F^{-1}(v) - d\big), & F(d) < v < F(u); \\ c(u - d), & F(u) \le v \le 1. \end{cases} \tag{2.17}$$
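The payment-per-loss variable (2.14) differs from $Y$ only in that losses below the deductible are recorded as zero payments; a sketch with hypothetical parameter values:

```python
def payment_per_loss(x, c, d, u):
    """Payment-per-loss variable (2.14): Z = c(min{X, u} - min{X, d})."""
    return c * (min(x, u) - min(x, d))
```

Note that applying this transform to a sample reproduces the point masses of (2.16): $Z = 0$ with probability $F(d)$ and $Z = c(u - d)$ with probability $1 - F(u^-)$.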

# 3. *T*- and *W*-estimation

In this section, we first provide definitions of parameter estimators obtained by applying the method of trimmed moments (MTM) and the method of winsorized moments (MWM) to the *observed* data. Then we establish their asymptotic properties.

## 3.1. *T*-estimators

*T*-estimators are derived by following the standard method-of-moments approach, with regular moments replaced by *trimmed* moments (or their variants). The advantage of such an approach over the standard one is that the population moments always exist irrespective of the tail-heaviness of the underlying distribution. The following definition lists the formulas of sample and population trimmed moments for the data scenarios of Sections 2.1–2.4.

**Definition 3.1.** *For data scenarios and models of Sections 2.1–2.4, let us denote the sample and population trimmed moments as $\hat{T}_j$ and $T_j(\theta)$, respectively. If $w_{1:n} \le \cdots \le w_{n:n}$ is an ordered realization of variables (2.1), (2.2), (2.6), (2.10), or (2.14) with QF denoted $F_V^{-1}$ (which, depending upon the data scenario, equals QF $F^{-1}$, (2.5), (2.8), (2.13), or (2.17)), then the sample and population trimmed moments, with the trimming proportions $a$ (lower) and $b$ (upper), have the following expressions:*

$$\hat{T}_j = \frac{1}{n - m_n - m_n^*} \sum_{i = m_n + 1}^{n - m_n^*} \left[h(w_{i:n})\right]^j, \qquad j = 1, \ldots, k, \tag{3.1}$$

$$T_j(\theta) = \frac{1}{1 - a - b} \int_a^{1-b} \left[h\big(F_V^{-1}(v \mid \theta)\big)\right]^j dv, \qquad j = 1, \ldots, k. \tag{3.2}$$

*Under all the data scenarios, the trimming proportions $a$ and $b$ and function $h$ are chosen by the researcher. Also, integers $m_n$ and $m_n^*$ are such that $m_n/n \to a$ and $m_n^*/n \to b$ when $n \to \infty$. In finite samples, the integers $m_n$ and $m_n^*$ are computed as $m_n = [na]$ and $m_n^* = [nb]$, where $[\cdot]$ denotes the greatest integer part.*
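A direct implementation of the sample trimmed moment (3.1) is short; the sample used in the usage line is hypothetical:

```python
import math

def trimmed_moment(w, j, a, b, h=lambda x: x):
    """Sample trimmed moment (3.1): discard the m_n = [na] smallest and
    m_n* = [nb] largest order statistics, then average the j-th powers of h."""
    n = len(w)
    m, m_star = math.floor(n * a), math.floor(n * b)
    body = sorted(w)[m : n - m_star]
    return sum(h(x) ** j for x in body) / (n - m - m_star)

# One extreme observation (100) is trimmed away and has no influence:
t1_hat = trimmed_moment([1.0, 2.0, 3.0, 4.0, 100.0], j=1, a=0.2, b=0.2)
```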

**Note 3.1.** In the original formulation of MTM estimators for complete data (Brazauskas, Jones, and Zitikis 2009), the trimming proportions $a$ and $b$ and function $h$ were allowed to vary for different $j$, which makes the technique more flexible. On the other hand, for implementation of MTM estimators in practice, such flexibility requires one to make more decisions regarding the choices of $a$, $b$, and $h$, their interaction with each other, and their variation across $j$. The follow-up research that used MTMs usually had not varied these constants and functions, which seems like a reasonable choice. Therefore, in this paper we choose to work with nonvarying $a$, $b$, and $h$ for all $j$.

**Note 3.2.** For incomplete data scenarios, possible permutations between the trimming proportions $a$ and $b$ and the truncation/censoring points have to be taken into account. For truncated data, there is only one possibility: $0 \le a < 1 - b \le 1$. For censored data, however, it is possible to use part or all of the censored data in estimation. Thus, arranging $a$ and $1 - b$ relative to $F(t_1)$ and $F(t_2)$, we can have six arrangements.

Among these, the sixth case, $F(t_1) \le a < 1 - b \le F(t_2)$, makes the most sense because it uses the available data in the most effective way. For the sake of completeness, however, we will investigate the other cases as well (see Section 4). Note that the insurance payments $Y$ and $Z$ are special (mixed) cases of truncated and censored data and thus will possess similar properties. Moreover, the *T*-estimators based on case 6 will be resistant to outliers, i.e., observations that are inconsistent with the assumed model and most likely appearing at the boundaries $t_1$ and $t_2$.

**Note 3.3.** In view of Notes 3.1 and 3.2, the *T*-estimators with $a > 0$ and $b > 0$ are globally robust, with the *lower* and *upper* breakdown points given by $a$ and $b$, respectively. The robustness of such estimators against small or large outliers comes from the fact that in the computation of estimates the influence of the order statistics with the index less than or equal to $m_n$ or greater than $n - m_n^*$ is limited. For more details on breakdown points, see Brazauskas and Serfling (2000) and Serfling (2002).

## 3.2. *W*-estimators

*W*-estimators are derived by following the standard method-of-moments approach, with regular moments replaced by *winsorized* moments (or their variants). Similar to *T*-estimators, the population winsorized moments also always exist. The following definition lists the formulas of sample and population winsorized moments for the data scenarios of Sections 2.1–2.4.

**Definition 3.2.** *For data scenarios and models of Sections 2.1–2.4, let us denote the sample and population winsorized moments as $\hat{W}_j$ and $W_j(\theta)$, respectively. If $w_{1:n} \le \cdots \le w_{n:n}$ is an ordered realization of variables (2.1), (2.2), (2.6), (2.10), or (2.14) with QF denoted $F_V^{-1}$ (which, depending upon the data scenario, equals QF $F^{-1}$, (2.5), (2.8), (2.13), or (2.17)), then the sample and population winsorized moments, with the winsorizing proportions $a$ (lower) and $b$ (upper), have the following expressions:*

$$\hat{W}_j = \frac{1}{n}\left[\, m_n \left[h(w_{m_n + 1:n})\right]^j + \sum_{i = m_n + 1}^{n - m_n^*} \left[h(w_{i:n})\right]^j + m_n^* \left[h(w_{n - m_n^*:n})\right]^j \right], \tag{3.3}$$

$$W_j(\theta) = a\left[h\big(F_V^{-1}(a \mid \theta)\big)\right]^j + \int_a^{1-b} \left[h\big(F_V^{-1}(v \mid \theta)\big)\right]^j dv + b\left[h\big(F_V^{-1}(1 - b \mid \theta)\big)\right]^j, \tag{3.4}$$

*where $j = 1, \ldots, k$; the winsorizing proportions $a$ and $b$ and function $h$ are chosen by the researcher; and the integers $m_n$ and $m_n^*$ are defined and computed the same way as in Definition 3.1.*
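For comparison with (3.1), here is a sketch of the sample winsorized moment (3.3): extreme order statistics are not discarded but replaced by the nearest retained ones. The sample in the usage note is hypothetical.

```python
import math

def winsorized_moment(w, j, a, b, h=lambda x: x):
    """Sample winsorized moment (3.3): the m_n smallest values are replaced by
    w_{m_n+1:n} and the m_n* largest by w_{n-m_n*:n}; then the j-th powers
    of h are averaged over all n positions."""
    n = len(w)
    m, m_star = math.floor(n * a), math.floor(n * b)
    ws = sorted(w)
    body = sum(h(x) ** j for x in ws[m : n - m_star])
    return (m * h(ws[m]) ** j + body + m_star * h(ws[n - m_star - 1]) ** j) / n
```

On the sample `[1, 2, 3, 4, 100]` with $a = b = 0.2$ and $j = 1$, the outlier 100 is replaced by 4, giving $(2 + 2 + 3 + 4 + 4)/5 = 3$.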

**Note 3.4.** In the original formulation of MWM estimators for complete data (Zhao, Brazauskas, and Ghorai 2018a), the winsorizing proportions $a$ and $b$ and function $h$ were allowed to vary for different $j$. Based on arguments similar to those made in Note 3.1, in this paper we will choose the same $a$, $b$, and $h$ for all $j$. Further, the focus will be on the case when $a$ and $1 - b$ fall within the interval $\big(F(t_1), F(t_2)\big)$. Finally, the breakdown points of *W*-estimators are identical to those of *T*-estimators, i.e., $a$ and $b$.

## 3.3. Asymptotic properties

In this section, we specify the asymptotically normal distributions of the *T*- and *W*-estimators and show that these estimators are *consistent*. Throughout the section the notation $\mathcal{AN}$ is used to denote “asymptotically normal.”

### 3.3.1. *T*-estimators

*T*-estimators are found by matching sample moments (3.1) with population moments (3.2) for $j = 1, \ldots, k$ and then solving the system of equations with respect to $\theta_1, \ldots, \theta_k$. The obtained solutions, which we denote by $\hat{\theta}_j = s_j\big(\hat{T}_1, \ldots, \hat{T}_k\big)$, are, by definition, the *T*-estimators of $\theta_1, \ldots, \theta_k$. Note that the functions $s_j$ are such that $\theta_j = s_j\big(T_1(\theta), \ldots, T_k(\theta)\big)$.

The asymptotic distribution of these estimators for complete data has been derived by Brazauskas, Jones, and Zitikis (2009). It also follows from a more general theorem established by Zhao et al. (2018a, Note 2.4), which relies on the central limit theory of $L$-statistics (Chernoff, Gastwirth, and Johns 1967). The following theorem generalizes those results to all data scenarios of Sections 2.1–2.4.

**Theorem 3.1.** *Suppose an i.i.d. realization of variables (2.1), (2.2), (2.6), (2.10), or (2.14) has been generated by CDF $F_V$, which, depending upon the data scenario, equals CDF $F$, (2.3), (2.7), (2.11), or (2.15), respectively. Let $\hat{\theta}_T$ denote a T-estimator of $\theta = (\theta_1, \ldots, \theta_k)$. Then*

$$\hat{\theta}_T = \big(\hat{\theta}_1, \ldots, \hat{\theta}_k\big) \ \text{ is } \ \mathcal{AN}\left(\big(\theta_1, \ldots, \theta_k\big),\ \frac{1}{n}\, \mathbf{D}_t \boldsymbol{\Sigma}_t \mathbf{D}_t'\right),$$

*where $\mathbf{D}_t$ is the Jacobian of the transformations $s_1, \ldots, s_k$ evaluated at $\big(T_1(\theta), \ldots, T_k(\theta)\big)$, and $\boldsymbol{\Sigma}_t := \big[\sigma_{ij}^2\big]_{i,j=1}^{k}$ is the covariance-variance matrix with the entries*

$$\sigma_{ij}^2 = \frac{1}{(1 - a - b)^2} \int_a^{1-b}\!\int_a^{1-b} \big(\min\{v, w\} - vw\big)\; d\!\left[h\big(F_V^{-1}(v)\big)\right]^j d\!\left[h\big(F_V^{-1}(w)\big)\right]^i.$$

**Proof.** For complete data, generated by (2.1) and with the assumption that $F$ is continuous, see Brazauskas, Jones, and Zitikis (2009) or Zhao et al. (2018a, Note 2.4).

For truncated data, generated by (2.2), the CDF $F_*$ given by (2.3) is still continuous, and hence the results established for complete data can be directly applied to $X^*$. For the remaining data scenarios, generated by (2.6), (2.10), or (2.14), the QF $F_V^{-1}$ is discontinuous at the censoring points (see Zhao, Brazauskas, and Ghorai 2018a). The set of such points, however, has probability zero, which means that the CDFs $F_{**}$, $G_Y$, and $G_Z$ are *almost everywhere* continuous under the Borel probability measures induced by them (see, e.g., Folland 1999, Theorem 1.16). Therefore, the function value shall be replaced with the corresponding one-sided limit wherever it is not defined; see Chernoff et al. (1967, Assumption A*).

**Note 3.5.** Theorem 3.1 states that *T*-estimators for the parameters of loss models considered in this paper are asymptotically unbiased, with the entries of the covariance-variance matrix diminishing at the rate $1/n$. Using these properties in conjunction with the multidimensional Chebyshev’s inequality, it is a straightforward exercise to establish the fact that *T*-estimators are consistent.

### 3.3.2. *W*-estimators

*W*-estimators are found by matching sample moments (3.3) with population moments (3.4) for $j = 1, \ldots, k$ and then solving the system of equations with respect to $\theta_1, \ldots, \theta_k$. The obtained solutions, which we denote by $\hat{\theta}_j = r_j\big(\hat{W}_1, \ldots, \hat{W}_k\big)$, are, by definition, the *W*-estimators of $\theta_1, \ldots, \theta_k$. Note that the functions $r_j$ are such that $\theta_j = r_j\big(W_1(\theta), \ldots, W_k(\theta)\big)$.

The asymptotic distribution of these estimators for complete data has been established by Zhao et al. (2018a, Theorem 2.1 and Lemma A.1). The following theorem extends the asymptotic distribution of *W*-estimators to all data scenarios of Section 2.

**Theorem 3.2.** *Suppose an i.i.d. realization of variables (2.1), (2.2), (2.6), (2.10), or (2.14) has been generated by CDF $F_V$, which, depending upon the data scenario, equals CDF $F$, (2.3), (2.7), (2.11), or (2.15), respectively. Let $\hat{\theta}_W$ denote a W-estimator of $\theta = (\theta_1, \ldots, \theta_k)$. Then*

$$\hat{\theta}_W = \big(\hat{\theta}_1, \ldots, \hat{\theta}_k\big) \ \text{ is } \ \mathcal{AN}\left(\big(\theta_1, \ldots, \theta_k\big),\ \frac{1}{n}\, \mathbf{D}_w \boldsymbol{\Sigma}_w \mathbf{D}_w'\right),$$

*where $\mathbf{D}_w$ is the Jacobian of the transformations evaluated at $\big(W_1(\theta), \ldots, W_k(\theta)\big)$ and $\boldsymbol{\Sigma}_w$ is the covariance-variance matrix with the entries*

$$\sigma_{ij}^2 = \hat{A}^{(1)}_{i,j} + \hat{A}^{(2)}_{i,j} + \hat{A}^{(3)}_{i,j} + \hat{A}^{(4)}_{i,j},$$

*where the terms $\hat{A}^{(1)}_{i,j}, \ldots, \hat{A}^{(4)}_{i,j}$ are specified in Zhao et al. (2018a, Lemma A.1).*

**Proof.** The proof can be established by following the same arguments as in Theorem 3.1.

**Note 3.6.** Similar to the discussion of Note 3.5, the asymptotic normality statement of this theorem implies that *W*-estimators are consistent.

# 4. Analytic examples: Pareto I

In this section, we first derive the MLE and *T*- and *W*-estimators for the tail parameter of a single-parameter Pareto distribution, abbreviated as Pareto I, when the observed data are in the form of insurance payments, either $Y$, defined by (2.10), or $Z$, defined by (2.14). The corresponding MLE and *T*-estimators for the lognormal distribution have recently been investigated by Poudyal (2021a). Note that Pareto I is the distribution of the ground-up variable $X$. The CDF, PDF, and QF of Pareto I are defined as follows:

$$\text{CDF:} \quad F(x) = 1 - (x_0/x)^\alpha, \quad x > x_0, \tag{4.1}$$

$$\text{PDF:} \quad f(x) = (\alpha/x_0)(x_0/x)^{\alpha + 1}, \quad x > x_0, \tag{4.2}$$

$$\text{QF:} \quad F^{-1}(v) = x_0 (1 - v)^{-1/\alpha}, \quad 0 \le v \le 1, \tag{4.3}$$

where $\alpha > 0$ is the shape (tail) parameter and $x_0 > 0$ is a known constant. Then, the definitions of the estimators are complemented with their asymptotic distributions. Using the asymptotic normality results, we evaluate the asymptotic relative efficiency (ARE) of the *T*- and *W*-estimators with respect to the MLE:
$$\mathrm{ARE}(Q,\, \mathrm{MLE}) = \frac{\text{asymptotic variance of MLE estimator}}{\text{asymptotic variance of } Q \text{ estimator}},$$
where $Q$ represents the *T*- or *W*-estimator. Since for Pareto I the asymptotic variance of MLE reaches the Cramér-Rao lower bound, the other estimators’ efficiency will be between 0 and 1. Estimators with AREs close to 1 are preferred.

Also, for the complete data scenario, formulas of the *T*- and *W*-estimators and their AREs are available in Brazauskas, Jones, and Zitikis (2009). Derivations for the other data scenarios of Section 2 (truncated and censored data) are analogous to the ones presented in this section and thus will be skipped.

## 4.1. MLE

### 4.1.1. Payments *Y*

If $y_1, \ldots, y_n$ is a realization of variables (2.10) with PDF (2.12) and CDF (2.11), where $F$ and $f$ are given by (4.1) and (4.2), respectively, then the log-likelihood function can be specified by following standard results presented in Klugman et al. (2012):

$$\begin{aligned} \mathcal{L}_{P_Y}(\alpha \mid y_1, \ldots, y_n) ={}& \sum_{i=1}^{n} \log\big[f(y_i/c + d)/c\big]\, \mathbf{1}\{0 < y_i < c(u - d)\} - n \log\big[1 - F(d)\big] + \log\big[1 - F(u^-)\big] \sum_{i=1}^{n} \mathbf{1}\{y_i = c(u - d)\} \\ ={}& \sum_{i=1}^{n} \left[\log\!\left(\frac{\alpha}{c\,x_0}\right) - (\alpha + 1)\log\!\left(\frac{y_i/c + d}{x_0}\right)\right] \mathbf{1}\{0 < y_i < c(u - d)\} - \alpha n \log(x_0/d) + \alpha \log(x_0/u) \sum_{i=1}^{n} \mathbf{1}\{y_i = c(u - d)\}, \end{aligned}$$

where $\mathbf{1}\{\cdot\}$ denotes the indicator function. Straightforward maximization of $\mathcal{L}_{P_Y}$ yields an explicit formula of the MLE of $\alpha$:

$$\hat{\alpha}_{\mathrm{MLE}} = \left[\sum_{i=1}^{n} \mathbf{1}\{0 < y_i < c(u - d)\}\right] \div \left[\sum_{i=1}^{n} \log\!\left(\frac{y_i}{cd} + 1\right) \mathbf{1}\{0 < y_i < c(u - d)\} + \log\!\left(\frac{u}{d}\right) \sum_{i=1}^{n} \mathbf{1}\{y_i = c(u - d)\}\right]. \tag{4.4}$$
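Formula (4.4) translates directly into code; the payment values used in the test are hypothetical:

```python
import math

def pareto_mle_payment_y(y, c, d, u):
    """Explicit MLE (4.4) of the Pareto I tail parameter from payment-per-payment
    data. Payments equal to c(u - d) are treated as right-censored."""
    cap = c * (u - d)
    obs = [yi for yi in y if 0 < yi < cap]         # fully observed payments
    n_cens = sum(1 for yi in y if yi == cap)       # payments censored at the limit
    denom = sum(math.log(yi / (c * d) + 1) for yi in obs) + n_cens * math.log(u / d)
    return len(obs) / denom
```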

The asymptotic distribution of $\hat{\alpha}_{\mathrm{MLE}}$ follows from standard results for MLEs (see, e.g., Serfling 1980, Section 4.2). In this case, the Fisher information matrix has a single entry:

$$I_{11} = -\mathbf{E}\left[\frac{\partial^2 \log g_Y(Y \mid \alpha)}{\partial \alpha^2}\right] = -\mathbf{E}\left[-\frac{1}{\alpha^2}\, \mathbf{1}\{0 < Y < c(u - d)\}\right] = \frac{1}{\alpha^2}\big[1 - (d/u)^\alpha\big].$$

Hence, the estimator $\hat{\alpha}_{\mathrm{MLE}}$, defined by (4.4), has the following asymptotic distribution:

$$\hat{\alpha}_{\mathrm{MLE}} \ \text{ is } \ \mathcal{AN}\left(\alpha,\ \frac{1}{n} \cdot \frac{\alpha^2}{1 - (d/u)^\alpha}\right). \tag{4.5}$$

A few observations can be made from this result. First, the coinsurance factor $c$ has no effect on (4.5). Second, the corresponding result for the complete data scenario is obtained when there is no deductible (i.e., $d = x_0$) and no policy limit (i.e., $u = \infty$). Third, if $u = \infty$, then the asymptotic properties of $\hat{\alpha}_{\mathrm{MLE}}$ remain equivalent to those of the complete data case irrespective of the choice of $d$. Also, notice that (4.5) implies that $\hat{\alpha}_{\mathrm{MLE}}$ is a consistent and efficient estimator.

### 4.1.2. Payments *Z*

If $z_1, \ldots, z_n$ is a realization of variables (2.14) with PDF (2.16) and CDF (2.15), where $F$ and $f$ are given by (4.1) and (4.2), respectively, then the log-likelihood function can be specified by following standard results presented in Klugman et al. (2012):

$$\begin{aligned} \mathcal{L}_{P_Z}(\alpha \mid z_1, \ldots, z_n) ={}& \log\big[F(d)\big] \sum_{i=1}^{n} \mathbf{1}\{z_i = 0\} + \log\big[1 - F(u^-)\big] \sum_{i=1}^{n} \mathbf{1}\{z_i = c(u - d)\} + \sum_{i=1}^{n} \log\big[f(z_i/c + d)/c\big]\, \mathbf{1}\{0 < z_i < c(u - d)\} \\ ={}& \log\big[1 - (x_0/d)^\alpha\big] \sum_{i=1}^{n} \mathbf{1}\{z_i = 0\} + \alpha \log(x_0/u) \sum_{i=1}^{n} \mathbf{1}\{z_i = c(u - d)\} + \sum_{i=1}^{n} \left[\log\!\left(\frac{\alpha}{c\,x_0}\right) - (\alpha + 1)\log\!\left(\frac{z_i/c + d}{x_0}\right)\right] \mathbf{1}\{0 < z_i < c(u - d)\}. \end{aligned} \tag{4.6}$$

It is clear from the expression of $\mathcal{L}_{P_Z}$ that it has to be maximized numerically. Suppose that a unique solution for the maximization of (4.6) with respect to $\alpha$ is found, and let us denote it $\hat{\hat{\alpha}}_{\mathrm{MLE}}$. Further, the asymptotic distribution of $\hat{\hat{\alpha}}_{\mathrm{MLE}}$ follows from standard results for MLEs (see, e.g., Serfling 1980, Section 4.2). In this case, the single entry of the Fisher information matrix is

$$I_{11} = -\mathbf{E}\left[\frac{\partial^2 \log g_Z(Z \mid \alpha)}{\partial \alpha^2}\right] = -\mathbf{E}\left[-\frac{(x_0/d)^\alpha \log^2(x_0/d)}{\big(1 - (x_0/d)^\alpha\big)^2}\, \mathbf{1}\{Z = 0\} - \frac{1}{\alpha^2}\, \mathbf{1}\{0 < Z < c(u - d)\}\right] = \alpha^{-2}\left[\frac{(x_0/d)^\alpha}{1 - (x_0/d)^\alpha}\, \log^2\!\big[(x_0/d)^\alpha\big] + (x_0/d)^\alpha - (x_0/u)^\alpha\right].$$

Hence, the estimator $\hat{\hat{\alpha}}_{\mathrm{MLE}}$, found by numerically maximizing (4.6), has the following asymptotic distribution:

$$\hat{\hat{\alpha}}_{\mathrm{MLE}} \ \text{ is } \ \mathcal{AN}\left(\alpha,\ \frac{\alpha^2}{n}\left[\frac{(x_0/d)^\alpha}{1 - (x_0/d)^\alpha}\, \log^2\!\big[(x_0/d)^\alpha\big] + (x_0/d)^\alpha - (x_0/u)^\alpha\right]^{-1}\right). \tag{4.7}$$
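Since (4.6) has no closed-form maximizer, one way to compute $\hat{\hat{\alpha}}_{\mathrm{MLE}}$ is to bisect the score function (the derivative of the log-likelihood in $\alpha$), which is decreasing here. The sketch below assumes $d > x_0$ (so that $F(d) > 0$) and a bracket `[lo, hi]` wide enough to contain the root; both are illustration choices.

```python
import math

def pareto_mle_payment_z(z, c, d, u, x0, lo=1e-3, hi=50.0, tol=1e-10):
    """MLE of alpha obtained by maximizing (4.6) numerically: the score
    function is decreasing in alpha, so its root is found by bisection."""
    cap = c * (u - d)
    n0 = sum(1 for zi in z if zi == 0)        # losses below the deductible
    n2 = sum(1 for zi in z if zi == cap)      # payments censored at the policy limit
    obs = [zi for zi in z if 0 < zi < cap]

    def score(alpha):
        r = (x0 / d) ** alpha
        return (-n0 * r * math.log(x0 / d) / (1 - r)
                + n2 * math.log(x0 / u)
                + sum(1 / alpha - math.log((zi / c + d) / x0) for zi in obs))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)
```

When no payments fall at 0 or at the policy limit, the routine reduces to the usual truncated-Pareto MLE $n / \sum_i \log\big((z_i/c + d)/x_0\big)$.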

Here, we again emphasize several points. First, as in Section 4.1.1, the coinsurance factor $c$ has no effect on (4.7). Second, the corresponding result for the complete data scenario is obtained when there is no deductible (to eliminate $d$ from (4.7), take the limit as $d \to x_0$) and no policy limit (i.e., $u = \infty$). Third, (4.7) implies that $\hat{\hat{\alpha}}_{\mathrm{MLE}}$ is a consistent and efficient estimator.

## 4.2. *T*-estimators

### 4.2.1. Payments *Y*

Let $y_{1:n} \le \cdots \le y_{n:n}$ denote an ordered realization of variables (2.10) with QF (2.13), where $F$ and $F^{-1}$ are given by (4.1) and (4.3), respectively. Since Pareto I has only one unknown parameter, we need only one moment equation to estimate it. Also, since payments $Y$ are left-truncated and right-censored, it follows from Note 3.2 that only the last three permutations between the trimming proportions $a$ and $b$ are possible (i.e., there is no censored data at the lower end of the support of $Y$). That is, after converting $d$ and $u$ into the notation involving $F$, we get from (3.2) the following arrangements:

Case 1: $\dfrac{F(u) - F(d)}{1 - F(d)} \le a < 1 - b$ (estimation based on censored data only).

Case 2: $a < \dfrac{F(u) - F(d)}{1 - F(d)} < 1 - b$ (estimation based on observed and censored data).

Case 3: $a < 1 - b \le \dfrac{F(u) - F(d)}{1 - F(d)}$ (estimation based on observed data only).

In all these cases, the sample moments (3.1) can be easily computed by first estimating the probability $\big(F(u) - F(d)\big)/\big(1 - F(d)\big)$ empirically, then selecting $a$ and $b$ accordingly, and finally choosing $h_Y(y) = \log\big(y/(cd) + 1\big)$. Note that $c$, $d$, and $u$ are known constants, and the logarithmic transformation will linearize the QF in terms of $1/\alpha$ (at least for the observed data part). With these choices in mind, let us examine what happens to the population moments (3.2) under the cases 1–3. The following steps can be easily verified:

$$\begin{aligned} (1 - a - b)\, T_1^{(y)}(\alpha) &= \int_a^{1-b} h_Y\big(G_Y^{-1}(v \mid \alpha)\big)\, dv = \int_a^{1-b} \log\left(\frac{G_Y^{-1}(v \mid \alpha)}{cd} + 1\right) dv \\ &= \int_a^{1-b} \left[\log\left(\frac{1}{d}\, F^{-1}\big(v + (1 - v)F(d)\big)\right) \mathbf{1}\left\{0 \le v < \frac{F(u) - F(d)}{1 - F(d)}\right\} + \log(u/d)\, \mathbf{1}\left\{\frac{F(u) - F(d)}{1 - F(d)} \le v \le 1\right\}\right] dv \\ &= \begin{cases} (1 - a - b)\log(u/d), & \text{Case 1}; \\ \alpha^{-1}\left[(1 - a)\big(1 - \log(1 - a)\big) + b \log(d/u)^\alpha - (d/u)^\alpha\right], & \text{Case 2}; \\ \alpha^{-1}\left[(1 - a)\big(1 - \log(1 - a)\big) - b\big(1 - \log b\big)\right], & \text{Case 3}. \end{cases} \end{aligned}$$

It is clear from these expressions that estimation of $\alpha$ is impossible in Case 1 because there is no $\alpha$ in the formula of $T_1^{(y)}$. In Case 2, $\alpha$ has to be estimated numerically by solving the following equation:

$$\alpha^{-1}\left[(1 - a)\big(1 - \log(1 - a)\big) + b \log(d/u)^\alpha - (d/u)^\alpha\right] = (1 - a - b)\, \hat{T}_1^{(y)}, \tag{4.8}$$

where $\hat{T}_1^{(y)} = \frac{1}{n - m_n - m_n^*} \sum_{i = m_n + 1}^{n - m_n^*} \log\big(y_{i:n}/(cd) + 1\big)$. Suppose a unique solution of (4.8) with respect to $\alpha$ is found. Let us denote it $\hat{\alpha}_T^{(2)}$ and remember that it is a function of $\hat{T}_1^{(y)}$, say $\hat{\alpha}_T^{(2)} = s_1^{(2)}\big(\hat{T}_1^{(y)}\big)$. Finally, if Case 3 is chosen, we then have an explicit formula for a *T*-estimator of $\alpha$:

$$\hat{\alpha}_T^{(3)} = \frac{I_t(a, 1 - b)}{(1 - a - b)\, \hat{T}_1^{(y)}} =: s_1^{(3)}\big(\hat{T}_1^{(y)}\big), \tag{4.9}$$

where $I_t(a, 1 - b) := -\int_a^{1-b} \log(1 - v)\, dv = (1 - a)\big(1 - \log(1 - a)\big) - b\big(1 - \log b\big)$ and the sample moment $\hat{T}_1^{(y)}$ is computed as before; see (4.8).
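A sketch of (4.9), combining the sample trimmed moment from (4.8) with $I_t(a, 1 - b)$. It presumes the Case 3 condition $1 - b \le \big(F(u) - F(d)\big)/\big(1 - F(d)\big)$, i.e., that all censored payments fall in the trimmed upper tail; the sample in the test is a hypothetical construction.

```python
import math

def I_t(a, b):
    """I_t(a, 1-b) = (1 - a)(1 - log(1 - a)) - b(1 - log b), for 0 < b < 1 - a."""
    return (1 - a) * (1 - math.log(1 - a)) - b * (1 - math.log(b))

def t_estimator_case3(y, c, d, a, b):
    """Explicit T-estimator (4.9) of the Pareto I tail parameter (Case 3):
    trim, average h_Y(y) = log(y/(cd) + 1), and invert the population moment."""
    n = len(y)
    m, m_star = math.floor(n * a), math.floor(n * b)
    body = sorted(y)[m : n - m_star]
    t1_hat = sum(math.log(yi / (c * d) + 1) for yi in body) / (n - m - m_star)
    return I_t(a, b) / ((1 - a - b) * t1_hat)
```

The policy limit $u$ does not appear in the function: under Case 3 the censored payments are trimmed away, so only $c$ and $d$ enter the computation.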

Next, we specify the asymptotic distributions and compute AREs of $\hat{\alpha}_T^{(2)}$ and $\hat{\alpha}_T^{(3)}$. The asymptotic distributions of $\hat{\alpha}_T^{(2)}$ and $\hat{\alpha}_T^{(3)}$ follow from Theorem 3.1. In both cases, the Jacobian $\mathbf{D}_t$ and the covariance-variance matrix $\boldsymbol{\Sigma}_t$ are scalar. Denoting $d_{11}^{(2)}$ and $d_{11}^{(3)}$ the Jacobian entries for Cases 2 and 3, respectively, we get the following expressions:

$$d_{11}^{(2)} = \left.\frac{\partial \hat{\alpha}_T^{(2)}}{\partial \hat{T}_1^{(y)}}\right|_{\hat{T}_1^{(y)} = T_1^{(y)}} = \left.\frac{\partial s_1^{(2)}\big(\hat{T}_1^{(y)}\big)}{\partial \hat{T}_1^{(y)}}\right|_{\hat{T}_1^{(y)} = T_1^{(y)}} = \frac{(1 - a - b)\, \alpha^2}{(d/u)^\alpha\big(1 - \log(d/u)^\alpha\big) - (1 - a)\big(1 - \log(1 - a)\big)} = -\frac{(1 - a - b)\, \alpha^2}{I_t\big(a,\, 1 - (d/u)^\alpha\big)},$$

$$d_{11}^{(3)} = \left.\frac{\partial \hat{\alpha}_T^{(3)}}{\partial \hat{T}_1^{(y)}}\right|_{\hat{T}_1^{(y)} = T_1^{(y)}} = \left.\frac{\partial s_1^{(3)}\big(\hat{T}_1^{(y)}\big)}{\partial \hat{T}_1^{(y)}}\right|_{\hat{T}_1^{(y)} = T_1^{(y)}} = -\frac{(1 - a - b)\, \alpha^2}{I_t(a,\, 1 - b)}.$$

Note that $d_{11}^{(2)}$ is found by implicitly differentiating (4.8). Further, denoting $\sigma_{11}^{2\,(2)}$ and $\sigma_{11}^{2\,(3)}$ the $\boldsymbol{\Sigma}_t$ entries for Cases 2 and 3, respectively, we get the following expressions:

$$(1 - a - b)^2\, \sigma_{11}^{2\,(2)} = \int_a^{1-b}\!\int_a^{1-b} \big(\min\{v, w\} - vw\big)\; d h_Y\big(G_Y^{-1}(v)\big)\, d h_Y\big(G_Y^{-1}(w)\big) = \alpha^{-2} \int_a^{1 - (d/u)^\alpha}\!\int_a^{1 - (d/u)^\alpha} \big(\min\{v, w\} - vw\big)\; d \log(1 - v)\, d \log(1 - w) =: \alpha^{-2} J_t\big(a, 1 - (d/u)^\alpha;\, a, 1 - (d/u)^\alpha\big)$$

and

$$(1 - a - b)^2\, \sigma_{11}^{2\,(3)} = \int_a^{1-b}\!\int_a^{1-b} \big(\min\{v, w\} - vw\big)\; d h_Y\big(G_Y^{-1}(v)\big)\, d h_Y\big(G_Y^{-1}(w)\big) = \alpha^{-2} \int_a^{1-b}\!\int_a^{1-b} \big(\min\{v, w\} - vw\big)\; d \log(1 - v)\, d \log(1 - w) = \alpha^{-2} J_t(a, 1 - b;\, a, 1 - b).$$

Now, as follows from Theorem 3.1, the asymptotic variances of these two estimators of $\alpha$ are equal to $n^{-1}\big(d_{11}^{(i)}\big)^2\, \sigma_{11}^{2\,(i)}$ for $i = 2, 3$. This implies that the estimators $\hat{\alpha}_T^{(2)}$, found by numerically solving (4.8), and $\hat{\alpha}_T^{(3)}$, given by (4.9), have the following asymptotic distributions:

$$\hat{\alpha}_T^{(2)} \ \text{ is } \ \mathcal{AN}\left(\alpha,\ \frac{\alpha^2}{n} \cdot \frac{J_t\big(a, 1 - (d/u)^\alpha;\, a, 1 - (d/u)^\alpha\big)}{I_t^2\big(a, 1 - (d/u)^\alpha\big)}\right) \tag{4.10}$$

and

$$\hat{\alpha}_T^{(3)} \ \text{ is } \ \mathcal{AN}\left(\alpha,\ \frac{\alpha^2}{n} \cdot \frac{J_t(a, 1 - b;\, a, 1 - b)}{I_t^2(a, 1 - b)}\right). \tag{4.11}$$

From (4.10) we see that the asymptotic variance of $\hat{\alpha}_T^{(2)}$ does not depend on the upper trimming proportion $b$, where $b < (d/u)^\alpha$. As expected, both estimators and their asymptotic distributions coincide when $b = (d/u)^\alpha$. Thus, for all practical purposes $\hat{\alpha}_T^{(3)}$ is a better estimator (i.e., it has an explicit formula, and it becomes equivalent to $\hat{\alpha}_T^{(2)}$ if one chooses $b = (d/u)^\alpha$); therefore $\hat{\alpha}_T^{(2)}$ (more generally, Case 2) will be discarded from further consideration.

As discussed in Note 3.3, the *T*-estimators are globally robust if $a > 0$ and $b > 0$. This is achieved by sacrificing the estimator’s efficiency (i.e., the more robust the estimator, the larger its variance). From (4.5) and (4.11), we find that the asymptotic relative efficiency of $\hat{\alpha}_T^{(3)}$ with respect to $\hat{\alpha}_{\mathrm{MLE}}$ is

$$\mathrm{ARE}\big(\hat{\alpha}_T^{(3)},\, \hat{\alpha}_{\mathrm{MLE}}\big) = \frac{\dfrac{\alpha^2}{n} \cdot \dfrac{1}{1 - (d/u)^\alpha}}{\dfrac{\alpha^2}{n} \cdot \dfrac{J_t(a, 1 - b;\, a, 1 - b)}{I_t^2(a, 1 - b)}} = \frac{I_t^2(a, 1 - b)}{\big[1 - (d/u)^\alpha\big]\, J_t(a, 1 - b;\, a, 1 - b)}.$$
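This ARE can be evaluated by approximating the bivariate integral $J_t$ with a simple midpoint rule, using $d\log(1 - v) = -dv/(1 - v)$ (the two sign changes cancel). This is a rough numerical sketch, not the approximation scheme of Brazauskas and Kleefeld (2009); the grid size is an illustration choice.

```python
import math

def J_t(a, ub, n_grid=400):
    """Midpoint-rule approximation of
    J_t(a, ub; a, ub) = int_a^ub int_a^ub (min{v,w} - vw) dlog(1-v) dlog(1-w)."""
    step = (ub - a) / n_grid
    grid = [a + (i + 0.5) * step for i in range(n_grid)]
    total = sum((min(v, w) - v * w) / ((1 - v) * (1 - w))
                for v in grid for w in grid)
    return total * step * step

def are_t3(alpha, d, u, a, b):
    """ARE of the T-estimator (4.9) relative to the MLE, per the display above."""
    it = (1 - a) * (1 - math.log(1 - a)) - b * (1 - math.log(b))
    return it ** 2 / ((1 - (d / u) ** alpha) * J_t(a, 1 - b))
```

A useful sanity check: with no trimming and no censoring, $J_t(0, 1; 0, 1) = 1$ and $I_t(0, 1) = 1$, so the ARE reduces to 1, consistent with the estimator then being efficient.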

In this case the integrals $I_t$ and $J_t$ can be derived analytically, but in general it is easier and faster to approximate them numerically; see Appendix A.2 in Brazauskas and Kleefeld (2009) for specific approximation formulas of the bivariate integrals $J_t$. In Table 4.1, we present ARE computations.

It is obvious from the table that, for a fixed $b$, the effect of the lower trimming proportion $a$ on the ARE is negligible. As $b$ increases, *T*-estimators become more robust but less efficient, yet their AREs are still sufficiently high (all at least 0.67; more than half above 0.85). Also, all estimators’ efficiency improves as the proportion of right-censored data increases. Take, for example, one of the tabulated estimators: its efficiency grows from 0.857 to 0.943 as the proportion of right-censored data increases.

### 4.2.2. Payments *Z*

Let $z_{1:n} \le \cdots \le z_{n:n}$ denote an ordered realization of variables (2.14) with QF (2.17), where $F$ and $F^{-1}$ are given by (4.1) and (4.3), respectively. Payments $Z$ are left- and right-censored, and it follows from Note 3.2 that six permutations are possible between the trimming proportions $a$ and $b$ and the censoring points. However, analysis similar to that done in Section 4.2.1 shows that two of those scenarios (estimation based on censored data only) have no $\alpha$ in the formulas of population moments, and three (estimation based on observed and censored data) are inferior to the estimation scenario based on fully observed data. (Due to space limitations we do not present those investigations here.) Thus, from now on we will focus on the following arrangement:
$$0 \le F(d) \le a < 1 - b \le F(u) \le 1.$$
Similar to the previous section, standard empirical estimates of $F(d)$ and $F(u)$ provide guidance about the choice of $a$ and $b$. However, the function $h$ is defined differently: $h_Z(z) = \log(z/c + d)$. For Pareto I only the first moment is needed, and it is equal to
$$(1 - a - b)\, T_1^{(z)}(\alpha) = \int_a^{1-b} h_Z\big(G_Z^{-1}(v \mid \alpha)\big)\, dv = \int_a^{1-b} \log\big(F^{-1}(v)\big)\, dv = (1 - a - b)\log(x_0) + \alpha^{-1} I_t(a, 1 - b).$$

Matching the $T_1^{(z)}(\alpha)$ expression with