# 1. Introduction

Survival analysis (also known as reliability theory, duration analysis, event history analysis or duration modeling) is a familiar topic for actuaries. One of the main notions of survival analysis is the hazard rate function $h_X(x)$ of a continuous random variable $X$, defined as

$$h_X(x) := \frac{f_X(x)}{S_X(x)}. \tag{1.1}$$

Here $f_X(x)$ is the probability density of $X$ and $S_X(x) := P(X > x)$ is the survival function (which is equal to $1 - F_X(x)$, where $F_X(x) := P(X \le x)$ is the cumulative distribution function). The hazard rate, which is also referred to as the force of mortality, the intensity rate, the failure rate or the mortality of claims, quantifies the trajectory of imminent risk and, similarly to the probability density or the survival function, is a characteristic of a random variable. A discussion of the hazard rate can be found in the actuarial texts Bowers et al. (1997), Dickson, Hardy, and Waters (2009), Cunningham, Herzog, and London (2012) and Klugman, Panjer, and Willmot (2012). Further, actuarial handbooks and software packages are available that contain information about the more frequently used parametric hazard rates; see Richards (2011), Nadarajah and Bakar (2013), Charpentier (2015) and the R package “ActuDistns”, and see also the monograph Rinne (2014), which is devoted solely to hazard rates.
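As a quick numeric illustration of definition (1.1), the following sketch computes the hazard rate of an exponential loss, whose force of mortality is constant; the rate parameter `lam` is an assumed toy value, not taken from the paper.

```python
import math

# Hazard rate h_X(x) = f_X(x) / S_X(x), illustrated for an exponential
# loss with rate lam (a toy parameter assumed for illustration):
# f_X(x) = lam * exp(-lam * x) and S_X(x) = exp(-lam * x),
# so the hazard rate is constant and equal to lam.
lam = 0.5

def pdf(x):
    return lam * math.exp(-lam * x)

def survival(x):
    return math.exp(-lam * x)

def hazard(x):
    return pdf(x) / survival(x)
```

Evaluating `hazard` at any point returns the constant `lam`, the familiar memoryless property of the exponential distribution expressed in terms of its force of mortality.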

Interest in hazard rates is a distinctive feature of survival analysis that differentiates it from classical probability theory, which traditionally characterizes a continuous random variable via its probability density. Another distinctive feature, which differentiates survival analysis from classical statistics, is that survival observations of $X$ are typically modified by truncation and/or censoring, with the main cases being left truncation (LT) and right censoring (RC); see a discussion in Klugman, Panjer, and Willmot (2012), Frees, Derrig, and Meyers (2014), Roninson (2014) and Albrecher, Beirlant, and Teugels (2017).

For the following discussion it is convenient to recall two classical examples of LTRC data. The first one is the “Home-Insurance” example, in which a home insurance policy has an ordinary deductible $T^*$ and a policy limit on payment $C^*$, the available information is the payment on an insurable loss, and the random variable of interest $X^*$ is the insurable loss. In a classical statistical setting one would observe a direct sample $X^*_1, \ldots, X^*_n$ from $X^*$ and then use it to estimate either the survival function $S_{X^*}(x)$ or the probability density $f_{X^*}(x)$, see the book Efromovich (1999). In the “Home-Insurance” example, we get information only about losses that exceed the deductible (this creates the left truncation), and even for those we only know the minimum of the loss and the limit (this creates the right censoring). This example of LTRC data is so simple and well understood that it is used in the SAS software manual (recall that the iconic SAS software is primarily created for biological and medical applications). Another distinctive feature of the “Home-Insurance” example is that the deductible $T^*$ is always smaller than the limit $C^*$, and this may not be the case in other applications. So let us recall another classical example, the “Surgery” example (it will be complemented shortly by casualty examples), in which patients who had a cancer surgery in the past are checked during a study that begins at some specific time (the so-called baseline) and has a fixed duration, with the aim of evaluating the distribution of the time to cancer relapse. In this case (compare with the “Home-Insurance” example) $X^*$ is the time from the surgery to cancer relapse, the truncation $T^*$ is the time from the surgery to the baseline (beginning of the study), and the censoring time $C^*$ is the smaller of the time from the surgery to the end of the study and the time until a patient is no longer able or willing to participate in the study.
Note that in the “Surgery” example censoring may occur before truncation; for instance, moving from the area of the study or death from a reason other than cancer may occur before the baseline. Another important difference between the two examples is that data in the “Home-Insurance” example are collected via passive observations, while in the “Surgery” example observations are collected via a controlled experiment with a medical examination of participants at the baseline. In particular, the latter implies that a participant with $X^* \ge T^*$ is included in the study (not truncated by $T^*$). As a result, in the survival analysis literature it is traditionally assumed that $X^*$ is truncated by $T^*$ only if $X^* < T^*$, and this approach is used in the paper. Now recall that in the “Home-Insurance” example truncation occurs if $X^* \le T^*$, and this is the definition of LT used, for instance, in Klugman, Panjer, and Willmot (2012). The difference in the definitions of LT may be critical for small samples of discrete variables, but in the paper we are dealing with continuous lifetimes when $P(T^* = X^*) = 0$. More discussion of LTRC and different statistical models may be found in Klein and Moeschberger (2003) and Gill (2006).

In addition to a number of classical casualty insurance survival examples, like fires, magnitudes of earthquakes or losses due to uninsured motorists, discussed in the above-mentioned classical actuarial books, let us mention several others that have gained interest in the literature more recently. Insurance attrition is discussed in Fu and Wang (2014). Albrecher, Beirlant, and Teugels (2017) and Reynkens et al. (2017) explore a number of survival analysis examples arising in non-life reinsurance, in particular examples with lifetimes of insurance claims. Survival analysis of the credit risk of a portfolio of consumer loans is another hot topic, as both banks and insurers are required to develop models for the probability of default on loans; see a discussion in Andreeva (2006), Malik and Thomas (2010), Stepanova and Thomas (2002) and Bonino and Caivano (2012). A comprehensive discussion of the longevity of customer relations with an insurance company may be found in Martin (2005). Survival analysis of foster care reentry is another interesting example, see Goering and Shaw (2017). Egger, Radulescu, and Rees (2015) and Yuan, Sun, and Cao (2016) discuss the problem of directors and officers liability insurance. Survival analysis of the lifetime of motor insurance companies in South Africa is presented in Abbot (2015), while Lawless, Hu, and Cao (1995) analyze auto-warranty data. There is also a vast literature devoted to the mortality of enterprises and litigation risks related to IPOs (initial public offerings); see a discussion in Daepp et al. (2015), Håkanson and Kappen (2016) and Xia et al. (2016). Note that IPO examples are similar to the above-discussed “Surgery” example.
Indeed, in an IPO example the time $X^*$ from the onset of the IPO to its bankruptcy is the lifetime of interest, the truncation $T^*$ is the time from the IPO’s onset to the baseline of the study, while the censoring $C^*$ is the smaller of the time from the onset to another cause of the IPO’s death (merger, acquisition, privatization, etc.) and the time to the end of the study. Note that $C^*$ may be smaller than $T^*$, and this is why the example resembles the “Surgery” one. Finally, let us mention the problem of manufacturer warranties, see an actuarial discussion in Hayne (2007) and Walker and Cederburg (2013). A particular example with centrifuges will be discussed in Section 5.

Now let us explain the main motivation of the paper. For the case of direct observations $X^*_1, \ldots, X^*_n$, the empirical survival function (esf)

$$\tilde S_{X^*}(x) := n^{-1}\sum_{l=1}^{n} I(X^*_l > x) \tag{1.2}$$

is the main tool for estimation of the survival function. Here and in what follows $I(\cdot)$ denotes the indicator function. Note that (1.2) is a sample mean estimator because $S_{X^*}(x) := E\{I(X^* > x)\}$; hence the esf is a nonparametric (no underlying model is assumed) estimator, and it is unbiased because $E\{\tilde S_{X^*}(x)\} = S_{X^*}(x)$. Further, because the esf is the average of independent and identically distributed indicators, its variance is $V(\tilde S_{X^*}(x)) = n^{-1}S_{X^*}(x)(1 - S_{X^*}(x))$; to realize this, note that in (1.2) we are dealing with the sum of independent Bernoulli variables. Further, inspired by the sample mean esf, it is possible to propose a density estimator $\hat f_{X^*}(x)$ motivated by sample mean estimation, see Efromovich (1999, 2010, 2018).
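The sample mean properties of the esf are easy to verify numerically. The sketch below, assuming a toy model with $X^* \sim \text{Exp}(1)$ (not part of the paper), compares the Monte Carlo mean and variance of (1.2) with $S_{X^*}(x)$ and $n^{-1}S_{X^*}(x)(1-S_{X^*}(x))$.

```python
import math
import random

# Empirical survival function (1.2) as a sample mean of indicators,
# checked by simulation under an assumed toy model X* ~ Exp(1).
random.seed(1)
n, x0, reps = 200, 1.0, 2000

def esf(sample, x):
    return sum(v > x for v in sample) / len(sample)

estimates = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    estimates.append(esf(sample, x0))

mean_hat = sum(estimates) / reps
var_hat = sum((e - mean_hat) ** 2 for e in estimates) / reps
S = math.exp(-x0)  # true survival function at x0
```

The simulated mean is close to $S(x_0) = e^{-1}$ and the simulated variance is close to $n^{-1}S(x_0)(1-S(x_0))$, as the Bernoulli argument predicts.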

The situation changes rather dramatically for the case of survival data. Kaplan and Meier (1958), for the case of a right censored sample $(V_1,\Delta_1), \ldots, (V_n,\Delta_n)$ from the pair $(V,\Delta) := (\min(X^*,C^*),\, I(X^* \le C^*))$, proposed the following product-limit (Kaplan–Meier) estimator,

$$\check S_{X^*}(x) := \begin{cases} 1, & x < V_{(1)};\\[2pt] 0, & x > V_{(n)};\\[2pt] \prod_{i=1}^{l-1}\left[(n-i)/(n-i+1)\right]^{\Delta_{(i)}}, & V_{(l-1)} < x \le V_{(l)}. \end{cases} \tag{1.3}$$

Here $(V_{(l)},\Delta_{(l)})$, $l=1,2,\ldots,n$, are the pairs ordered according to $V_l$, that is $V_{(1)} \le V_{(2)} \le \ldots \le V_{(n)}$. A modification of (1.3) for the case of LTRC data may be found in the above-mentioned texts, see for instance Klugman, Panjer, and Willmot (2012). While the texts present a number of really good explanations of the product-limit methodology, product-limit estimators are difficult for statistical inference. Indeed, in (1.3) we are dealing with the product of dependent and not identically distributed random factors. One can take a (negative) logarithm to convert the product into a sum (and the sum becomes close to the Nelson–Aalen estimator of the cumulative hazard); still, actuaries who took advanced graduate classes may recall that, while there exists the Greenwood estimator of the variance, deducing a closed form for the variance is complicated, and even proving consistency requires the theory of counting processes, martingale arguments or other advanced statistical tools; see a discussion in Roth (1985), Fleming and Harrington (1991) and Gill (2006).
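To make the product-limit formula concrete, here is a short sketch that implements (1.3) directly for simulated right-censored data; the exponential lifetime and uniform censoring distributions are assumed toy choices, not part of the paper.

```python
import random

# Product-limit (Kaplan--Meier) estimator (1.3) for a right censored
# sample, with assumed toy distributions X* ~ Exp(1), C* ~ U(0, 5).
random.seed(2)
n = 500
data = []
for _ in range(n):
    x = random.expovariate(1.0)       # lifetime of interest X*
    c = random.uniform(0.0, 5.0)      # censoring variable C*
    data.append((min(x, c), x <= c))  # observed pair (V, Delta)
data.sort()                           # pairs ordered according to V

def km_survival(x):
    """Product-limit estimate (1.3) evaluated at x."""
    s = 1.0
    for i, (v, delta) in enumerate(data, start=1):
        if v > x:
            break
        if delta:                     # only uncensored V's contribute a factor
            s *= (n - i) / (n - i + 1)
    return s
```

With this toy model the estimate at, say, $x = 1$ is close to the true survival function $e^{-1} \approx 0.368$, despite roughly half of the lifetimes being censored.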

The main aim of the paper is to explain that for left truncated and/or right censored data it is natural to begin statistical analysis with nonparametric estimation of the hazard rate, which can be done using a sample mean approach. The attractive feature of this approach is that it plainly explains what can and cannot be estimated for LTRC data. In particular, it will be shown how LT and RC affect estimation of the left and right tails of the distribution. The paper also explains how to use graphics for statistical analysis of LTRC data.

The rest of the paper is organized as follows. Section 2 explains the LTRC model, introduces the main notation, and develops probability formulas. It also sheds light on why estimation of the hazard rate is natural for LTRC data. Section 3 is devoted to estimation of the hazard rate. Section 4 considers estimation of the probability density, and it explains why in general only characteristics of a conditional distribution may be estimated. Examples and a numerical study, illustrating the proposed analysis of LTRC data, are presented in Section 5. Then, after the Conclusion, the reader may find a list of the main notation used in the paper.

# 2. LTRC Model and Probability Formulas

We begin with the probability model for the mechanism of generating a sample of size $n$ of left truncated and right censored (LTRC) observations. The above-presented “Home-Insurance” and “Surgery” examples may be useful in understanding the mechanism, and in what follows we use the notation of those examples.

The LTRC mechanism of data modification is defined as follows. There is a hidden sequential sampling from a triplet of nonnegative random variables $(T^*, X^*, C^*)$ whose joint distribution is unknown. $T^*$ is the truncation random variable, $X^*$ is the random variable of interest, and $C^*$ is the censoring random variable. Right censoring prevents us from observing $X^*$; instead we observe a pair $(V, \Delta)$, where $V := \min(X^*, C^*)$ and $\Delta := I(X^* \le C^*)$ is the indicator of censoring. Left truncation allows us to observe $(V, \Delta)$ only if $T^* \le V$. To be more specific, let us describe the LTRC model of generating a sample $(T_1, V_1, \Delta_1), \ldots, (T_n, V_n, \Delta_n)$. Suppose that $(T^*_k, X^*_k, C^*_k)$ is the $k$th realization of the hidden triplet and that at this moment there already exists a sample of size $l < n$ of LTRC observations. If $T^*_k > \min(X^*_k, C^*_k)$, then the $k$th realization is left truncated, meaning that: (i) the triplet $(T^*_k, X^*_k, C^*_k)$ is not observed; (ii) the fact that the $k$th realization occurred is unknown; (iii) the next realization of the hidden triplet occurs. On the other hand, if $T^*_k \le \min(X^*_k, C^*_k)$, then the LTRC observation $(T_{l+1}, V_{l+1}, \Delta_{l+1}) := (T^*_k, \min(X^*_k, C^*_k), I(X^*_k \le C^*_k))$ is added to the LTRC sample, whose size becomes equal to $l+1$. The hidden sampling from the triplet $(T^*, X^*, C^*)$ stops as soon as $l+1 = n$.
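The sequential mechanism just described can be sketched in a few lines of code; the particular uniform and exponential distributions of the hidden triplet below are assumed purely for illustration.

```python
import random

# Sequential LTRC sampling: hidden triplets (T*, X*, C*) are drawn until
# n observations (T, V, Delta) with T* <= min(X*, C*) are collected.
# The distributions of the triplet are toy choices for illustration only.
random.seed(3)
n = 1000

def draw_hidden_triplet():
    t = random.uniform(0.0, 1.5)   # truncation variable T*
    x = random.expovariate(1.0)    # variable of interest X*
    c = random.uniform(0.0, 4.0)   # censoring variable C*
    return t, x, c

sample, hidden_draws = [], 0
while len(sample) < n:
    t, x, c = draw_hidden_triplet()
    hidden_draws += 1
    if t > min(x, c):
        continue                   # realization is left truncated: nothing observed
    # otherwise the LTRC observation (T, V, Delta) joins the sample
    sample.append((t, min(x, c), x <= c))
```

Note that the number of hidden draws `hidden_draws` exceeds $n$, and every recorded triple satisfies $T \le V$ by construction.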

Because in what follows we consider only left truncation and right censoring, we may omit the qualifiers left and right for truncation and censoring, respectively.

Now let us make an interesting probabilistic remark about the sequential sampling. The random number $K$ of hidden simulations required to get a fixed number $n$ of LTRC observations has a negative binomial (also referred to as binomial waiting-time or Pascal) distribution, which is completely defined by the integer parameter $n$ and the probability $P(T^* \le \min(X^*, C^*))$ of success. On the other hand, if the total number $k$ of hidden realizations is known (for instance, in the “Surgery” example this is the total number of surgeries), then the random number of participants in the study has a binomial distribution, which is completely characterized by the above-mentioned probability of success and $k$ trials. In our setting we are dealing with the former case and fixed $n$, and the remark sheds additional light on the LTRC model.
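The negative binomial remark implies, in particular, that $E\{K\} = n/p$ with $p = P(T^* \le \min(X^*, C^*))$. A small check under an assumed toy model with $T^* \sim U(0,1)$ and independent $X^*, C^* \sim \text{Exp}(1)$, so that $\min(X^*, C^*) \sim \text{Exp}(2)$ and $p = \int_0^1 e^{-2t}\,dt = (1 - e^{-2})/2$:

```python
import math
import random

# K = number of hidden draws needed for n LTRC observations.
# Under the toy model (assumed for illustration) p has a closed form,
# and the Monte Carlo average of K should be close to n / p.
random.seed(4)
n, reps = 50, 400

counts = []
for _ in range(reps):
    k = accepted = 0
    while accepted < n:
        k += 1
        t = random.uniform(0.0, 1.0)     # T*
        x = random.expovariate(1.0)      # X*
        c = random.expovariate(1.0)      # C*
        if t <= min(x, c):
            accepted += 1
    counts.append(k)

mean_k = sum(counts) / reps
p = (1.0 - math.exp(-2.0)) / 2.0         # closed-form success probability
```

Here `mean_k` is close to $n/p \approx 115.7$, in agreement with the negative binomial mean.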

In what follows it is assumed that the continuous and nonnegative random variable of interest $X^*$ is independent of $(T^*, C^*)$, while $T^*$ and $C^*$ may be dependent and have a mixed (continuous and discrete) joint distribution.

Now we are ready to present useful probability formulas for the observed variables. Write,

$$\begin{aligned} P(V \le v, \Delta = 1) &= P(X^* \le v, X^* \le C^* \,|\, T^* \le \min(X^*, C^*))\\ &= \frac{P(X^* \le v, X^* \le C^*, T^* \le \min(X^*, C^*))}{P(T^* \le \min(X^*, C^*))}\\ &= p^{-1}P(X^* \le v, X^* \le C^*, T^* \le X^*)\\ &= p^{-1}\int_0^v f_{X^*}(x)\,P(T^* \le x \le C^*)\,dx. \end{aligned} \tag{2.1}$$

Here in the first equality the definition of truncation is used, the second equality is based on the definition of conditional probability, the third one uses the notation

$$p := P(T^* \le \min(X^*, C^*)) \tag{2.2}$$

for the probability of avoiding truncation, together with the fact that the event $X^* \le C^*$ implies $\min(X^*, C^*) = X^*$, and the last equality uses the independence of $X^*$ and $(T^*, C^*)$.
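Formula (2.1) is straightforward to verify by simulation. In the sketch below the distributions $X^* \sim \text{Exp}(1)$, $T^* \sim U(0,1)$ and $C^* \sim U(0,2)$ (with $T^*$ and $C^*$ independent here only for simplicity of the weight function) are assumed purely for illustration; the left-hand side is estimated from accepted (non-truncated) draws and the right-hand side is computed by a Riemann sum.

```python
import math
import random

# Monte Carlo check of formula (2.1):
#   P(V <= v, Delta = 1) = p^{-1} * int_0^v f_{X*}(x) P(T* <= x <= C*) dx.
random.seed(5)
v0, n = 0.8, 200000

hits = kept = 0
for _ in range(n):
    t, c = random.uniform(0.0, 1.0), random.uniform(0.0, 2.0)
    x = random.expovariate(1.0)
    if t <= min(x, c):                  # observation avoids truncation
        kept += 1
        if min(x, c) <= v0 and x <= c:  # event {V <= v0, Delta = 1}
            hits += 1
lhs = hits / kept                       # Monte Carlo P(V <= v0, Delta = 1)

def weight(x):
    # P(T* <= x <= C*) = P(T* <= x) P(C* >= x) for independent T*, C*
    return min(x, 1.0) * max(0.0, (2.0 - x) / 2.0)

m = 20000                               # midpoint Riemann sum over [0, v0]
integral = sum(math.exp(-(i + 0.5) * v0 / m) * weight((i + 0.5) * v0 / m)
               for i in range(m)) * (v0 / m)
rhs = integral * n / kept               # p^{-1} * integral, with p = kept / n
```

Both sides agree to Monte Carlo accuracy, and the acceptance frequency `kept / n` estimates the probability $p$ of avoiding truncation.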

Differentiation of (2.1) with respect to $v$ yields the following formula for the mixed density,

$$f_{V,\Delta}(v,1) = p^{-1}f_{X^*}(v)\,P(T^* \le v \le C^*) = h_{X^*}(v)\left[p^{-1}S_{X^*}(v)\,P(T^* \le v \le C^*)\right] = h_{X^*}(v)\,P(T \le v \le V). \tag{2.3}$$

In (2.3) the second equality uses the definition of the hazard rate; let us explain the last equality. Write,

$$P(T \le x \le V) = P(T^* \le x \le \min(X^*, C^*) \,|\, T^* \le \min(X^*, C^*)) = p^{-1}P(T^* \le x \le \min(X^*, C^*))$$