# 1. Introduction

In property and casualty (P&C) reserving practice, the estimation of reserves is a learning task: how to use the history of paid claims to predict the pattern of emergence of future claims. A major ongoing challenge is to quantify the respective uncertainty, or distribution, of future claims emergence. This is important for determining the required reserves, risk capital, and the resultant solvency of an insurance company.

In this paper, we explore the use of Gaussian process (GP) models for loss development. GP regression is a powerful machine learning technique for empirical model building that has become a centerpiece of the modern data science tool kit. GP approaches now form an extensive ecosystem and are fast being popularized across a wide range of predictive analytics applications. We posit that GPs are also highly relevant in the context of risk analytics by offering both full uncertainty quantification and a hierarchical, data-driven representation of nonlinear models. The proposed GP framework for loss triangles revolves around viewing the emergence of claims by accident and development year as a random field, bringing to the forefront a spatial statistics perspective. More precisely, we provide a fully Bayesian GP approach for the modeling of claims development that yields a full probabilistic distribution regarding future losses. We then show how GPs can be adapted to deal with the considerable complexities of loss development distributions including nonstationarity, heteroskedasticity, and domain constraints.

## 1.1. Models for Loss Development

The loss development challenge is to project from the “upper triangle” of historical claims the distribution of the lower triangle of future claims. The mean of that distribution is typically used to determine the level of loss reserves, and its tail is used to set the desired level of allocated risk capital. The fundamental objects available in loss development are cumulative paid claims (CC), indexed in terms of accident year $p$ (`AY`) and development lag $q$ (`DL`). In the National Association of Insurance Commissioners (NAIC) data set used below (see Section 1.5), we have $CC_{p,q}$ for $p = 1988, \ldots, 1997$, $q = 1, \ldots, 10 = Q$, where $Q$ denotes the full development horizon (10 years in NAIC data). Two derived quantities are the incremental payments

$$
I_{p,q} := \begin{cases} CC_{p,1}, & q = 1, \\ CC_{p,q} - CC_{p,q-1}, & q = 2, \ldots, Q, \end{cases} \tag{1.1}
$$

and the year-over-year loss development factors (LDFs), i.e., the consecutive ratios between cumulative paid claims,

$$
F_{p,q} := \frac{CC_{p,q}}{CC_{p,q-1}}, \qquad q = 2, \ldots, Q. \tag{1.2}
$$

The advantage of LDFs is that they are normalized in terms of overall claims volume. A metric related to incremental claims is the incremental loss ratio (ILR),

$$
L_{p,q} := I_{p,q} / P_p, \tag{1.3}
$$

where $P_p$ is the premium for the $p$th accident year. We refer to LDF and ILR as “modeling factors” in this paper and denote them by $Y$.
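To make the definitions above concrete, the three derived quantities can be computed from a single accident year of a cumulative triangle as follows. This is a minimal Python sketch with made-up numbers (the paper's companion code is in R/Stan); all variable names are illustrative:

```python
import numpy as np

# Toy cumulative paid claims CC_{p,q} for one accident year p, lags q = 1..5
CC = np.array([100.0, 180.0, 220.0, 240.0, 250.0])
premium = 400.0  # P_p, the premium for this accident year

# Incremental payments I_{p,q}: first cell as-is, then successive differences
I = np.diff(CC, prepend=0.0)

# Loss development factors F_{p,q} = CC_{p,q} / CC_{p,q-1}, defined for q >= 2
F = CC[1:] / CC[:-1]

# Incremental loss ratios L_{p,q} = I_{p,q} / P_p
L = I / premium

# Sanity check: cumulating the increments recovers the cumulative claims
assert np.allclose(np.cumsum(I), CC)
```

Note that the LDFs are normalized by prior claims volume, while the ILRs are normalized by premium, which is why both serve as volume-free "modeling factors."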

The training, or historical, data are presented via a triangle of development claims. We view the triangle as a collection of cells, indexed by two “inputs,” namely (i) $AY_i$ and (ii) $DL_i$, and one “output,” for example, $L_i$. We are interested in linking inputs $x_i = [AY_i, DL_i]$ to the outputs $y_i = L_i$ (or $y_i = F_i$), converting the loss development triangle/square into a columnar set of data with a single index $i$. The paradigm we adopt is a *response surface approach*, i.e., construction of a statistical latent surface representing the “true” output, which is then noisily observed in actual data. Thus, we postulate that there is an underlying function $x \mapsto f(x)$ such that

$$
y_i = f(x_i) + \epsilon_i. \tag{1.4}
$$

Following the triangular structure, we henceforth index cells by their year and lag $p,q$, writing $AY_p, DL_q$. For concreteness, we focus on the ILRs $L_{p,q} \equiv Y_{p,q}$ as the outputs; then $\{f_{p,q} \equiv f(x_{p,q})\}$ is the true incremental loss development structure that describes the completed square, and the $\epsilon_{p,q}$ are the observation noises, which are independent, with state-dependent variance $\mathrm{Var}(\epsilon_{p,q}) = \sigma^2_{p,q}$. For example, the chain ladder (CL) method assumes that the LDF $F_{p,q}$ is log-normal with a mean $\mu_q$ (i.e., independent of accident year), prescribed via the CL formula, and variance dependent on $CC_{p,q}$. More generally, we construct a statistical model relating the observed factors $Y$ to the true latent input-output relationship, plus observation noise, $y_i \sim \mathcal{L}(f(x_i))$, where $\mathcal{L}$ captures the likelihood of realized observations.

In this paradigm, forecasting future losses translates into the double step of *inference* on the latent $\{f_{p,q}\}$ and then construction of the *predictive distribution* of the future modeling factors $Y_{p,q}$, which is based on inferring the structure of $\{\epsilon_{p,q}\}$. For instance, in the CL, the inferred logarithmic loss development factors $\mu_q$ are added together to inflate to the ultimate log development factor, combined with the (additive on the log scale) noises $\epsilon_{p,q}$, and finally exponentiated to obtain $CC_{p,Q}$.

## 1.2. Uncertainty Quantification

Our primary goal is to extrapolate existing losses to complete the full loss square given experience up to the present date $YR$. This extrapolation must recognize that loss development is subject to stochastic shocks, and hence the forecasted $Y_{p,q}$, $p + q > YR$, should be represented as random variables. Indeed, beyond providing the *expected* modeling factor $E[Y_{p,q}]$ (or, more commonly, the expected cumulative ultimate losses $E[CC_{p,Q}]$), we wish to quantify the respective distribution, so that we may, for example, assess the quantile of $CC_{p,Q}$ as needed in solvency computation. Since we have the fundamental relationship

$$
CC_{p,Q} = \begin{cases} CC_{p,q} \displaystyle\prod_{\ell=q}^{Q-1} \frac{CC_{p,\ell+1}}{CC_{p,\ell}} = CC_{p,q} \prod_{\ell=q}^{Q-1} F_{p,\ell+1}, & \text{or} \\[2ex] CC_{p,q} + P_p \displaystyle\Bigl( \sum_{\ell=q+1}^{Q} L_{p,\ell} \Bigr), \end{cases} \tag{1.5}
$$

to forecast unpaid claims, we need to forecast the future $L_{p,\ell}$'s or $F_{p,\ell}$'s.
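Both routes in the fundamental relationship above, the multiplicative one through LDFs and the additive one through ILRs, recover the same ultimate claims. A quick numerical check (Python, with the same illustrative toy numbers as before):

```python
import numpy as np

CC = np.array([100.0, 180.0, 220.0, 240.0, 250.0])  # CC_{p,1..Q}, Q = 5
premium = 400.0                                      # P_p
q = 2                        # claims observed through lag q (1-indexed)

# Multiplicative route: CC_{p,Q} = CC_{p,q} * product of remaining LDFs
F = CC[1:] / CC[:-1]         # F[k] corresponds to F_{p,k+2}
ult_mult = CC[q - 1] * np.prod(F[q - 1:])

# Additive route: CC_{p,Q} = CC_{p,q} + P_p * sum of remaining ILRs
L = np.diff(CC, prepend=0.0) / premium   # L_{p,1..Q}
ult_add = CC[q - 1] + premium * L[q:].sum()

assert np.isclose(ult_mult, ult_add) and np.isclose(ult_mult, CC[-1])
```

Either representation can therefore be used for triangle completion; the choice between modeling $F_{p,\ell}$ or $L_{p,\ell}$ is a modeling decision, not a constraint.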

The overall task discussed above of constructing a predictive distribution of $CC_{p,q}$ conditional on losses so far introduces the fundamental distinction between extrinsic and intrinsic uncertainty. Extrinsic uncertainty, also known as process variance, is linked to the fact that in equation (1.4) observed losses are noisy, so one must first forecast the latent surface $f_{p,q}$ and then overlay the extrinsic noise $\epsilon_{p,q}$. Thus, extrinsic uncertainty is the variance $\mathrm{Var}(\epsilon_{p,q}) = \sigma^2_{p,q}$. In contrast, intrinsic (or model, or parameter) uncertainty refers to the impossibility of fully learning the true data-generating process $f_{p,q}$ given a finite amount of loss data. Traditionally, extrinsic uncertainty is fitted through a parametric assumption (e.g., log-normal LDFs or Poisson incremental claims), while intrinsic uncertainty is obtained from the standard error of a point estimate of $f_{p,q}$, inferred through maximum likelihood estimation (MLE) or via ad hoc formulas such as the CL.

Many existing models tend to underestimate intrinsic uncertainty, which leads to underpredicting the variance of ultimate claims. As a result, the models do not statistically validate in large-scale out-of-sample tests for reserving risk, due to being light-tailed. Thus, actual ultimate claims are frequently too far above or too far below the mean forecast, relative to the estimated predictive variance. One popular remedy has been Bayesian frameworks (England, Verrall, and Wüthrich 2012; Kuo 2019; Taylor 2015; Wüthrich 2018; Zhang and Dukic 2013; Zhang, Dukic, and Guszcza 2012) that more explicitly control the residual randomness in $\{f_{p,q}\}$ through the respective priors. In this article, we contribute to this strand by developing a machine-learning-inspired, hierarchical approach that includes *three layers* of uncertainty.

First, we adopt a semiparametric extrinsic uncertainty, prescribing the distributional family of $\epsilon_{p,q}$ but allowing for cell dependence. As our main example, we take $\epsilon_{p,q}$ to be normally distributed, but with variance $\sigma^2_q$ depending on the development year.

Second, we model $f_{p,q}$ in terms of a Gaussian random field. This offers a natural quantification of model risk through the two-step process of conditioning $f_{p,q}$ on the observed loss triangle and then computing the residual posterior variance in $f_{p,q}$. The latter posterior variance summarizes our confidence in correctly inferring $f_{p,q}$ given the data and is referred to as intrinsic uncertainty.

Third, we account for the misspecification risk regarding the spatial structure of f by a further Bayesian Markov chain Monte Carlo (MCMC) procedure. Specifically, we employ MCMC to learn the distribution of the *hyperparameters*, especially the lengthscales, of the Gaussian process representing f. This is the correlation uncertainty layer. We show that in aggregate these three uncertainty layers not only properly account for overall predictive uncertainty (which is no longer underestimated), but also generate nonparametric predictive distributions while maintaining computational tractability afforded by the Gaussian structure.

To summarize, in our GP-based framework, completing the triangle comprises (i) sampling, via MCMC, hyperparameters $\vartheta$ of $f$ consistent with observations $D$; (ii) sampling a realization of the $f_{p,q}$'s conditional on $\vartheta$ and $D$; (iii) sampling a realization of future observation noise $\epsilon_{p,q}$; and (iv) combining the $f_{p,q}$'s and $\epsilon_{p,q}$'s to return the (sample-based) distribution of $CC_{p,Q}$ via equation (1.5). Importantly, while the observation noises $\epsilon$ are a priori independent across cells, the spatial dependence in $f_{p,q}$ creates correlation in modeling factors across both accident years and development lags. This is a major distinction from traditional methods, where no such correlation is available. Moreover, the GP paradigm allows one to produce directly the whole vector $\{f_{p,q+1}, f_{p,q+2}, \ldots, f_{p,Q}\}$, which gives a full projection of the claims between today and the ultimate development date.
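The four steps (i)–(iv) can be sketched end-to-end on a toy triangle. The following Python snippet is a simplified stand-in, not the paper's R/Stan implementation: it uses a plain squared-exponential kernel, and a hand-picked list of hyperparameter draws plays the role of an actual MCMC posterior sample. All names and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sq_exp_kernel(X1, X2, eta2, ls):
    """Squared-exponential covariance with per-dimension lengthscales ls."""
    d2 = sum(((X1[:, None, k] - X2[None, :, k]) / ls[k]) ** 2
             for k in range(X1.shape[1]))
    return eta2 * np.exp(-0.5 * d2)

# Toy "upper triangle": inputs x = (AY, DL), outputs y = observed ILRs
X_obs = np.array([[p, q] for p in range(4) for q in range(4 - p)], dtype=float)
y_obs = 0.5 * np.exp(-0.7 * X_obs[:, 1]) + 0.02 * rng.standard_normal(len(X_obs))
X_new = np.array([[p, q] for p in range(4) for q in range(4 - p, 4)], dtype=float)

# (i) hyperparameter draws (stand-ins for MCMC posterior samples of theta)
theta_draws = [dict(eta2=0.04, ls=(3.0, 1.5), sig2=4e-4),
               dict(eta2=0.06, ls=(2.0, 1.0), sig2=4e-4)]

samples = []
for th in theta_draws:
    K = sq_exp_kernel(X_obs, X_obs, th["eta2"], th["ls"]) + th["sig2"] * np.eye(len(X_obs))
    Ks = sq_exp_kernel(X_new, X_obs, th["eta2"], th["ls"])
    Kss = sq_exp_kernel(X_new, X_new, th["eta2"], th["ls"])
    mu = Ks @ np.linalg.solve(K, y_obs)                # posterior mean of f
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)          # posterior covariance of f
    cov = 0.5 * (cov + cov.T) + 1e-9 * np.eye(len(mu)) # symmetrize + jitter
    for _ in range(100):
        # (ii) a joint draw of the latent surface f on the missing cells
        f = rng.multivariate_normal(mu, cov)
        # (iii) overlay independent observation noise eps
        y = f + np.sqrt(th["sig2"]) * rng.standard_normal(len(f))
        samples.append(y)

# (iv) sample-based predictive distribution of the future modeling factors
samples = np.array(samples)
print(samples.shape)  # → (200, 6): one row per completed lower triangle
```

Because each draw in step (ii) is a *joint* draw from the posterior Gaussian, the sampled lower-triangle cells are correlated across both accident years and lags, which is exactly the distinction from traditional cell-by-cell simulation noted above.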

## 1.3. Related Literature

Stochastic loss reserving addresses uncertainty quantification of unpaid non-life insurance losses. The vast literature is centered around the CL method (Mack 1993, 1994), which provides analytic predictive distributions based on empirical plug-in estimates, and Bayesian methods (England and Verrall 2002), which extract posterior distributions through bootstrapping or MCMC approaches. We refer to the Wüthrich and Merz (2008) and Meyers (2015) monographs for an overview of the state of the art.

The CL approach provides empirical formulas for the latent LDFs $\{f_{p,q}\}$ that are then commonly injected into a log-normal model for $F_{p,q}$. Of note, the CL mean $\mu_q$ can be interpreted as a weighted average of $F_{\cdot,q}$ across accident years, $f_q = \sum_{\ell} w_{\ell,q} F_{\ell,q}$. To provide statistical underpinnings to the CL formulas, analysts have developed Bayesian CL frameworks—see Merz and Wüthrich (2010). The latter derive $\mu_q$ as the posterior mean of a certain Bayesian conditioning expression. In turn, this allows the natural extension to consider intrinsic uncertainty, i.e., to use the full posterior of $\mu_q$ to capture model risk. Another notable extension of CL considers correlation in $\epsilon_{p,q}$ across accident years to remedy the respective independence assumption in the original CL, which does not validate well. Such correlation is also a popular way to boost the variance of ultimate claims. A related issue is the inclusion of a payment year trend.
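For reference, with the usual volume weights $w_{\ell,q} \propto CC_{\ell,q-1}$, the weighted-average form of the CL development factors reduces to a ratio of column sums. A short Python sketch on a made-up triangle (numbers illustrative):

```python
import numpy as np

# Toy cumulative triangle: rows = accident years, columns = lags; NaN = unobserved
CC = np.array([[100., 180., 220., 240.],
               [110., 200., 250., np.nan],
               [ 90., 170., np.nan, np.nan],
               [120., np.nan, np.nan, np.nan]])

Q = CC.shape[1]
f = np.ones(Q)                 # f[q]: factor taking column q-1 to column q
for q in range(1, Q):
    rows = ~np.isnan(CC[:, q]) # accident years with both lags observed
    # volume-weighted average of the column's LDFs = ratio of column sums
    f[q] = CC[rows, q].sum() / CC[rows, q - 1].sum()

print(np.round(f, 4))
```

Each `f[q]` equals $\sum_\ell w_{\ell,q} F_{\ell,q}$ with $w_{\ell,q} = CC_{\ell,q-1} / \sum_m CC_{m,q-1}$, since the premium-like volumes cancel inside the weighted sum.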

In the multivariate Gaussian reserving models (Wüthrich and Merz 2008, Ch. 4), the cumulative claims $CC_{p,q}$ are modeled as log-normal random variables with mean/standard deviation $\mu_{p,q}, \sigma_{p,q}$. The typical assumption is that the logged mean is of the form $\mu_{p,q} = \alpha_p + \beta_q$, decomposing the true surface as an additive combination of one-dimensional accident year ($\alpha$) and development lag ($\beta$) trend factors (an idea similar to the age–period decomposition of mortality surfaces). The variances $\sigma^2_{p,q}$ are then directly specified via a correlation matrix $\Sigma$. In the most common version, they are taken to be independent across cells; see, for instance, Meyers (2015), who proposed the leveled CL (LCL) approach whereby the observation variance $\sigma^2_q$ is constant across accident years and decreases in lag $q$. The three latent vectors $\alpha, \beta, \sigma$ are estimated via MCMC (implemented in Stan) after specifying certain structured priors on them. In the alternative correlated chain ladder proposed by Meyers (2015), the mean structure is augmented to $\mu_{p,q} = \alpha_p + \beta_q + \rho(\log CC_{p-1,q} - \mu_{p-1,q})$, generating across-year correlation between cumulative paid claims. However, in his results the posterior of the estimated correlation coefficient $\rho$ is quite wide, spanning both positive and negative values, precluding a clear conclusion regarding the across-year dependence. Zhang and Dukic (2013) and Zhang, Dukic, and Guszcza (2012) employ another parameterization, whereby they forecast cumulative losses with a prescribed nonlinear functional form, fitted through a hierarchical Bayesian model. Statistically similar is the approach of England and Verrall (2002), who used generalized additive models based on cubic splines; see also the extension to a GAMLSS model in Spedicato, Clemente, and Schewe (2014).
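The additive log-normal structure $\mu_{p,q} = \alpha_p + \beta_q$ is easy to picture as an outer sum of two one-dimensional trend vectors. A minimal Python illustration with made-up $\alpha$, $\beta$, and $\sigma$ values (not fitted parameters):

```python
import numpy as np

# Additive mean structure mu_{p,q} = alpha_p + beta_q (toy values)
alpha = np.array([4.0, 4.1, 4.2])        # accident-year levels
beta = np.array([0.0, 0.5, 0.7, 0.8])    # development-lag trend
mu = alpha[:, None] + beta[None, :]      # 3x4 surface of logged means

# With a constant log-scale sd (in the LCL spirit), the log-normal mean of CC is
sigma = 0.1
mean_CC = np.exp(mu + 0.5 * sigma**2)
print(mean_CC.shape)  # → (3, 4)
```

The broadcasted outer sum makes explicit that every accident year shares the same lag profile $\beta$, shifted by its own level $\alpha_p$; the GP approach of this paper replaces this rigid additive surface with a flexible random field.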

An alternative is to model $CC_{p,q}$ directly. This paradigm dispenses with the existence of a “true” data-generating loss process and views triangle completion directly as an extrapolation task: given $\{CC_{p,q} : p + q \le YR\}$, we seek to build an extrapolating surface that matches the observed upper triangle and smoothly extends it into the bottom half. In this framework, there is no observation noise; rather, the surface is viewed as a random field that intrinsically “vibrates.” Projected losses are then obtained by conditioning on historical observations but otherwise taking the remaining intrinsic variability in $F_{p,q}$ as the measure of the randomness of future LDFs. This approach leads to correlated fluctuations in $F_{p,q}$ due to the spatial covariance.

This idea was recently implemented by Lally and Hartman (2018) (henceforth LH), who proposed applying GP regression to run-off triangles. In their framework, the conditional residual variance of $f_{p,q}$ is precisely the predictive uncertainty. This exploits the fact that $\mathrm{Var}(f_{p,q}) \to \eta^2$ for far-out extrapolation. LH construct GP regression models for $CC_{p,q}$, viewing cumulative losses as a smooth function of accident period and development lag. The inherent spatial nonstationarity of the run-off triangle is handled through input warping that is learned as part of fitting the GP model. LH work in a Bayesian framework, capturing hyperparameter uncertainty as well as uncertainty about the best warping function.

## 1.4. Contributions

Our proposal to employ GP models for loss development combines the existing concept of a Bayesian chain ladder with a spatial representation of loss emergence. Thus, we model the development triangle as a random field; the inferred correlations across development lags and accident years allow for consistent stochastic extrapolation of future claims in both dimensions. In particular, this allows a statistical investigation of the trend/correlation in both the `AY` and `DL` dimensions.

The GP framework brings multiple advantages relative to existing loss development approaches. Having a coherent statistical framework for capturing all three uncertainty layers—extrinsic observation noise, intrinsic model uncertainty, and intrinsic correlation uncertainty—is the first distinguishing feature of our method. This combination offers a rigorous way to describe the predictive distribution of ultimate claims $CC_{p,Q}$. In particular, we can generate nonparametric, empirical distributions of any subset of the $CC_{p,q}$'s. To that end, the basic GP structure implies that a forecasted ILR $Y_{p,\ell}$ is normally distributed. When modeling ILRs, the additive form in equation (1.5) implies that the ultimate cumulative claims $CC_{p,Q}$ are also normally distributed. However, incorporating hyperparameter uncertainty leads to a mixture-of-Gaussians distribution of $L_{p,\ell}$ and hence a much more flexible description of the tails of $CC_{p,Q}$.
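The tail-thickening effect of hyperparameter uncertainty is easy to see in isolation: mixing zero-mean normal predictive distributions over several hyperparameter draws produces excess kurtosis relative to any single matched Gaussian. A stylized Python check (the three predictive standard deviations are made up, standing in for draws of $\vartheta$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predictive sds of an ILR under three hyperparameter draws
sds = np.array([0.5, 1.0, 2.0])

# Mixture draw: pick a hyperparameter draw, then sample the Gaussian it implies
n = 200_000
comp = rng.integers(0, len(sds), size=n)
mix = rng.normal(0.0, sds[comp])

# A single Gaussian matched to the mixture's variance has kurtosis exactly 3;
# the mixture's kurtosis exceeds 3, i.e., its tails are heavier
kurt = np.mean(mix**4) / np.mean(mix**2) ** 2
print(kurt > 3.0)
```

Analytically, the mixture's kurtosis here is $3\,\overline{\sigma^4}/(\overline{\sigma^2})^2 \approx 5.6$, so even modest spread in the hyperparameter posterior moves the predictive law well away from Gaussian tails.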

Second, the spatial structure of GPs allows us to generate consistent multiperiod trajectories of future losses, offering a full uncertainty quantification not just of a specific $CC_{p,q}$ but of the entire vector $CC_{p,\cdot}$. Such joint draws of the full ILR path are necessary to capture the covariance structure not only of future capital requirements and risk allocation but also of the timing of the emergence of claim payments. Many existing models are either unable to generate full scenarios for the time-path $CC_{p,\cdot}$ or would generate wildly infeasible ones. For instance, since the Mack (1993) model only assumes that each LDF is log-normal and is otherwise independent and identically distributed, consecutive draws of $CC_{p,q}$ as $q$ changes will almost certainly lead to nonsensical, non-monotone cumulative paid claims.

Third, we discuss coherent ways to adjust the GP model to respect the constraints of the development data, in particular the declining variance as losses develop, asymptotic convergence to ultimate losses, and paid incremental claims that are almost always positive. These constraints are especially important for far-into-the-future forecasts. To implement them, we build a custom *hurdle* model, augmented with virtual observations. As far as we know, this is a new methodology for GP surrogates of truncated data and is of independent interest to the GP community. Let us remark that in extrapolation tasks, which are necessarily mostly assumption-driven, professional judgment can be conveniently encapsulated into the GP model via both the mean function $m$ and the spatial kernel structure. Moreover, the corresponding forecast accuracy is then neatly encoded in the GP predictive variance, transparently showing to what extent the forecast is data-driven or assumption-based.

Fourth, the spatial structure extends naturally to analysis of multiple triangles, and hence brings a consistent way to handle multivariate losses. In Section 4.5 we show how GPs offer a ready-made mechanism to borrow strength (i.e., actuarial credibility) from the experience of related firms.

From the empirical side, we systematically check the efficacy of point estimates and distributions across a wide range of business lines and companies based on the paid claims in the full NAIC database. In particular, we present a variety of assessment metrics that go beyond the standard root-mean-square-error measures to assess the accuracy of the predictive distribution and predictive intervals (CRPS, NLPD, and K-S metrics, see Section 4.1). To our knowledge, these are new tools for uncertainty quantification of loss development and yield a detailed aggregate analysis.

Relative to Lally and Hartman (2018), who also investigated GPs for loss emergence, we note several key differences in the setups. First, LH work with cumulative losses, only mentioning incremental losses in passing. In our opinion, this is much less clear conceptually, since it is difficult to disentangle the mechanical dependence induced by cumulating claims from a true spatial correlation across `DL` and `AY`. Second, in LH the model for $CC_{p,q}$ has no provision for ensuring monotonicity or convergence to full development. Instead, LH address this nonstationarity through input warping that reduces $\mathrm{Var}(CC_{p,q})$ as $q$ grows. The lack of any monotonicity constraints implies that when computing the predictive distribution of ultimate losses, implausible values of $CC_{p,q}$ are averaged. Third, LH eschew the issue of noisy losses—their model includes $\sigma^2$ (taken to be constant), but it essentially serves as a numerical regularization device, bundling the randomness of claim emergence with the *average* pattern of loss development. In contrast, our approach distinguishes between the “true” loss development pattern and the observations. Fourth, LH present only a very limited study of three triangles; our analysis of the whole NAIC data set offers much more comprehensive conclusions.

Finally, we mention that other Bayesian spatial models have been proposed to represent the loss development surface; see, for instance, Gangopadhyay and Gau (2003), who proposed using bivariate smoothing splines combined with Gibbs sampling. However, a spline model is less suited for extrapolation (as it does not allow specifying a prior mean function), which is the main task in completing the triangle, and is furthermore more awkward for uncertainty quantification, necessitating the use of expensive MCMC compared with the parsimony of equation (2.3).

Our `R` and `Stan` code is available for inspection via a public repository at https://github.com/howardnewyork/gp_ilr. That folder also contains the data set described below and a brief guide in a README file.

## 1.5. Data Set

For our data set we use the NAIC database. That database was originally assembled by Meyers and Shi (2011) who built it using Schedule P triangles from insurer NAIC annual statements reported in 1997. These annual statements contain insurer-level run-off triangles of aggregated losses by line of insurance. The data in Schedule P include net paid losses, net incurred losses (net of reinsurance), and net premiums. The triangles were completed using subsequent annual statements from 1998–2006; a complete description of how it was constructed and how the insurers were selected is available on the CAS website (Meyers and Shi 2021).

The rectangles encompass 200 companies across six different business lines (not all companies have all lines): commercial auto `comauto` (84 companies), medical malpractice `medmal` (12), private passenger auto `ppauto` (96), product liability `prodliab` (87), other liability `othliab` (13), and workers compensation `wkcomp` (57). Each rectangle consists of 10 accident years and 10 development years, covering accident years 1988–1997 and development lags 1 through $Q = 10$, i.e., the period 1988 to 2006 (for the 10th development year of claims originating in 1997). To construct our training set we use the upper triangle, i.e., the data as they would appear in 1997, for a total of 45 LDF observations per triangle and 55 ILR observations. From the provided data, we use only those companies that have a full set of 10 years of data without any missing fields. We also exclude companies that have negative premiums in any year. Throughout, we focus on *paid* claims.
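The stated counts of 55 ILR and 45 LDF observations per triangle follow directly from the triangular data layout: each observed cell yields one ILR, while an LDF needs two consecutive observed cells in the same accident year. A two-line Python check:

```python
# Upper-triangle cell counts for a 10x10 run-off triangle observed through 1997:
# accident year p = 1..10 has lags q = 1..(11 - p) observed.
Q = 10
ilr_cells = sum(Q + 1 - p for p in range(1, Q + 1))  # one ILR per observed cell
ldf_cells = sum(Q - p for p in range(1, Q + 1))      # LDFs need two adjacent cells
print(ilr_cells, ldf_cells)  # → 55 45
```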

Figure 1 shows the typical shape of incremental loss ratios across different business lines (see also Table 2 in Appendix C, which shows a typical triangle in terms of raw $CC_{p,q}$ and the corresponding ILRs $L_{p,q}$). The boxplots show the distribution of $L_{\cdot,q}$ across the companies of a given line. We observe that, as expected, ILRs tend to decrease in lag (with the notable exceptions of `medmal`, where $\mathrm{ILR}_{p,\cdot}$ tends to peak at $q = 3$, and `prodliab`, where ILRs are essentially flat for the first 3 years). The auto insurance triangles tend to develop by $q = 8$ or so; however, all other lines exhibit a significant proportion of positive ILRs even at $q = 10$. We also observe that there is a nontrivial, frequently non-monotone pattern to the dispersion of $L_{\cdot,q}$, which implies the need for careful statistical modeling of $\sigma_{p,q}$.