1. Introduction
Over the past several decades, generalized linear models (GLMs) have become ubiquitous in actuarial science. Within the insurance industry, GLMs were first utilized in the property and casualty industry, due to its inherent need for custom modeling. However, GLMs and predictive modeling have recently found application in the life insurance industry as well. GLMs have enjoyed this popularity largely due to their ability to overcome the limitations of ordinary least squares (OLS) regression when applied to data which exhibit non-linear relationships and non-normal distributions. In their traditional form, GLMs require that the dependent variable have a distribution from the exponential family of distributions. This requirement primarily ensures the existence of a desirable relationship between the variance and mean of the dependent variable. In most situations this restriction is at most a minor inconvenience; however, there are circumstances in which data are best described by distributions outside of the exponential family. In such cases, it can be beneficial to have access to an even more general class of models. In response to this need, Parsa and Klugman (2011) proposed a regression modeling framework which exploits the flexibility of copula techniques.
Not long after the introduction of copula regression, however, it was noticed that the formulation of copula regression as a conditional expected value could be approximated by an OLS regression on appropriately transformed data. Such an approximation would have the advantage of ease of implementation, using almost any statistical software or even Microsoft Excel. Moreover, upon investigating this linear approximation to copula regression, Parsa and Klugman (2011) noticed that the approximation often produced estimates which were quite close to, yet consistently below, those from exact copula regression. Further, there seemed to be a systematic deviation between the estimates from each method. These observations motivated the current research. This paper will proceed as follows: In section 2 we briefly describe copulas and copula regression. In section 3 we describe the linear approximation to copula regression, which is the main subject of this research. In section 4 we investigate sufficient conditions which ensure a predictable pattern of bias in the estimates from the linear approximation to copula regression. In section 5 we introduce transmutation mappings, which play an integral role within the linear approximation to copula regression, and also within the theory of copulas in general. In section 6, we investigate the most commonly used loss distributions to see if they satisfy the sufficient conditions from section 4, and hence lead to predictably biased estimates (with respect to full copula regression) from the linear approximation to copula regression.
2. Copula regression
Copula methods are essentially just a particular way to construct multivariate distributions. Copula functions themselves are simply multivariate distributions whose marginal distributions are uniform on [0, 1]. As described in Parsa and Klugman (2011), a copula model can be constructed for an n-dimensional set of data by first fitting n marginal distributions, F1(X1), . . . , Fn(Xn), to the data. Then the joint distribution F(X1, . . . , Xn) can be created by applying a copula function C to the fitted marginals, specifically F(X1, . . . , Xn) = C[F1(X1), . . . , Fn(Xn)]. The copula function induces a correlation structure between the variables X1, . . . , Xn. By construction, this correlation structure will be independent of the marginal distributions of the data. In order to ensure that the resulting multivariate function is in fact a distribution function, only certain functions can be used as copula functions. However, Sklar's theorem guarantees that, for any set of random variables X1, . . . , Xn with joint distribution F and associated marginal distributions F1, . . . , Fn, a copula function C exists such that F(X1, . . . , Xn) = C[F1(X1), . . . , Fn(Xn)]. Moreover, if the marginal distributions are continuous, then there exists a unique copula function C, defined on [0, 1]n, such that this representation holds. Further, in the case of continuous marginal distributions, the copula function allows the multivariate dependence structure to be separated from the marginal distributions of the data. At this point, it is important to point out that, although the mathematical theory of copulas is very rich, and many powerful results have been proven, not every multivariate distribution can be naturally, or “usefully”, modeled using copulas. To quote Paul Embrechts: “copulas form a most useful concept for a lot of applied modeling, they do not yield, however, a panacea for the construction of useful and well-understood multivariate density functions” (Embrechts 2009). While there are many copula models, most are only bivariate, and hence do not allow for a fully flexible correlation structure between the response variable and each of the covariates. For this reason, Parsa and Klugman assert that the multivariate normal and t-copulas are the most useful for the purposes of copula regression. In basic OLS regression, the distribution of Y given the covariates is assumed to be normal, and the predicted values are specified by ŷi = E[Y|X1 = x1,i, . . . , Xn−1 = xn−1,i]. The predicted values under copula regression are defined analogously.
Definition
Given N observations of the RVs X1, . . . , Xn−1, Y, copula regression, as defined by Parsa and Klugman (2011), is the process of estimating the observed values of the RV Y based on the corresponding values of the RVs X1, . . . , Xn−1, namely
\[ \hat{y}_{i}=E\left[Y \mid X_{1}=x_{1, i}, \ldots, X_{n-1}=x_{n-1, i}\right], \tag{1} \]
where the conditional expected value is computed with respect to the conditional density of the RVs induced from a multivariate copula function C, i.e.,
\[ \begin{array}{l} F_{X_{1}, \ldots, X_{n-1}, Y}\left(X_{1}, X_{2}, \ldots, X_{n-1}, Y\right) \\ \quad=C\left[F_{1}\left(X_{1}\right), F_{2}\left(X_{2}\right), \ldots, F_{n-1}\left(X_{n-1}\right), F_{Y}(Y)\right] \end{array} \tag{2} \]
In this paper, we concentrate on the case where the copula function C is the multivariate normal copula, and reserve investigation of the t-copula for subsequent research. We now review the basics of copula regression, under the multivariate normal copula. For a more complete treatment, the interested reader is referred to Parsa and Klugman (2011). The distribution function induced by the multivariate normal copula has the form:
\[ \begin{array}{l} F_{X_{1}, \ldots, X_{n-1}, Y}\left(X_{1}, \ldots, X_{n-1}, Y\right) \\ \quad=G\left(\Phi^{-1}\left[F_{1}\left(X_{1}\right)\right], \ldots, \Phi^{-1}\left[F_{n-1}\left(X_{n-1}\right)\right], \Phi^{-1}\left[F_{Y}(Y)\right]\right) \end{array} \tag{3} \]
where F1, . . . , Fn−1, FY are the marginal distributions of the RVs X1, . . . , Xn−1, Y, Φ is the standard normal CDF, and G is the multivariate normal distribution with standard normal marginals and correlation matrix R. In what follows we assume that all marginal distributions are continuous. Under this assumption, the density function induced by the multivariate normal copula can be written as follows:
\[ \begin{array}{r} f\left(X_{1}, X_{2}, \ldots, X_{n-1}, Y\right)=f_{1}\left(X_{1}\right) \cdots f_{n-1}\left(X_{n-1}\right) f_{Y}(Y) \\ \cdot \exp \left\{-\frac{\vec{v}^{T}\left(R^{-1}-\mathrm{I}\right) \vec{v}}{2}\right\} \times|R|^{-\frac{1}{2}} \end{array} \tag{4} \]
where:
- I is the identity matrix,
- R is the copula correlation matrix,
\[ R=\left[\begin{array}{cc} R_{n-1} & \vec{r} \\ \vec{r}^{T} & 1 \end{array}\right] \]
where
- v→ = (Φ−1[F1(X1)], . . . , Φ−1[Fn−1(Xn−1)], Φ−1[FY(Y)])T, and
- Rn−1[i, j] is the correlation between Φ−1[Fi(Xi)] and Φ−1[Fj(Xj)], and ri is the correlation between Φ−1[Fi(Xi)] and Φ−1[FY(Y)], for i, j ∈ {1, . . . , n − 1}.
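For readers who wish to see equation (4) in computational form, the following is a minimal sketch of the induced density in the bivariate case (n = 2); the lognormal marginals and the correlation value are illustrative assumptions, not choices made in the original paper.

```python
import numpy as np
from scipy import stats

# Illustrative marginals for the bivariate (n = 2) case of equation (4).
fx = stats.lognorm(0.5)   # marginal for X
fy = stats.lognorm(1.0)   # marginal for Y

def mvn_copula_density(x, y, rho):
    """Joint density f(x, y) induced by the bivariate normal copula, eq. (4)."""
    R = np.array([[1.0, rho], [rho, 1.0]])
    # v holds the normal-score transforms of the two observations.
    v = np.array([stats.norm.ppf(fx.cdf(x)), stats.norm.ppf(fy.cdf(y))])
    quad = v @ (np.linalg.inv(R) - np.eye(2)) @ v
    return fx.pdf(x) * fy.pdf(y) * np.exp(-quad / 2.0) / np.sqrt(np.linalg.det(R))

print(mvn_copula_density(1.0, 2.0, rho=0.7))
```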
Correspondingly, it is shown in Clemen and Reilly (1999) that the conditional density of Y given X1, X2, . . . , Xn−1 has the following form:
\[ \begin{array}{l} f\left(Y \mid X_{1}, X_{2}, \ldots, X_{n-1}\right)=\frac{f_{y}(Y)}{\sqrt{1-\vec{r}^{T} \cdot R_{n-1}^{-1} \cdot \vec{r}}} \times \\ \exp \left\{-\frac{1}{2}\left[\frac{\left(\Phi^{-1}\left[F_{y}(Y)\right]-\vec{r}^{T} \cdot R_{n-1}^{-1} \cdot \vec{v}^{*}\right)^{2}}{1-\vec{r}^{T} \cdot R_{n-1}^{-1} \cdot \vec{r}}\right]\right\} \times \\ \quad \exp \left\{\frac{1}{2}\left(\Phi^{-1}\left[F_{y}(Y)\right]\right)^{2}\right\} \end{array} \tag{5} \]
where v→* = (Φ−1[F1(X1)], . . . , Φ−1[Fn−1(Xn−1)])T, with r→ and Rn−1 defined as above. Under the multivariate normal copula, equation (5) specifies the predicted values in equation (1), up to specification of the marginal distributions. Ideally, to parameterize the copula regression model based on empirical data, maximum likelihood estimation is performed using the density function in equation (4) above. If n = 8 and each marginal distribution has two parameters, it is necessary to estimate 44 parameters.
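The figure of 44 is consistent with n = 8 variables (a value we infer from the arithmetic, since the variable count is elided in the text as extracted): each of the 8 marginals contributes two parameters, and the copula correlation matrix contributes the pairwise correlations:
\[ 2 n+\binom{n}{2}=2 \cdot 8+\frac{8 \cdot 7}{2}=16+28=44 \]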
Alternatively, one can use the marginal data to fit each of the marginal distributions, and then a second optimization can be performed to estimate the correlations within the MVN copula. As pointed out in Parsa and Klugman (2011), this alternative will produce suboptimal results. The relative complexity of fitting the parameters of a full multivariate normal copula regression model with n marginals, each with several parameters, is one of the motivations for the linear approximation to copula regression, which is the topic of this paper. For more details on the parametrization of copula regression models, the reader is referred to Parsa and Klugman (2011).

3. A linear approximation to copula regression
Now that copula regression has been described, we introduce what the authors have dubbed the linear approximation to copula regression. The possibility of such an approximation to copula regression arose organically, in response to a question posed by a practicing property and casualty (P&C) actuary at the 2011 CAS Spring Meeting. During the question-and-answer portion of their presentation on copula regression, Parsa and Klugman were asked the following question: "For a given set of data,
since the distribution induced by the multivariate normal copula is essentially the multivariate normal distribution applied to the transformed RVs Φ−1[Fi(Xi)], and since a fully probabilistic version of multivariate linear regression assumes that the RVs follow a multivariate normal distribution, how does copula regression, under a multivariate normal copula, differ from simply applying OLS regression to the transformed RVs:
\[ \begin{array}{l} \Phi^{-1}\left[F_{1}\left(X_{1}\right)\right], \Phi^{-1}\left[F_{2}\left(X_{2}\right)\right], \ldots, \\ \Phi^{-1}\left[F_{n-1}\left(X_{n-1}\right)\right], \Phi^{-1}\left[F_{Y}(Y)\right］?" \end{array} \]
In other words, if Y is considered the response variable, what is the difference in the estimates from the following models?
Model 1

Perform OLS regression in the U, V→ space, and then transform back to the Y, X→ space:
- Transform each of the variables:
\[ U=\Phi^{-1}\left[F_{y}(y)\right] \quad \text { and } \quad V_{i}=\Phi^{-1}\left[F_{i}\left(x_{i}\right)\right] \quad \text { for } i \in\{1,2, \ldots, n-1\} \]
- Perform an ordinary OLS regression of U on the Vi to obtain Û:
\[ U=\beta_{0}+\beta_{1} \cdot V_{1}+\cdots+\beta_{n-1} \cdot V_{n-1}+\varepsilon, \quad \text { where } \varepsilon \sim N\left(0, \sigma^{2}\right) \]
- Then transform the fitted values ûi back to the original scale to obtain the estimates ŷi:
\[ \hat{y}_{i}=F_{y}^{-1}\left(\Phi\left(\hat{u}_{i}\right)\right), \quad \text { where } \hat{u}_{i}=\hat{U} \mid V_{1}=v_{1, i}, \ldots, V_{n-1}=v_{n-1, i} \]
Model 2
Perform copula regression in the Y, X→ space:
\[\begin{align} \hat{y}_{i}&=E\left[Y \mid X_{1}=x_{1 i}, \ldots, X_{n-1}=x_{n-1 i}\right], \text{where} \\ &f\left(Y \mid X_{1}, X_{2}, \ldots, X_{n-1}\right) \text{is defined as in equation (5)}.\end{align} \]
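To make the comparison concrete, the following is a minimal numerical sketch of both models in the bivariate case; the lognormal marginals, correlation, and evaluation point are illustrative assumptions, not choices made in the original paper, and the marginals are treated as known rather than fitted from data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate correlated (X, Y) data with lognormal marginals via a Gaussian copula.
n, rho = 1000, 0.7
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
x = stats.lognorm.ppf(stats.norm.cdf(z[:, 0]), s=0.5)   # marginal F_1
y = stats.lognorm.ppf(stats.norm.cdf(z[:, 1]), s=1.0)   # marginal F_y

# Model 1, step 1: transform to the (U, V) space (marginals treated as known).
v = stats.norm.ppf(stats.lognorm.cdf(x, s=0.5))
u = stats.norm.ppf(stats.lognorm.cdf(y, s=1.0))

# Model 1, step 2: ordinary OLS of U on V.
beta1, beta0 = np.polyfit(v, u, 1)

# Model 1, step 3: back-transform a fitted value, here at v0 = 1.0.
v0 = 1.0
u_hat = beta0 + beta1 * v0
y_linear = stats.lognorm.ppf(stats.norm.cdf(u_hat), s=1.0)

# Model 2 at the same point: E[Y | V = v0] under the normal copula, computed
# by Gauss-Hermite quadrature over U | V = v0 ~ N(u_hat, 1 - beta1^2).
nodes, wts = np.polynomial.hermite_e.hermegauss(40)
u_nodes = u_hat + np.sqrt(max(1.0 - beta1**2, 0.0)) * nodes
p = np.clip(stats.norm.cdf(u_nodes), 1e-15, 1 - 1e-15)
y_copula = (stats.lognorm.ppf(p, s=1.0) * wts).sum() / np.sqrt(2.0 * np.pi)

print(y_linear, y_copula)
```

In runs of this sketch, y_linear falls below y_copula, consistent with the convexity argument developed in section 4.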
Parsa and Klugman were immediately struck by the simple beauty of this reductive argument, and were intrigued by the possibility that this approach could actually produce a close approximation to copula regression. Moreover, they soon realized that this question transcended copula regression, and actually addressed the larger issue of the effect of transformations of variables within regression modeling.
4. Sufficient conditions for bias
It seems likely that an understanding of the limitations of variable transformations, as a means to coerce data to fit the restrictive normality assumption of linear regression, informed the development of GLMs. A simple example illustrates the difference between these two approaches. Suppose that it is known that the dependent variable Y follows a lognormal distribution. In introductory applied statistics classes, students are taught that in this case they should apply a log transformation to Y, and then perform OLS. The underlying model is that log(Y) = aX + b + ε, where ε ∼ N(0, σ2). In other words, the assumption is that log(Y)|X has a normal distribution with mean aX + b, and constant variance. Then, one simply applies the inverse of the log transform to obtain the predicted values Ŷ = E(Y|X) = exp(aX + b). However, if Y in fact follows a lognormal distribution, then E(Y|X) = exp(aX + b + σ2/2). Hence, by applying the log transformation and assuming a normal distribution, the resulting estimate differs from the theoretically correct estimate by a factor of exp(σ2/2).

The disparity between these two estimates of E(Y|X) can be viewed as representing a bias in the estimates from the model involving transformations. Although it may escape consideration, such a disparity arises whenever transformations of variables are used within OLS. In light of this observation, it can be seen that the question at hand is fundamentally a question of the bias that is induced by applying the specific transformation Φ−1[Fi(·)] within OLS regression. Further, due to the general nature of the transformations Φ−1[Fi(·)], an investigation of the bias between copula regression and the linear approximation to copula regression may shed light on the bias induced by the use of transformations within OLS in general.

As a first step towards investigating this bias, the linear approximation to copula regression was computed for the examples provided in Parsa and Klugman (2011). This resulted in several interesting observations. As anticipated by the motivating question, it was noticed that the estimates from the linear approximation to copula regression were in fact often very close to the estimates based on copula regression. Moreover, a consistent relationship between the two sets of estimates was observed. Specifically, the estimates from the linear approximation to copula regression were noticed to be consistently slightly lower than those from copula regression. This piqued the interest of the authors, making further investigation irresistible. The following results pave the way toward a more quantitative understanding of the relationship between the estimates from copula regression and the linear approximation to copula regression. Lemma 4.1 provides a connection between the copula regression of Y on X→, and the transformed variables U, V→. In what follows, the authors sometimes use the slightly more suggestive notation (Fy−1 ∘ Φ)(x), in place of Fy−1(Φ(x)).
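Returning to the opening lognormal example, the following small simulation (with illustrative parameter values) exhibits the exp(σ2/2) factor by which the naive back-transformed estimate falls short of the true conditional mean:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, sigma = 0.8, 1.0, 0.6

# Y | X is lognormal: log(Y) = aX + b + eps, eps ~ N(0, sigma^2).
x = rng.uniform(0.0, 2.0, size=200_000)
y = np.exp(a * x + b + sigma * rng.standard_normal(x.size))

# Back-transforming the log-scale mean vs. the correct conditional mean at x = 1:
naive = np.exp(a * 1.0 + b)                     # exp(aX + b)
exact = np.exp(a * 1.0 + b + 0.5 * sigma**2)    # exp(aX + b + sigma^2 / 2)
empirical = y[np.abs(x - 1.0) < 0.01].mean()    # Monte Carlo check

print(naive, exact, empirical)   # naive underestimates by the factor exp(sigma^2/2)
```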
Lemma 4.1
If Fy(Y) and Fi(Xi), for i ∈ {1, . . . , n − 1}, are the continuous CDFs corresponding to the RVs Y and Xi, respectively, where Y has support on the positive reals, then:
\[ E(Y \mid \vec{X})=E\left[\left(F_{y}^{-1} \circ \Phi\right)(U \mid \vec{V})\right] \tag{6} \]
where U = Φ−1[Fy(Y)] and Vi = Φ−1[Fi(Xi)], for i ∈ {1, 2, . . . , n − 1}.
Proof. Let U and V→ be defined as U = (Φ−1 ∘ Fy)(Y) and V→ = (Φ−1 ∘ Fi)(X→), where Φ−1 ∘ Fi is applied component-wise to X→. Let f(U|V→) be the conditional density of U given V→, which is normal under the multivariate normal copula; then:
\[ \begin{aligned} E & {\left[\left(F_{y}^{-1} \circ \Phi\right)(U \mid \vec{V}=\vec{v})\right] } \\ & =\int_{u=-\infty}^{u=\infty}\left(F_{y}^{-1} \circ \Phi\right)(U) \cdot f(U \mid \vec{V}=\vec{v}) d U \\ & =\int_{u=-\infty}^{u=\infty} Y \cdot f(U \mid \vec{V}=\vec{v}) d U \\ & =\int_{y=h^{-1}(-\infty)}^{y=h^{-1}(\infty)} Y \cdot f(Y \mid \vec{V}=\vec{v})\left|\frac{d Y}{d U}\right| d U \\ & =\int_{y=0}^{y=\infty} Y \cdot f(Y \mid \vec{X}=\vec{x}) d Y=E(Y \mid \vec{X}) \end{aligned} \]
where h(y) = (Φ−1 ∘ Fy)(y). The second-to-last equality holds since Vi = vi ⇒ Φ−1[Fi(Xi)] = Φ−1[Fi(xi)] ⇒ Xi = xi, and since h′(y) > 0, for all y, by the inverse function theorem; the last equality holds since h(0) = −∞ and h(∞) = ∞.

Next, Theorem 4.2 quantifies when the estimates from the linear approximation to copula regression will underestimate those from copula regression, by providing a sufficient condition for the underestimation.
Theorem 4.2
If the CDF Fy(Y) is continuous, and the mapping g(U) = (Fy−1 ∘ Φ)(U) is convex, then:
\[ E(Y \mid \vec{X}) \geq F_{y}^{-1}(\Phi(\hat{U})) \tag{7} \]
where E[Y|X→] is the conditional expectation used to define copula regression, and Fy−1(Φ(Û)) defines the linear approximation to copula regression.
Proof. Let Fy(Y) and Fi(Xi), for i ∈ {1, . . . , n − 1}, be the continuous CDFs corresponding to the RVs Y and Xi, respectively. Let U and V→ be defined as U = (Φ−1 ∘ Fy)(Y) and V→ = (Φ−1 ∘ Fi)(X→), where Φ−1 ∘ Fi is applied component-wise to X→. If E[Y|X→] is the conditional expected value with respect to the density induced by the multivariate normal copula given in equation (5), then by Lemma 4.1 we have that E[Y|X→] = E[(Fy−1 ∘ Φ)(U|V→)], and since (Fy−1 ∘ Φ)(·) is convex, Jensen's inequality gives that:
\[ E\left[\left(F_{y}^{-1} \circ \Phi\right)(U \mid \vec{V})\right] \geq\left(F_{y}^{-1} \circ \Phi\right)(E[U \mid \vec{V}]) \tag{8} \]
Further, since Fy−1(Φ(Û)) is the linear approximation to copula regression, Û is obtained from the OLS regression of U on the Vi. So, since by definition Û = E[U|V→], we have that:
\[ F_{y}^{-1}(\Phi(\hat{U}))=\left(F_{y}^{-1} \circ \Phi\right)(E[U \mid \vec{V}]) \tag{9} \]
Hence, in conclusion, E[Y|X→] ≥ Fy−1(Φ(Û)).
So the question becomes: for which CDFs Fy is the mapping (Fy−1 ∘ Φ)(·) convex? Armed with Theorem 4.2, we now know that convexity of the mapping (Fy−1 ∘ Φ)(·) is sufficient to ensure that the estimates from the linear approximation to copula regression will be bounded above by the corresponding estimates from exact copula regression. We now present a set of criteria which ensure the convexity of the mappings (Fy−1 ∘ Φ)(·), and hence help quantify the bias in the estimates from the linear approximation to copula regression.
Lemma 4.3
If Fy(Y) is a continuous CDF, and y(x) = (Fy−1 ∘ Φ)(x), then y(x) is convex, for all x, if and only if:
\[ \frac{\phi^{\prime}(x)}{\phi^{2}(x)} \geq \frac{f_{y}^{\prime}(y(x))}{f_{y}^{2}(y(x))} \tag{10} \]
for all x, where fy′(y(x)) denotes the derivative of fy(y) WRT y, evaluated at y(x), Φ(x) is the standard normal CDF, and, correspondingly, φ(x) is the standard normal density.
Proof. Since y(x) = (Fy−1 ∘ Φ)(x), we have that Fy(y(x)) = Φ(x). Taking the derivative of both sides WRT x, we have that f(y(x)) · dy(x)/dx = φ(x), which implies:
\[ \frac{d y(x)}{d x}=\frac{\phi(x)}{f(y(x))} \tag{11} \]
Once again taking the derivative of both sides WRT x, re-arranging terms, and substituting the equality from equation (11) in for dy(x)/dx, we have:
\[ \frac{d^{2} y(x)}{d x^{2}}=\frac{(\phi(x))^{2}}{f(y(x))}\left[\frac{\phi^{\prime}(x)}{(\phi(x))^{2}}-\frac{f^{\prime}(y(x))}{(f(y(x)))^{2}}\right] \]
Hence, since (φ(x))2/f(y(x)) > 0, we have:
\[ \frac{d^{2} y(x)}{d x^{2}} \geq 0 \Leftrightarrow \frac{\phi^{\prime}(x)}{(\phi(x))^{2}} \geq \frac{f^{\prime}(y(x))}{(f(y(x)))^{2}} \tag{12} \]
The next corollary gives an equivalent condition for convexity of the mappings (Fy−1 ∘ Φ)(·), and is an immediate consequence of Lemma 4.3.
Corollary 4.4
If Fy(Y) is a continuous CDF, and y(x) = (Fy−1 ∘ Φ)(x), then y(x) is convex, for all x, if and only if:
\[ \frac{d}{d x} \log [\phi(x)] \geq \frac{d}{d x} \log [f(y(x))] \tag{13} \]
for all x where fy′(y(x)) denotes the derivative of fy(y) WRT y, evaluated at y(x), and Φ(x) is the standard normal CDF, and correspondingly, φ(x) is the standard normal density.
Proof. Since φ(x) > 0, multiplying both sides of equation (10) by φ(x) preserves the inequality, and we have that:
\[ \frac{\phi^{\prime}(x)}{(\phi(x))^{2}} \geq \frac{f_{y}^{\prime}(y(x))}{\left(f_{y}(y(x))\right)^{2}} \Leftrightarrow \frac{d}{d x} \log [\phi(x)] \geq \frac{f_{y}^{\prime}(y(x))}{f_{y}(y(x))} \cdot \frac{\phi(x)}{f_{y}(y(x))} \]
where the right-hand side of the last inequality equals the right-hand side of equation (13), since:
\[ \begin{aligned} \frac{f_{y}^{\prime}(y(x))}{f_{y}(y(x))} \cdot \frac{\phi(x)}{f_{y}(y(x))} & =\frac{d \log [f(y(x))]}{d y(x)} \cdot \frac{d y(x)}{d x} \\ & =\frac{d}{d x} \log [f(y(x))] \end{aligned} \]
which follows from the chain rule together with equation (11). So, in summary, if y(x) = (Fy−1 ∘ Φ)(x), where Fy(Y) is a continuous CDF, and Φ(x) is the standard normal CDF, then either of the equivalent conditions (10) and (13) implies the convexity of y(x), for all x.
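For readers who wish to check a particular marginal numerically, the following sketch evaluates condition (13) on a grid; the exponential marginal and its scale are illustrative choices, and the right-hand derivative is approximated by finite differences rather than computed in closed form:

```python
import numpy as np
from scipy import stats

# Numerical check of condition (13): d/dx log[phi(x)] >= d/dx log[f(y(x))],
# here with an exponential marginal F_y (an illustrative choice).
dist = stats.expon(scale=2.0)

x = np.linspace(-6.0, 6.0, 4001)
y = dist.ppf(stats.norm.cdf(x))         # the transmutation mapping y(x)

lhs = -x                                 # d/dx log phi(x) = -x
rhs = np.gradient(dist.logpdf(y), x)     # finite-difference d/dx log f(y(x))

print("condition (13) holds on the grid:", bool(np.all(lhs >= rhs - 1e-6)))
```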
Before investigating the convexity of the mappings y(x) = (Fy−1 ∘ Φ)(x) when F is a common loss distribution, we pause to consider the interpretation of the preceding results. We focus on the logarithmic condition (13), since it appears to be more amenable to interpretation. By integrating from x = 0 to x = x′, and recalling that y(x) sends the percentiles of φ(x) to the corresponding percentiles of f(y), and in particular, that y(0) is the median of f(y), the condition implies:
\[ f_{y}\left(y\left(x^{\prime}\right)\right) \leq f_{y}\left(y_{m}^{f}\right) e^{-\frac{x^{\prime 2}}{2}} \quad \text { for all } x^{\prime}>0 \tag{14} \]
Similarly,
\[ f_{y}\left(y\left(x^{\prime}\right)\right) \geq f_{y}\left(y_{m}^{f}\right) e^{-\frac{x^{\prime 2}}{2}} \quad \text { for all } x^{\prime}<0 \tag{15} \]
where yfm denotes the median of f(y). The first condition shows that fy(y(x)), as a function of x, decays at least proportionally to $e^{-x^{2}/2}$ as x grows large. Regarding the second condition, corresponding to the left-hand tail, we consider the case when F has positive support. The condition given in equation (15) is surely satisfied if f(y) is monotonically decreasing, as is the case for the exponential distribution. However, even if f(y) is uni-modal with f(0) = 0, as is the case for the Gamma distribution with shape parameter α > 1, the condition requires that, if y(x) = c ≈ 0 (c > 0), then the corresponding x = y−1(c) ≪ 0 is sufficiently negative that fy(yfm)e−x²/2 is, nonetheless, less than fy(c).

5. Transmutation mappings
In the previous section it was seen that convexity of the mappings (Fy−1 ∘ Φ)(·) ensures that the linear approximation to copula regression will underestimate exact copula regression. In fact, the investigation of such mappings dates back at least to 1937, when Cornish and Fisher formed polynomial expansions, the Cornish-Fisher (C-F) expansions, to approximate the quantiles of a given non-normal distribution function in terms of the quantiles of the standard normal distribution. More specifically, if Y is a RV with distribution Fy, and Xα is the α quantile of the standard normal distribution, then the α quantile of Y, Yα, can be approximated by:
\[ \begin{aligned} Y_{\alpha}= & m+\sigma\left(X_{\alpha}+\frac{1}{6} \frac{\kappa_{3}}{\sigma^{3}}\left(X_{\alpha}^{2}-1\right)+\frac{1}{24} \frac{\kappa_{4}}{\sigma^{4}}\left(X_{\alpha}^{3}-3 X_{\alpha}\right)\right. \\ & \left.-\frac{1}{36}\left(\frac{\kappa_{3}}{\sigma^{3}}\right)^{2}\left(2 X_{\alpha}^{3}-5 X_{\alpha}\right)+\cdots\right) \end{aligned} \tag{16} \]
where m, σ, and κr denote the mean, standard deviation, and r-th order cumulant of the distribution of Y. It is important to note that C-F expansions are only approximations. Moreover, there are problems with C-F expansions, including the fact that the introduction of additional terms does not always lead to a more accurate approximation. In fact, the likelihood of negative density values increases as higher-order terms are added to the series. As a result, C-F expansions are not useful for investigating the convexity of (Fy−1 ∘ Φ)(·). However, during the 2013 Actuarial Research Conference it was pointed out by Vytaras Brazauskas, from the University of Wisconsin at Milwaukee, that these mappings were recently studied by Shaw and Buckley (2007), and later by Steinbrecher and Shaw (2008), Shaw and Brickman (2010), and Munir and Shaw (2012). Shaw and Buckley (2007) dubbed the mappings transmutation mappings, and noted that these mappings essentially turn samples from one distribution into samples from another. Hence, a practical and precise representation of transmutation mappings has the potential to produce very efficient sampling algorithms, via the leveraging of existing samples to create samples from the desired distribution(s); a small sketch of this sampling use is given at the end of this section. In addition to having obvious application to sampling theory, especially copula-based sampling, Shaw and Buckley also note that transmutation mappings have utility within hypercube-filling quasi-Monte-Carlo (QMC) methods.

More relevant to our purposes, Shaw and Buckley claim that, prior to their research, there had been few, or no, analytical results published on transmutation maps outside the asymptotic domain. Shaw and Buckley ended this drought through an investigation of the differential equations that transmutation mappings obey. They then used these results to form new transmutation map expansions, based on the power series solutions to these non-linear ODEs. Though the results of Shaw and Buckley represent a step forward in the analytical analysis of transmutation mappings, and are based on differential equations, their results are not sufficiently strong to quantify properties of transmutation mappings, such as convexity. In fact, to the authors' knowledge, there has been no analytical investigation into the higher-order mathematical properties, such as convexity, of transmutation mappings. This statement is consistent with those of Shaw and Buckley, who go on to, quite rightly, point out that the dearth of such research is likely due to the fact that many common statistical distributions, and their inverses, including the ubiquitous normal distribution, involve rather intractable mathematical special functions. As a result, an attempt to investigate the convexity of transmutation mappings is tantamount, in many cases, to proving results about the composition of two mathematical special functions, both of which may be very intractable on their own. This is especially the case when one of the distributions is the Gamma distribution, or the regularized incomplete gamma function, as it is known within mathematical physics. Regardless, the final section of this paper contains a proof of the convexity of this particular transmutation mapping.
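As a small illustration of the sampling use described above, the following sketch transmutes standard normal draws into gamma draws; the gamma parameters and sample size are illustrative assumptions:

```python
import numpy as np
from scipy import stats

# Transmute standard normal samples into gamma samples via
# y = (F_y^{-1} o Phi)(z); the gamma parameters below are illustrative.
rng = np.random.default_rng(42)
z = rng.standard_normal(100_000)
y = stats.gamma.ppf(stats.norm.cdf(z), a=2.0, scale=3.0)

# The transmuted sample should follow the target gamma distribution.
print(y.mean(), 2.0 * 3.0)                        # sample mean vs. a * scale
print(stats.kstest(y, "gamma", args=(2.0, 0.0, 3.0)).pvalue)
```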
6. Results under specific loss distributions
We now investigate the convexity of transmutation mappings when the distribution F is among those commonly used for severity, or size-of-loss, modeling. The authors suspect that the results of this section will be useful even outside the statistical community; in particular, we feel it is likely that researchers in finance, and even applied mathematics, will find interest in these results. We first present an example where convexity of the transmutation mapping follows almost trivially. Since each distribution F under consideration is a loss, or severity, distribution, each has support on the positive reals.
Proposition 6.1
(lognormal distribution) If Fy(y) is a lognormal distribution, with parameters (μ,σ), and Φ(x) (φ(x)) is the CDF (density) of the standard normal distribution, then y(x) = (Fy−1 ∘ Φ)(x) is convex, for all x.
Proof. Since (Fy−1 ∘ Φ)(x), and its inverse (Φ−1 ∘ Fy)(y) are both increasing functions, the convexity of (Fy−1 ∘ Φ)(x) is equivalent to the concavity of (Φ−1 ∘ Fy)(y). Hence, it suffices to show that (Φ−1 ∘ Fy)(y) is concave. Since Y ∼ LN(μ, σ), we have that:
\[ F_{y}(y)=\Phi\left(\frac{\ln (y)-\mu}{\sigma}\right) \]
Hence: (Φ−1 ∘ Fy)(y) = (ln(y) − μ)/σ, and so, for all y:
\[ \frac{d^{2}}{d y^{2}}\left(\Phi^{-1} \circ F_{y}\right)(y)=-\frac{1}{\sigma y^{2}} \leq 0 \tag{17} \]
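In the lognormal case the same conclusion can also be reached directly, a remark we add here for concreteness rather than one made in the original argument: the transmutation mapping has the closed form
\[ y(x)=\left(F_{y}^{-1} \circ \Phi\right)(x)=e^{\mu+\sigma x}, \qquad \frac{d^{2} y(x)}{d x^{2}}=\sigma^{2} e^{\mu+\sigma x}>0 \]
so convexity is immediate.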
Next, we consider another distribution that is commonly used for severity, or loss-size, modeling: the Pareto distribution. In particular, we consider the Pareto Type II distribution, or Lomax distribution, to which it is sometimes referred, with CDF parameterized as follows, for y > 0, and α, θ > 0:
\[ F(y)=1-\left(\frac{\theta}{y+\theta}\right)^{\alpha} \tag{18} \]
Unlike the lognormal case, convexity of the transmutation mapping is not as easily verified when F is a Pareto distribution.
Proposition 6.2
(Pareto distribution) If Fy(y) is a Pareto distribution, with shape parameter α and scale parameter θ, and Φ(x) (φ(x)) is the CDF (density) of the standard normal distribution, then y(x) = (Fy−1 ∘ Φ)(x) is convex, for all x.

Proof. First, consider the case where x ≤ 0. Note that, since φ(x) is the standard normal density, the sufficient condition for convexity in Corollary 4.4, namely equation (13), can be written: −x ≥ (d/dx) log[f(y(x))], or:
\[ -x-\frac{d}{d x} \log [f(y(x))] \geq 0 \tag{19} \]
However, since f(y) is monotonically decreasing for all y, and y(x) and log(x) are both increasing, we have that (d/dx) log[f(y(x))] ≤ 0 for all x, and hence equation (19) is satisfied for x ≤ 0. Now, consider the case where x > 0. Since Fy−1(τ) = θ[(1 − τ)−1/α − 1], we have that:
\[ y(x)+\theta=F_{y}^{-1}(\Phi(x))+\theta=\theta(1-\Phi(x))^{-\frac{1}{\alpha}} \tag{20} \]
so we have that fy(y(x)) = αθα/(y(x) + θ)α+1. Therefore, after some routine computation and simplification, and using equation (20) for the last equality, we have:
\[ \begin{aligned} -\frac{d}{d x} \log \left[f_{y}(y(x))\right] & =\frac{\alpha+1}{y(x)+\theta} \cdot \frac{\phi(x)}{f_{y}(y(x))} \\ & =\frac{\alpha+1}{\alpha} \phi(x)[1-\Phi(x)]^{-1} \end{aligned} \tag{21} \]
Again, by equation (19), we only need to show that x ≤ −(d/dx) log[fy(y(x))] for x > 0, which by equation (21) is equivalent to x ≤ ((α + 1)/α) · φ(x)[1 − Φ(x)]−1, or, for x > 0:
\[ \frac{\alpha+1}{\alpha} \cdot \frac{1}{x} \geq \frac{1-\Phi(x)}{\phi(x)} \tag{22} \]
But, we note that the quantity m(x) = (1 − Φ(x))/φ(x) is the well-known Mills ratio for the normal distribution, which has been the focus of a good amount of research. In particular, Baricz (2008, 2010, 2012) points out that in 1941 R. D. Gordon proved that the Mills ratio for the normal distribution obeys the following inequalities, for x > 0:
\[ \frac{x}{x^{2}+1}<\frac{1-\Phi(x)}{\phi(x)}<\frac{1}{x} \tag{23} \]
Hence, since (α + 1)/α > 1, we have, for x > 0 and for all α > 0, that:
\[ \frac{\alpha+1}{\alpha} \cdot \frac{1}{x} \geq \frac{1}{x} \geq \frac{1-\Phi(x)}{\phi(x)} \tag{24} \]
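Gordon's bounds are also easy to spot-check numerically; the following snippet (with an arbitrarily chosen grid) verifies both inequalities of equation (23):

```python
import numpy as np
from scipy import stats

# Spot-check of Gordon's (1941) bounds on the normal Mills ratio, for x > 0:
#   x / (x^2 + 1)  <  (1 - Phi(x)) / phi(x)  <  1 / x
x = np.linspace(0.01, 10.0, 1000)
mills = stats.norm.sf(x) / stats.norm.pdf(x)   # sf = 1 - cdf, numerically stable

print(bool(np.all(x / (x**2 + 1) < mills)), bool(np.all(mills < 1 / x)))
```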
We now turn our attention to the Gamma distribution, or the regularized incomplete gamma function, as it is known within the applied mathematics and physics communities. The incomplete gamma function appears ubiquitously within applied mathematics, and is related to many other mathematical special functions, including the confluent hypergeometric functions, Bessel functions (Tricomi 1950a, 1950b), the Legendre and Laguerre polynomials, Kummer's functions, the Gaussian error function, as well as the exponential integrals. For this reason, the theory of the incomplete gamma function has been (and continues to be) of keen interest to many mathematicians. The Italian mathematician Tricomi is one such example. In fact, in the paper The Incomplete Gamma Function Since Tricomi, Gautschi (1998) remarks that “the incomplete gamma function held a special fascination for him (Tricomi), as he was fond of calling it, affectionately, the Cinderella of special functions.” More relevantly, Gautschi goes on to state that “Monotonicity, convexity, and higher monotonicity results abound for the gamma function, but seem to be scarce for the incomplete gamma function.” Moreover, despite the authors' assiduous attempt to find inequality results which are tight enough, across a sufficiently broad range of the domain, to facilitate a direct proof of the convexity of the incomplete gamma function composed with the inverse error function, the breadth of published inequalities was, once again, found to be lacking. Some relevant investigations of the regularized incomplete gamma, and other special functions, include Alm (2003), Alzer (1997, 2005), Berg and Pedersen (2008), Carlitz (1963), Cerone and Dragomir (2008), Gautschi (1998), Short (2013), and Strecok (1968).
The authors did have some success using various approaches, including the use of approximations to (Φ−1 ∘ Fy)(y) involving inverse hyperbolic trigonometric functions, but this necessitated breaking up the domain, and additional work would have been necessary to make the result rigorous over the full domain of Y. More importantly, this approach required a constructive argument, which was quite protracted. As a result, the focus of the authors returned to obtaining the result through contradiction. Finally, due to a flash of genius experienced by the third author, a surprisingly concise and elegant version of the proof was made possible. To the authors' knowledge, this analytical result is unique within the study of special functions.
Proposition 6.3
(Gamma distribution) If Fy(y) is a Gamma distribution, with shape parameter α and scale parameter θ, and Φ(x) (resp. φ(x)) is the CDF (resp. density) of the standard normal distribution, then y(x) = (Fy−1 ∘ Φ)(x) is convex, for all x, and for all values of the shape parameter α.

Due to the complexity of the proof of Proposition 6.3, especially when α > 1, the proof is relegated to the appendix, and is preceded by several supporting lemmas.
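As a quick empirical companion to Proposition 6.3 (a sanity check, not a substitute for the proof in the appendix), convexity can be probed via discrete second differences of the transmutation mapping; the grid and shape parameters below are illustrative:

```python
import numpy as np
from scipy import stats

# Second differences of y(x) = F_y^{-1}(Phi(x)) should be nonnegative
# for gamma marginals, across a range of shape parameters.
x = np.linspace(-5.0, 5.0, 2001)
for alpha in [0.3, 0.5, 1.0, 2.0, 10.0]:
    y = stats.gamma.ppf(stats.norm.cdf(x), a=alpha)
    d2 = np.diff(y, 2)                    # proportional to y''(x) on the grid
    print(alpha, bool(np.all(d2 >= -1e-9)))
```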
7. Conclusion
Parsa and Klugman (2011) proposed a generalization of ordinary least squares regression which is better suited to the modeling of actuarial data sets, which often possess heavy-tailed marginal distributions and non-linear relationships between variables. However, not long after the introduction of copula regression, a surprisingly simple approximation was suggested. In this paper we have presented and investigated this approximation, which takes the form of an OLS regression under particular transformations of the variables. We described how this linear approximation to copula regression can produce estimates which are close to those from exact copula regression, and we presented a set of criteria which guarantee that the estimates from the linear approximation will underestimate those from exact copula regression. Further, it was described how the main driver of the discrepancy, or bias, between copula regression and its linear approximation is the use of transformations of the variables. Moreover, we explained how this discrepancy is not due to the particular form of the transformations used within the linear approximation to copula regression; rather, such a discrepancy will likely arise whenever transformations of the variables are used within OLS regression. This realization has consequences well beyond the specific models investigated in this paper, and serves as a salient reminder of the dangers of using transformations of variables within OLS regression.

Finally, armed with the aforementioned sufficient conditions for underestimation, we investigated which of the common loss distributions satisfy these criteria. In particular, we were able to prove that the lognormal, Pareto, and gamma distributions all satisfy these criteria, and do so for all parameter values. Hence, these results allow the practitioner to determine when the estimates from the linear approximation to copula regression will underestimate the true values. Further, if OLS regression involving transformations is used within the reserving, capital modeling, or even pricing processes of a firm, these results can aid the practitioner in avoiding the understatement of reserves and even insolvency.
Acknowledgments
This work was sponsored by the Casualty Actuarial Society (CAS), the Actuarial Foundation's research committee, and the Committee on Knowledge Extension Research (CKER) of the Society of Actuaries (SOA). In addition to the CAS, the Actuarial Foundation, and the CKER, the authors wish to extend thanks to Alice Underwood, Vice-President, Research and Development, CAS; David Core, Director of Professional Education and Research, CAS; Curtis Huntington, FSA, MAAA, FCA, MSPA (1942–2013), Former Chair, the Actuarial Foundation's Research Committee, the CKER of the SOA, and the Actuarial Foundation of Canada; Cynthia MacDonald, Senior Experience Studies Actuary, SOA; and Eileen Streu, Executive Director, The Actuarial Foundation.
Also, the authors wish to extend special thanks to the following individuals who provided guidance, feedback, and motivations for this research: Stuart Klugman, Staff Fellow, Education, SOA; and Thomas Struppeck, Longhorn Analytics LLC, and The University of Texas at Austin.