Jeong, Himchan, Emiliano A. Valdez, Jae Youn Ahn, and Sojung Carol Park. 2021. “Generalized Linear Mixed Models for Dependent Compound Risk Models.” Variance 14 (1).
• Table 1. Observable policy characteristics used as covariates
• Table 2. Percentage and number of claims by count and year
• Table 3. Average severity (AvgSev) by claim count and calendar year
• Figure 1. Frequency and average severity by calendar year
• Figure 2. Graphical relationship of frequency and average severity, per policyholder
• Table 4. Goodness-of-fit test for the frequency component
• Figure 3. log-QQ plots of fitting gamma to average severity for each calendar year
• Table 5. Regression estimates of the negative binomial model for frequency
• Table 6. Regression estimates of the gamma model for average severity
• Table 7. Regression estimates for the aggregate loss models based on Tweedie
• Table 8. Validation measures for the five models
• Figure 4. The Lorenz curve and the Gini index values for the five models

## Abstract

In ratemaking, calculation of a pure premium has traditionally been based on modeling frequency and severity in an aggregated claims model. For simplicity, it has been a standard practice to assume the independence of loss frequency and loss severity. In recent years, there has been sporadic interest in the actuarial literature exploring models that depart from this independence. In this paper, the authors extend the work of Garrido, Genest, and Schulz (2016), which uses generalized linear models (GLMs) that account for dependence between frequency and severity and simultaneously incorporate rating factors to capture policyholder heterogeneity. In addition, they quantify and explain the contribution of the variability of claims among policyholders through the use of random effects using generalized linear mixed models (GLMMs). The authors calibrated their model using a portfolio of auto insurance contracts from a Singapore insurer where they observed claim counts and amounts from policyholders for a period of six years. They compared their results with the dependent GLM considered by Garrido, Genest, and Schulz; Tweedie models; and the case of independence. The dependent GLMM shows statistical evidence of positive dependence between frequency and severity. Using validation procedures, the authors find that the results demonstrate a superior model when random effects are considered within a GLMM framework.

Himchan Jeong and Emiliano A. Valdez were supported by the CAE Research Grant on Applying Data Mining Techniques in Actuarial Science funded by the Society of Actuaries (SOA).

Sojung Carol Park acknowledges support from the Institute of Management Research at Seoul National University.

Accepted: May 21, 2018 EDT

# Appendices

## Appendix A. The development of the log-likelihood equations

Based on our observed data, from equations (2.13) and (2.6), the likelihood can be expressed as

\begin{align} L = &\prod_{i}^{}\int\int\prod_{t}^{}f(n_{{it}},{\overline{c}}_{{it}}|b,u)dF_{b}dF_{u} \\ = &\prod_{i}^{}\left( \int\prod_{t}^{}f_{N}(n_{{it}}|b)dF_{b} \right) \\ &\times \left( \int\prod_{t}^{}f_{\overline{C}|N}({\overline{c}}_{{it}}|u,n_{{it}})dF_{u} \right). \end{align}

Thus, the log-likelihood can be expressed as

\begin{align} \mathcal{l} = & \log L \\ = &\sum_{i}^{}\Bigl( \log \int_{}^{}{\prod_{t}^{}f_{N}(n_{{it}}|b)dF_{b}} \\ &+ \log \int_{}^{}{\prod_{t}^{}f_{\overline{C}|N}({\overline{c}}_{{it}}|u,n_{{it}})dF_{u}} \Bigr) \\ = & \sum_{i}^{}\left( \log \int_{}^{}{\prod_{t}^{}f_{N}(n_{{it}}|b)dF_{b}} \right) \\ &+ \sum_{i}^{}\left( \log \int_{}^{}{\prod_{t}^{}f_{\overline{C}|N}({\overline{c}}_{{it}}|u,n_{{it}})dF_{u}} \right) \\ = & \mathcal{l}_{N} + \mathcal{l}_{\overline{C}|N}, \end{align}

where

$\mathcal{l}_{N} = \sum_{i}^{}\left( \log \int_{}^{}{\prod_{t}^{}f_{N}(n_{{it}}|b)dF_{b}} \right),$

and

$\mathcal{l}_{\overline{C}|N} = \sum_{i}^{}\left( \log \int_{}^{}{\prod_{t}^{}f_{\overline{C}|N}({\overline{c}}_{{it}}|u,n_{{it}})dF_{u}} \right).$

We take the partial derivatives of the two log-likelihood functions with respect to each parameter and set them to zero:

$\frac{\partial\mathcal{l}_{N}}{\partial\alpha_{k}} = 0\ \text{for}\ k = 1,\ldots,p,$

$\frac{\partial\mathcal{l}_{N}}{\partial\sigma_{b}} = 0,$

$\frac{\partial\mathcal{l}_{\overline{C}|N}}{\partial\theta} = 0,$

$\frac{\partial\mathcal{l}_{\overline{C}|N}}{\partial\beta_{k}} = 0\ \text{for}\ k = 1,\ldots,p, \text{ and }$

$\frac{\partial\mathcal{l}_{\overline{C}|N}}{\partial\sigma_{u}} = 0.$

This yields the (2p + 3) estimating equations given in Section 2.
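The integral over the random effect in $$\mathcal{l}_{N}$$ has no closed form in general, so evaluating the log-likelihood calls for numerical integration. As a minimal sketch, assuming the negative binomial frequency with log link and normal random intercept used in Appendix B, the contribution of a single policyholder to $$\mathcal{l}_{N}$$ could be approximated with a trapezoidal rule. The function names, the grid size, and the integration width are illustrative assumptions, not part of the paper.

```python
import math


def nb_logpmf(n, mean, r):
    """Log pmf of a negative binomial with the given mean and size r."""
    return (
        math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
        + r * math.log(r / (r + mean))
        + n * math.log(mean / (r + mean))
    )


def loglik_frequency_one_policy(counts, nu, r, sigma_b, grid=2001, width=8.0):
    """Approximate log of the integral of prod_t f_N(n_t | b) dF_b.

    counts  : observed claim counts n_t for one policyholder
    nu      : fixed-effect means exp(x_t' alpha), one per period
    sigma_b : standard deviation of the random intercept b ~ N(0, sigma_b^2)
    The integral is truncated to [-width * sigma_b, width * sigma_b]
    (an assumption of this sketch) and evaluated by the trapezoidal rule.
    """
    lo = -width * sigma_b
    step = 2.0 * width * sigma_b / (grid - 1)
    total = 0.0
    for j in range(grid):
        b = lo + j * step
        # Conditional on b, the counts are independent NB(nu_t * e^b, r).
        log_joint = sum(
            nb_logpmf(n, nu_t * math.exp(b), r) for n, nu_t in zip(counts, nu)
        )
        log_density = (
            -0.5 * (b / sigma_b) ** 2
            - math.log(sigma_b * math.sqrt(2.0 * math.pi))
        )
        weight = 0.5 if j in (0, grid - 1) else 1.0
        total += weight * math.exp(log_joint + log_density) * step
    return math.log(total)


# Hypothetical six-year claim history for one policyholder.
example = loglik_frequency_one_policy(
    counts=[0, 1, 0, 0, 2, 0], nu=[0.1] * 6, r=2.0, sigma_b=0.5
)
```

Summing this quantity over policyholders gives $$\mathcal{l}_{N}$$; the severity part $$\mathcal{l}_{\overline{C}|N}$$ could be handled in the same way with the gamma density and the random effect u.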

## Appendix B. Details of the computation of the mean and variance of the compound sum

In this appendix, we provide the details of the derivation of the unconditional mean and variance of the aggregate claim, defined by $$S = N\overline{C},$$ under our GLMM specification. For simplicity, we drop all subscripts. Using notation that is conventional in the GLM framework, we define $$\nu$$ and $$\mu$$ so that $$\nu = g_{N}^{- 1}(\mathbf{x}'\alpha + z'b) = \mathbb{E}\left\lbrack N|b,\mathbf{x} \right\rbrack$$ and $$\mu = g_{C}^{- 1}(\mathbf{x}'\beta + \theta n + z'u) = \mathbb{E}\left\lbrack \overline{C}|N,u,\mathbf{x} \right\rbrack,$$ where $$g_{N}( \cdot )$$ and $$g_{C}( \cdot )$$ are the link functions for the frequency and the average severity, respectively.

Therefore, in general, we can derive explicit formulas for the unconditional mean and variance of the aggregate claims as follows:

\begin{align} \mathbb{E}\left\lbrack S|\mathbf{x} \right\rbrack & = \mathbb{E}\left\lbrack N\overline{C}|\mathbf{x} \right\rbrack = \mathbb{E}\left\lbrack N\mathbb{E}\left\lbrack \overline{C}|N,u \right\rbrack|\mathbf{x} \right\rbrack \\ & = \mathbb{E}\left\lbrack Ng_{C}^{- 1}(\mathbf{x}'\beta + \theta n + z'u)|\mathbf{x} \right\rbrack, \end{align} \tag{B.1}

and

\begin{align} Var(S|\mathbf{x}) = &Var(\mathbb{E}\left\lbrack N\overline{C}|N,\mathbf{x} \right\rbrack) \\ &+ \mathbb{E}\left\lbrack Var(N\overline{C}|N,\mathbf{x}) \right\rbrack \\ = &Var(N\mathbb{E}\left\lbrack \overline{C}|N,\mathbf{x} \right\rbrack|\mathbf{x}) \\ &+ \mathbb{E}\left\lbrack N^{2}Var(\overline{C}|N,\mathbf{x})|\mathbf{x} \right\rbrack \\ = &Var\left( Ng_{C}^{- 1}\left( \mathbf{x}'\beta + \theta n + z'u \right) \middle| \mathbf{x} \right) \\ &+ \mathbb{E}\bigl\lbrack N^{2}\tau V(g_{C}^{- 1}(\mathbf{x}'\beta + \theta n + z'u)) \\ &|\mathbf{x} \bigr\rbrack. \end{align} \tag{B.2}

These expressions simplify under our two-part dependent frequency-severity GLMM, for which we derive the unconditional mean and variance below:

$N|b \sim \ \text{indep.}\ \text{NB}\left( \nu e^{b},r \right)\ \text{with}\ b \sim N\left( 0,\sigma_{b}^{2} \right),$

and

$\overline{C}|N,u \sim \ \text{indep.}\ \text{gamma}(\mu e^{u},\phi/n)\ \text{with}\ u \sim N(0,\sigma_{u}^{2}).$

We additionally assume log-link functions $$g_{N}(\mu) = g_{C}(\mu) = \log \mu$$ and that z = 1. In the average severity model specified above, which is conditional on N, the term $$\theta n$$ enters the linear predictor. Thus, we have $$\nu = \exp (\mathbf{x}'\alpha)$$ and $$\mu = \exp (\mathbf{x}'\beta).$$

For the unconditional mean, we have

\begin{align} \mathbb{E}\left\lbrack S|\mathbf{x} \right\rbrack = &\mathbb{E}\left\lbrack N\overline{C}|\mathbf{x} \right\rbrack \\ = &\mathbb{E}\left\lbrack N\mathbb{E}\left\lbrack \overline{C}|N,u \right\rbrack|\mathbf{x} \right\rbrack \\ = &\mathbb{E}\left\lbrack N\mu e^{n\theta + u}|\mathbf{x} \right\rbrack \\ = &\mu\mathbb{E}\left\lbrack Ne^{n\theta}|\mathbf{x} \right\rbrack\mathbb{E}\left\lbrack e^{u}|\mathbf{x} \right\rbrack \\ = &\mu\mathbb{E}\left\lbrack M'_{N|b,\mathbf{x}}(\theta) \right\rbrack e^{\sigma_{u}^{2}/2} \\ = &\mu\mathbb{E}\left\lbrack \nu e^{b}\lbrack 1 - (\nu e^{b}/r)(e^{\theta} - 1)\rbrack^{- r - 1} \right\rbrack \\ &\times e^{\sigma_{u}^{2}/2 + \theta} \\ = &\mu\nu\mathbb{E}\left\lbrack e^{b}\lbrack 1 - (\nu e^{b}/r)(e^{\theta} - 1)\rbrack^{- r - 1} \right\rbrack \\ &\times e^{\sigma_{u}^{2}/2 + \theta}, \end{align} \tag{B.3}

where we have used the following results, which can be immediately deduced: $$M_{N|b,\mathbf{x}}(t) = \lbrack 1 - (\nu e^{b}/r)(e^{t} - 1)\rbrack^{- r}$$ and $$\mathbb{E}\left\lbrack Ne^{{nt}}|b,\mathbf{x} \right\rbrack = M'_{N|b,\mathbf{x}}(t) = \nu e^{b + t}\lbrack 1 - (\nu e^{b}/r)(e^{t} - 1)\rbrack^{- r - 1}.$$

Note that the expectation in the final line above is taken with respect to the random effect b. Setting b = 0 and u = 0 removes the random effects and leads to the dependent GLM. In this case, we have $$\mathbb{E}\left\lbrack S|\mathbf{x} \right\rbrack = \mu\mathbb{E}\left\lbrack M'_{N|\mathbf{x}}(\theta) \right\rbrack,$$ which is precisely the result found in Garrido, Genest, and Schulz (2016).
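Because the remaining expectation over b has no closed form, the unconditional mean has to be evaluated numerically. The sketch below does this with a trapezoidal rule, using the closed form of $$M'_{N|b,\mathbf{x}}$$ stated above. The function names, grid settings, and parameter values are hypothetical. The closed form requires $$1 - (\nu e^{b}/r)(e^{\theta} - 1) > 0,$$ so the example uses $$\theta \leq 0,$$ for which this holds at every value of b.

```python
import math


def mgf_d1(t, mean, r):
    """E[N exp(tN)] for a negative binomial with the given mean and size r."""
    base = 1.0 - (mean / r) * (math.exp(t) - 1.0)
    if base <= 0.0:
        raise ValueError("the moment generating function does not exist at t")
    return mean * math.exp(t) * base ** (-r - 1.0)


def normal_expectation(func, sigma, grid=4001, width=8.0):
    """Approximate E[func(b)] for b ~ N(0, sigma^2) by the trapezoidal rule."""
    if sigma == 0.0:
        return func(0.0)  # degenerate case: no random effect
    lo = -width * sigma
    step = 2.0 * width * sigma / (grid - 1)
    total = 0.0
    for j in range(grid):
        b = lo + j * step
        density = math.exp(-0.5 * (b / sigma) ** 2) / (
            sigma * math.sqrt(2.0 * math.pi)
        )
        weight = 0.5 if j in (0, grid - 1) else 1.0
        total += weight * func(b) * density * step
    return total


def expected_aggregate(mu, nu, r, theta, sigma_b, sigma_u):
    """E[S | x] = mu * E_b[M'_{N|b}(theta)] * exp(sigma_u^2 / 2)."""
    inner = normal_expectation(
        lambda b: mgf_d1(theta, nu * math.exp(b), r), sigma_b
    )
    return mu * inner * math.exp(sigma_u ** 2 / 2.0)


# Hypothetical parameter values, chosen only for illustration.
glmm_mean = expected_aggregate(
    mu=1000.0, nu=0.1, r=2.0, theta=-0.05, sigma_b=0.5, sigma_u=0.3
)
glm_mean = expected_aggregate(
    mu=1000.0, nu=0.1, r=2.0, theta=-0.05, sigma_b=0.0, sigma_u=0.0
)
```

Passing `sigma_b=0.0` and `sigma_u=0.0` reproduces the dependent GLM case without random effects.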

To derive the unconditional variance, we first note that $$\mathbb{E}\left\lbrack S^{2}|\mathbf{x} \right\rbrack = \mathbb{E}\left\lbrack N^{2}{\overline{C}}^{2}|\mathbf{x} \right\rbrack = \mathbb{E}\left\lbrack N^{2}\mathbb{E}\left\lbrack {\overline{C}}^{2}|N,u \right\rbrack|\mathbf{x} \right\rbrack,$$ and because $$\mathbb{E}\left\lbrack \overline{C}|N,u,\mathbf{x} \right\rbrack = \mu e^{n\theta + u}$$ and $$Var(\overline{C}|N,u,\mathbf{x}) = \phi\mu^{2}e^{2(n\theta + u)}/n,$$ we obtain

$\mathbb{E}\left\lbrack {\overline{C}}^{2}|N,u,\mathbf{x} \right\rbrack = (\phi/n + 1)\mu^{2}e^{2(n\theta + u)},$

\begin{align} \mathbb{E}\left\lbrack N^{2}\mathbb{E}\left\lbrack {\overline{C}}^{2}|N,u \right\rbrack|\mathbf{x} \right\rbrack = &\mu^{2}\mathbb{E}\left\lbrack (\phi n + n^{2})e^{2(n\theta + u)}|\mathbf{x} \right\rbrack \\ = &\mu^{2}(\phi\mathbb{E}\left\lbrack M'_{N|b,\mathbf{x}}(2\theta) \right\rbrack \\ &+ \mathbb{E}\left\lbrack M''_{N|b,\mathbf{x}}(2\theta) \right\rbrack)e^{2\sigma_{u}^{2}}, \end{align}

and

\begin{align} \mathbb{E}\left\lbrack N^{2}e^{{nt}}|b,\mathbf{x} \right\rbrack = &M''_{N|b,\mathbf{x}}(t) \\ = &\nu^{2}e^{2b + 2t}(1 + 1/r)\lbrack 1 - (\nu e^{b}/r)(e^{t} - 1)\rbrack^{- r - 2} \\ &+ M'_{N|b,\mathbf{x}}(t). \end{align}
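The two closed forms for $$M'_{N|b,\mathbf{x}}(t)$$ and $$M''_{N|b,\mathbf{x}}(t)$$ can be checked numerically against a direct summation over the negative binomial probabilities. In the sketch below, `mean` plays the role of $$\nu e^{b}$$; the function names and the truncation point `n_max` are assumptions made for illustration.

```python
import math


def nb_logpmf(n, mean, r):
    """Log pmf of a negative binomial with the given mean and size r."""
    return (
        math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
        + r * math.log(r / (r + mean))
        + n * math.log(mean / (r + mean))
    )


def mgf_d1(t, mean, r):
    """Closed form for E[N exp(tN)]."""
    base = 1.0 - (mean / r) * (math.exp(t) - 1.0)
    return mean * math.exp(t) * base ** (-r - 1.0)


def mgf_d2(t, mean, r):
    """Closed form for E[N^2 exp(tN)]."""
    base = 1.0 - (mean / r) * (math.exp(t) - 1.0)
    return (
        mean ** 2 * math.exp(2.0 * t) * (1.0 + 1.0 / r) * base ** (-r - 2.0)
        + mgf_d1(t, mean, r)
    )


def direct_moment(power, t, mean, r, n_max=2000):
    """Truncated sum of n^power * exp(t n) * P(N = n) over n < n_max."""
    return sum(
        n ** power * math.exp(t * n + nb_logpmf(n, mean, r))
        for n in range(n_max)
    )


# Gaps between the closed forms and the direct sums at a hypothetical point.
gap_first = abs(direct_moment(1, 0.1, 0.5, 2.0) - mgf_d1(0.1, 0.5, 2.0))
gap_second = abs(direct_moment(2, 0.1, 0.5, 2.0) - mgf_d2(0.1, 0.5, 2.0))
```

Both gaps should be at the level of floating-point rounding whenever $$1 - (\nu e^{b}/r)(e^{t} - 1) > 0,$$ which is the condition for the sums to converge.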

Finally, combining the expressions for $$\mathbb{E}\left\lbrack S|\mathbf{x} \right\rbrack$$ and $$\mathbb{E}\left\lbrack S^{2}|\mathbf{x} \right\rbrack,$$ we have

\begin{align} Var(S|\mathbf{x}) = &\mathbb{E}\left\lbrack S^{2}|\mathbf{x} \right\rbrack - \left( \mathbb{E}\left\lbrack S|\mathbf{x} \right\rbrack \right)^{2} \\ = &\mu^{2}e^{\sigma_{u}^{2}}(\phi\mathbb{E}\left\lbrack M'_{N|b,\mathbf{x}}(2\theta) \right\rbrack e^{\sigma_{u}^{2}} \\ &+ \mathbb{E}\left\lbrack M''_{N|b,\mathbf{x}}(2\theta) \right\rbrack e^{\sigma_{u}^{2}} \\ &- \mathbb{E}\left\lbrack M'_{N|b,\mathbf{x}}(\theta) \right\rbrack^{2}) \\ = &\mu^{2}e^{\sigma_{u}^{2} + 2\theta}(\phi\mathbb{E}\bigl\lbrack \nu e^{b}\lbrack 1 - (\nu e^{b}/r) \\ &\times (e^{2\theta} - 1)\rbrack^{- r - 1} \bigr\rbrack e^{\sigma_{u}^{2}} \\ &+ \mathbb{E}\bigl\lbrack \nu^{2}e^{2b + 2\theta}(1 + 1/r) \\ &\times \lbrack 1 - (\nu e^{b}/r)(e^{2\theta} - 1)\rbrack^{- r - 2} \bigr\rbrack e^{\sigma_{u}^{2}} \\ &+ \mathbb{E}\bigl\lbrack \nu e^{b}\lbrack 1 - (\nu e^{b}/r) \\ &\times (e^{2\theta} - 1)\rbrack^{- r - 1} \bigr\rbrack e^{\sigma_{u}^{2}} \\ &- \mathbb{E}\bigl\lbrack \nu e^{b}\lbrack 1 - (\nu e^{b}/r)\\ &\times (e^{\theta} - 1)\rbrack^{- r - 1} \bigr\rbrack^{2}). \end{align} \tag{B.4}
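A minimal numerical sketch of (B.4) follows. It evaluates the variance as $$\mathbb{E}\left\lbrack S^{2}|\mathbf{x} \right\rbrack - \left( \mathbb{E}\left\lbrack S|\mathbf{x} \right\rbrack \right)^{2},$$ which is algebraically the same as the final, fully expanded form of (B.4), with the expectations over b approximated by a trapezoidal rule. All names, grid settings, and parameter values are hypothetical, and $$\theta \leq 0$$ is assumed so that the moment generating function exists for every b.

```python
import math


def mgf_d1(t, mean, r):
    """E[N exp(tN)] for a negative binomial with the given mean and size r."""
    base = 1.0 - (mean / r) * (math.exp(t) - 1.0)
    return mean * math.exp(t) * base ** (-r - 1.0)


def mgf_d2(t, mean, r):
    """E[N^2 exp(tN)] for the same negative binomial."""
    base = 1.0 - (mean / r) * (math.exp(t) - 1.0)
    return (
        mean ** 2 * math.exp(2.0 * t) * (1.0 + 1.0 / r) * base ** (-r - 2.0)
        + mgf_d1(t, mean, r)
    )


def normal_expectation(func, sigma, grid=4001, width=8.0):
    """Approximate E[func(b)] for b ~ N(0, sigma^2) by the trapezoidal rule."""
    if sigma == 0.0:
        return func(0.0)  # degenerate case: no random effect
    lo = -width * sigma
    step = 2.0 * width * sigma / (grid - 1)
    total = 0.0
    for j in range(grid):
        b = lo + j * step
        density = math.exp(-0.5 * (b / sigma) ** 2) / (
            sigma * math.sqrt(2.0 * math.pi)
        )
        weight = 0.5 if j in (0, grid - 1) else 1.0
        total += weight * func(b) * density * step
    return total


def aggregate_variance(mu, nu, r, phi, theta, sigma_b, sigma_u):
    """Var(S | x) computed as E[S^2 | x] - (E[S | x])^2."""
    d1_theta = normal_expectation(
        lambda b: mgf_d1(theta, nu * math.exp(b), r), sigma_b
    )
    d1_2theta = normal_expectation(
        lambda b: mgf_d1(2.0 * theta, nu * math.exp(b), r), sigma_b
    )
    d2_2theta = normal_expectation(
        lambda b: mgf_d2(2.0 * theta, nu * math.exp(b), r), sigma_b
    )
    second_moment = (
        mu ** 2 * (phi * d1_2theta + d2_2theta) * math.exp(2.0 * sigma_u ** 2)
    )
    mean = mu * d1_theta * math.exp(sigma_u ** 2 / 2.0)
    return second_moment - mean ** 2


# Hypothetical parameter values, chosen only for illustration.
glmm_variance = aggregate_variance(
    mu=1000.0, nu=0.1, r=2.0, phi=0.5, theta=-0.05, sigma_b=0.5, sigma_u=0.3
)
```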

Note that if we again set b = 0 and u = 0, we have the dependent GLM without random effects. In this case, we have

\begin{align} Var(S|\mathbf{x}) = &\phi\mathbb{E}\left\lbrack NV_{C|\mathbf{x}}(\mu e^{\theta N}) \right\rbrack \\ &+ \mu^{2}\left\{ \mathbb{E}\left\lbrack M''_{N|\mathbf{x}}(2\theta) \right\rbrack - \mathbb{E}\left\lbrack M'_{N|\mathbf{x}}(\theta) \right\rbrack^{2} \right\}, \end{align}

which clearly corresponds to what is derived in Garrido, Genest, and Schulz (2016).

It is worth noting that if we set $$\theta = 0,$$ in addition to removing the random effects, we end up with the unconditional mean and variance that correspond to the case when frequency and average severity are independent.
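As a quick check of this remark under the negative binomial and gamma specification above, set $$\theta = 0$$ and $$b = u = 0,$$ so that the factors involving $$\sigma_{u}^{2}$$ drop out as well. Then $$M'_{N|\mathbf{x}}(0) = \nu$$ and $$M''_{N|\mathbf{x}}(0) = \nu^{2}(1 + 1/r) + \nu,$$ and (B.3) together with the final expression in (B.4) reduce to

\begin{align} \mathbb{E}\left\lbrack S|\mathbf{x} \right\rbrack = &\mu\nu, \\ Var(S|\mathbf{x}) = &\mu^{2}\left( \phi\nu + \nu^{2}(1 + 1/r) + \nu - \nu^{2} \right) \\ = &\phi\mu^{2}\nu + \mu^{2}\left( \nu + \nu^{2}/r \right). \end{align}

The mean is the product of the expected frequency and the expected average severity. In the variance, the first term is $$\mathbb{E}\left\lbrack N^{2}Var(\overline{C}|N,\mathbf{x})|\mathbf{x} \right\rbrack$$ and the second is $$\mu^{2}Var(N|\mathbf{x}),$$ with $$Var(N|\mathbf{x}) = \nu + \nu^{2}/r$$ for the negative binomial, as expected when frequency and average severity are independent.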