1. Introduction
As Lemaire (1998) notes, with respect to automobile insurance, insurers in the United States and Canada tend to employ many a priori variables (observable factors) to classify risks associated with policyholders, whereas in other parts of the world, such as Europe, Asia, Latin America, and Africa, insurers commonly employ a posteriori ratemaking. The common a posteriori ratemaking process can be accomplished via a bonusmalus system (BMS), and such a posteriori ratemaking holds the promise of resolving residual heterogeneity still remaining within risk classes. As noted in Section 1 of Lemaire (1998), although BMSs are not commonly used in North America, that situation could change, as regulatory authorities may prohibit insurers from using certain classification variables and, moreover, a posteriori ratemaking is known to be an efficient way to assess individual risks. Based on the policyholder’s claims experience, such a system discounts the premium as a reward for a claimfree case (bonus) and charges an additional premium in the presence of accidents (malus). Such an a posteriori ratemaking mechanism takes on various forms across countries, but it generally consists of three major components: bonusmalus (BM) levels; transition rules; and BM relativities. Each new policyholder is classified into one of the finite number of risk classes depending on his or her observable risk characteristics, and he or she is also assigned an initial BM level. After each policy period, the BM level moves up or down according to the prespecified transition rules that penalize policyholders based on the number of reported claims and reward those without claims. In a typical BMS, claims experience in the previous policy period is used to determine the transition rule. However, for better separation of consistently and temporarily good (or bad) risks, we can consider capturing claims experience in multiple previous policy periods in the design of the transition rule, as is done in Belgium. Then, the premium is renewed at a rate that is the product of the BM relativity and the base premium. Here the BM relativity is determined by the BM level, which plays a role in the premium adjustment coefficient. In this way, the actual premium charged to policyholders in each BM level can be more closely matched to their risks reflected in the claim number record for the past few policy periods and consequently the claims costs are fairly shared. In the classical BMS, the base premium is often calculated as a product of the mean frequency and the mean severity due to the assumed independence between those two random variables (e.g., Pitrebois, Denuit, and Walhin 2003a; Denuit et al. 2007; Tan et al. 2015). However, the independence assumption is often criticized as some recent insurance reports demonstrate the existence of a significant dependence between claim frequency and severity. In the actuarial literature, various statistical models have been developed in an attempt to capture certain dependency structures between frequency and severity of claims. These include, for example, copula models (Czado et al. 2012; Frees, Lee, and Yang 2016), shared or bivariate random effects models (HernandezBastida, FernandezSanchez, and GomezDeniz 2009; Baumgartner, Gruber, and Czado 2015; Oh, Shi, and Ahn 2020), and twopart models (Shi, Feng, and Ivantsova 2015; Garrido, Genest, and Schulz 2016; Park, Kim, and Ahn 2018; Jeong et al. 2021).
The focus of the paper is twofold. First, we revisit the extension of the BM levels (Lemaire 1995; Pitrebois, Denuit, and Walhin 2003b) and define our longmemory transition rules in a more formal setting. Second, under the proposed transition rules, we consider the optimal relativities not only in a frequencyonly model but also in a dependent collective risk model that allows for dependence between frequency and severity. As an example of the dependent collective model, we consider the copulabased bivariate random effects model as in Oh, Shi, and Ahn (2020), among many choices of shared or bivariate random effects models. Whereas in the classical BMS the policyholder’s BM level is reduced if there is no claim in the past year, in our modified BMS the BM level moves to a lower level when the policyholder has no claim consecutively for years equivalent to the “period of penalty”. A similar case is discussed in Chapter 17 of Lemaire (1985) and Section 5 of Pitrebois, Denuit, and Walhin (2003b), where the transition is made based on the number of consecutive claimfree years. However, the optimal relativities in the present paper are obtained by solving different forms of optimization problems. Under two models—namely, the frequency random effects model and the bivariate random effects model—the optimal relativities and the hypothetical mean squared error are studied. The rest of the paper is organized as follows. In Section 2, we describe two random effects models and give the optimal relativity results under the classical BMS rules. In Section 3, a period of penalty (denoted as “\(pen\)”) is introduced in the transition rules such that the BM level moves downward only when there is no claim for the last consecutive \((1+pen)\) years. Under the modified transition rules, an “augmented” BM level is newly defined so that its transition process has the Markov property. In turn, the optimal relativities are provided under the two different models. Finally, in Section 4, we turn to a data set from the Wisconsin Local Government Property Insurance Fund (e.g., Frees, Lee, and Yang (2016)) to illustrate the impact of the longmemory transition rules on our BMS versus the classical BMS.
2. Model descriptions and basic results
For the \(i\)th policyholder in the \(t\)th policy year, we let \(N_{it}\) be the number of claims and \(\mathbf{Y}_{it}=(Y_{it1}, \ldots, Y_{itN_{it}})\) be the vector of associated claim amounts, where \(Y_{itj}\) is the \(j\)th claim amount. (Note that \(\mathbf{Y}_{it}\) is undefined when \(N_{it}=0.)\) Also, the aggregate claim amount and average claim amount are defined as
\[\begin{align} S_{it}&:=\begin{cases} \sum_{j=1}^{N_{it}}Y_{itj}, & N_{it}>0\\ 0, &N_{it}=0 \end{cases} \quad\quad \hbox{and}\\ \quad\quad M_{it}&:=\begin{cases} \frac{1}{N_{it}}\sum_{j=1}^{N_{it}}Y_{itj}, & N_{it}>0,\\ \hbox{0}, &N_{it}=0, \end{cases}\end{align}\tag{1}\]
respectively. Clearly, these two quantities are directly linked as \(S_{it}=N_{it}M_{it}\).
In the insurance ratemarking process, a priori ratemaking results in risk classification of the (new) policyholder whose past claims history is insufficient for the ratemaking purposes. Using the observed risk characteristics of the policyholders (collected in the row vector \(\mathbf{X}_i\) for the \(i\)th policyholder), their risk classes are determined and the a priori premiums are also obtained. Then, the residual heterogeneity is explained by capturing the unobserved risk characteristics of policyholders (denoted as \(\Theta_i\) for the \(i\)th policyholder) in an a posteriori ratemaking process, and consequently their premium amounts are adjusted based on the claims history. Here, \(\mathbf{X}_i\) and \(\Theta_i\) do not depend on time. In general, the frequency part may be considered enough to construct the BMS under the assumption of independence between frequency and severity. However, recent BMS research in Oh, Shi, and Ahn (2020) reveals that the dependence between frequency and severity plays a critical role in the determination of optimal relativities in the case of significant dependence. In this regard, the superscripts [1] and [2], representing the frequency and severity components, respectively, are introduced to variables and their corresponding realizations. When we are looking only at the frequency component, we drop the notation [1] for convenience. To model the frequency and the severity of claims, we employ a generalized linear model as described in de Jong and Heller (2008) along with the assumption of an exponential dispersion family for the random components (see Appendix A for a description).
2.1. Random effects models
For the unobserved risk characteristics in a posteriori ratemating, let us describe two different models that take into account, first, the randomness of frequency only and, second, both frequency and severity. For convenience, we assume that there are \(\mathcal{K}\) risk classes predetermined at the moment of when the contracts begin. We start with the description of the model for the frequency component only.
Model 1 (frequency model with random effects). First, the weight of the \(k\)th risk class is defined as \[ w_k:= \mathbb{P}(\mathbf{X}=\mathbf{x}_{k}), \qquad k = 1,\ldots, \mathcal{K},\tag{2}\] where \(\mathbf{X}\) is the row vector of the observed risk characteristics of a randomly picked person from the population of policyholders and \(\mathbf{x}_{k}\) is a vector of the observed risk characteristics for the \(k\)th risk class. Note that \(w_k\) can be regarded as the fraction of policyholders with observed risk characteristics \(\mathbf{x}_{k}.\) The a priori premium for the \(i\)th policyholder is determined as \(\Lambda_{i}:=\eta^{1} (\mathbf{X}_{i}\mathbf{\beta})\), where \(\eta(\cdot)\) is a link function, \(\mathbf{X}_i\) is the row vector of observed risk characteristics of this \(i\)th policyholder, and \(\mathbf{\beta}\) is a column vector of regression coefficients in the count regression model estimated from the past data concerning the number of reported claims. Incorporating the unobserved risk component for the frequency,^{[1]} the conditional distribution of the number of claims \(N_{it}\) for the \(i\)th policyholder in the \(t\)th policy year given the observed risk characteristics \(\mathbf{X}_{i}=\mathbf{x}_{i}\) and unobserved risk characteristics \(\Theta_i=\theta_i\) along with the distribution of \(\Theta_i\) is provided in Appendix B.
The following model extends the twopart model (Frees and Valdez 1998; Garrido, Genest, and Schulz 2016) by introducing bivariate random effects to model various types of dependence, such as

dependence among frequencies;

dependence among severities; and

dependence between frequency and severity.
Model 2 (collective risk model with bivariate random effects (Oh, Shi, and Ahn 2020)). Recall that the superscripts [1] and [2] represent, respectively, the frequency and the severity components. As an extension of Model 1, this model takes into account the random component of the claim severity as well for the a priori and a posteriori ratemaking processes. In contrast to (2), we define the weight of the \(k\)th risk class as
\[w_k:= \mathbb{P}({\mathbf{X}^{[1]}=\mathbf{x}_{k}^{[1]}, \mathbf{X}^{[2]}=\mathbf{x}_{k}^{[2]} }), \qquad k = 1,\ldots, \mathcal{K},\]
where \((\mathbf{X}^{[1]},\mathbf{X}^{[2]})\) is the pair of row vectors containing the observed risk characteristics relevant to the frequency and the severity of claims for a randomly picked person from the population of policyholders, and \((\mathbf{x}_k^{[1]},\mathbf{x}_k^{[2]})\) is the corresponding pair for the \(k\)th risk class. The a priori premium for the \(i\)th policyholder can be obtained from the information of the vector \((\Lambda_i^{[1]},\Lambda_i^{[2]})\), which is in turn determined by the observed risks characteristics as \[\Lambda_{i}^{[1]}:=\eta_1^{1} (\mathbf{X}^{[1]}_{i}\mathbf{\beta}^{[1]}) \quad and \quad \Lambda_{i}^{[2]}:=\eta_2^{1}(\mathbf{X}^{[2]}_{i}\mathbf{\beta}^{[2]}),\] where \(\eta_1(\cdot)\) and \(\eta_2(\cdot)\) are link functions, \(\mathbf{X}^{[1]}_i\) and \(\mathbf{X}^{[2]}_i\) are the \(i\)th policyholder’s row vectors of observed risk characteristics for frequency and severity, and \(\mathbf{\beta}^{[1]}\) and \(\mathbf{\beta}^{[2]}\) are column vectors of parameters to be estimated. More importantly, the dependence between claim frequency and severity is also considered as the corresponding unobserved risk characteristics \((\Theta_i^{[1]},\Theta_i^{[2]})\) are modeled via a bivariate copula. More specific distributional descriptions of this model are provided in Appendix C.
It is instructive to note that when Model 2 is restricted to the claim frequency only, it is conceptually equivalent to Model 1.
2.2. Optimal relativities
In the classical BMS framework, common transition rules are such that the BM level is lowered by one for a claimfree year and increased by \(h\) levels per claim, leading to what is referred to as the \(1/+h\) system. Based on such transition rules, the BM level for each policyholder evolves as a (discretetime) Markov chain. Let us denote \(L\) as a random variable representing the BM level (from 0 to \(z\)) for a randomly picked policyholder from the population in a stationary state (independent of time). Its distribution (that is, the proportion of policyholders in each level) is determined by the frequency component as
\[\begin{align} \mathbb{P}(L=\ell) &=\sum_{k=1}^{\mathcal{K}} w_{k} \int \pi_\ell(\lambda_k\theta, \psi) g(\theta){\rm d}\theta, \\\ell&=0, \ldots, {z},\end{align}\tag{3}\]
where \(\pi_\ell(\lambda_k\theta, \psi)\) is the stationary probability that a policyholder with expected frequency \(\lambda_k\theta\) is in level \(\ell\). (It is understood that \(\lambda_k=\eta^{1} (\mathbf{x}_{k}\mathbf{\beta})\), \(g(\cdot)\) and \(\psi\) in (3) are replaced by \(\lambda^{[1]}_k=\eta_1^{1} (\mathbf{x}^{[1]}_{k}\mathbf{\beta}^{[1]})\), \(g_1(\cdot)\) and \(\psi^{[1]}\), respectively, if one considers Model 2.) We shall provide explanations on how to obtain the stationary probabilities at the beginning of Section 3.2 for our more general model in relation to Example 2. The relativity associated with the BM level \(\ell\) is denoted by \({\zeta}(\ell)\). The following lemmas review how to determine the optimal relativity \(\tilde{\zeta}(\ell)\) in two optimization settings under the model assumptions in Section 2.1.
Lemma 1. (Tan et al. 2015) Consider the optimization problem
\[ (\tilde{\zeta}(0), \ldots, \tilde{\zeta}(z)):=\mathop{\mathrm{arg\,min}}_{({\zeta}(0), \ldots, {\zeta}(z))\in \mathbb{R}^{{z+1}}} \mathbb{E}[(\Lambda\Theta\Lambda{\zeta}(L))^2]\tag{4}\] under the frequencyonly Model 1. Because \(\mathbb{E}[N_{it}\Lambda_i,\Theta_i]=\Lambda_i\Theta_i\) (see (B.1)) is the “correct” premium for the \(i\)th policyholder if we did know \(\Theta_i\), one can regard \(\Lambda\Theta=\eta^{1} (\mathbf{X}\mathbf{\beta})\Theta\) as the “correct” premium for a policyholder randomly picked from the population having observed risk characteristics \(\mathbf{X}\) and unobserved risk characteristics \(\Theta\). The optimization (4) amounts to choosing the relativities to minimize the mean squared difference between \(\Lambda\Theta\) and the actual premium charged \(\Lambda{\zeta}(L)\) when a policyholder is in BM level \(L\). Then, the optimal relativities can be analytically calculated as
\[\begin{align}\tilde{\zeta}(\ell)&:=\frac{\mathbb{E}[\Lambda^2 \Theta L=\ell]}{\mathbb{E}[\Lambda^2 L=\ell]} =\frac{ \sum_{k=1}^{\mathcal{K}} w_k \lambda_k^2\int \theta \pi_\ell(\lambda_k \theta,\psi) g(\theta){\rm d}\theta} {\sum_{k=1}^{\mathcal{K}} w_k \lambda_k^2\int \pi_\ell(\lambda_k\theta,\psi)g(\theta){\rm d}\theta} ,\\ \ell&=0, \ldots, {z}.\end{align}\]
Lemma 2. (Oh, Shi, and Ahn 2020) Consider the optimization problem
\[\small\begin{align} (\tilde{\zeta}&(0), \ldots, \tilde{\zeta}(z))\\&:=\mathop{\mathrm{arg\,min}}_{({\zeta}(0), \ldots, \zeta(z))\in \mathbb{R}^{{z+1}}} \mathbb{E}[(\Lambda^{[1]}\Lambda^{[2]}\Theta^{[1]}\Theta^{[2]}\Lambda^{[1]}\Lambda^{[2]}\zeta(L))^2]\end{align}\tag{5}\]
under the frequencyseverity Model 2. As \(\mathbb{E}[S_{it}\Lambda_i^{[1]},\Lambda_i^{[2]},\Theta_i^{[1]},\Theta_i^{[2]}]=\Lambda_i^{[1]}\Lambda_i^{[2]}\Theta_i^{[1]}\Theta_i^{[2]}\) (see Appendix C) is the “correct” premium for the \(i\)th policyholder, \(\Lambda^{[1]}\Lambda^{[2]}\Theta^{[1]}\Theta^{[2]}=\eta_1^{1} (\mathbf{X}^{[1]}\mathbf{\beta}^{[1]}) \eta_2^{1}(\mathbf{X}^{[2]}\mathbf{\beta}^{[2]})\Theta^{[1]}\Theta^{[2]}\) is the “correct” premium for a policyholder randomly picked from the population having observed risk characteristics \((\mathbf{X}^{[1]},\mathbf{X}^{[2]})\) and unobserved risk characteristics \((\Theta^{[1]},\Theta^{[2]})\). The optimization (5) is concerned with choosing the relativities to minimize the mean squared difference between \(\Lambda^{[1]}\Lambda^{[2]}\Theta^{[1]}\Theta^{[2]}\) and the actual premium charged \(\Lambda^{[1]}\Lambda^{[2]}\zeta(L)\) when a policyholder is in BM level \(L\). Then, the optimal relativities can be analytically calculated as
\[\small\begin{aligned} \tilde{\zeta}(\ell):=&~\frac{\mathbb{E}[( \Lambda^{[1]}\Lambda^{[2]})^2 \Theta^{[1]}\Theta^{[2]}L=\ell]}{\mathbb{E}[(\Lambda^{[1]} \Lambda^{[2]})^2L=\ell]}\\ =&~\frac{\sum_{k=1}^{\mathcal{K}} w_{k} (\lambda_{k}^{[1]}\lambda_{k}^{[2]})^2 \int\int \theta^{[1]} \theta^{[2]} \pi_\ell(\lambda_{k}^{[1]}\theta^{[1]},\psi^{[1]}) h(\theta^{[1]},\theta^{[2]}){\rm d}\theta^{[1]}\rm d\theta^{[2]}}{\sum_{k=1}^{\mathcal{K}} w_{k} (\lambda_{k}^{[1]}\lambda_{k}^{[2]})^2 \int \pi_\ell(\lambda_{k}^{[1]}\theta^{[1]},\psi^{[1]})g_1(\theta^{[1]}){\rm d}\theta^{[1]}},\\ \ell=&0, \ldots, {z}.\end{aligned}\]
The optimal relativities in Lemmas 1 and 2 can be applied to various transition rules and BM levels \(\ell\). To calculate optimal relativity in the BMS literature, the Markov property of the transition rules is typically used, and the BM level is often assumed to be stationary (Pitrebois, Denuit, and Walhin 2003a; Denuit et al. 2007; Tan et al. 2015; Oh, Shi, and Ahn 2020). The most common BMS (and also one of the simplest of the BMSs) is the \(1/+h\) system, although other variations are available, such as, for example, in Tan et al. (2015). However, insurers in many countries, including Belgium, Korea, and Singapore, adapt transition rules with a long memory. While such transition rules do not directly possess the Markov property, examples in the literature show how that can be dealt with (Lemaire 1995; Pitrebois, Denuit, and Walhin 2003a). A key technique involves augmentation of BM levels, and the following section generalizes these examples.
We remark that the optimization in (4) or (5) is not the only optimization criterion one can use. Various criteria exist to define the optimality of BM relativities—see, e.g., Denuit et al. (2007) and Tan et al. (2015). More importantly, BM relativities obtained from (4), (5), Denuit et al. (2007), and Tan et al. (2015) are known to create some systematic bias in the prediction of premium. Such systematic bias is called the doublecounting problem, and we refer interested readers to Lemaire (1995), Taylor (1997), and Oh et al. (2020) for the details of the doublecounting problem and its solution.
3. A modified BMS with augmented BM levels
In this section, the number of consecutive claimfree years for a policyholder is taken into account in the transition rules. Assuming there are \(z+1\) BM levels, we let \(L_{t}\in\{0, 1, \ldots, z\}\) be the BM level at time \(t\in\{0,1,\ldots\}\) so that \(L_{t}\) is the BM level applicable for the \((t+1)\)th policy year, that is, from time \(t\) to time \(t+1\). Similar to previous notation, for \(t\in\{1,2,\ldots\}\) we use \(N_{t}\) to denote the number of claims in the \(t\)th policy year.
3.1. Modified transition rules of \(1/+h/pen\)
While the classical BMS usually adopts the socalled \(1/+h\) system for \(h\le z\) as described in Section 2.2, we shall introduce an additional component, namely “\(pen\),” to this system. Specifically, under such a \(1/+h/pen\) system, each reported claim increases the BM level by \(h\) while \(1+pen^*_t\) consecutive claimfree years will result in a decrease of one BM level at time \(t+1\) when moving from the \((t+1)\)th policy year to the \((t+2)\)th. Here \(pen_t^*\) is defined as \[ pen_t^*:=\min\{pen, t\},\tag{6}\] for \(t\in\{0,1,\ldots\}\). This implies that the BM level at time \(t=0\) starts with no penalty, and indeed a new policyholder can have his or her BM level reduced by one every year as long as he or she maintains a noclaim record. (This is also equivalent to letting \(N_0=N_{1}=\ldots=N_{pen}=0\) and saying that one requires \(1+pen\) consecutive claimfree years to decrease the BM level by one.) Note that the classical BMS is retrieved from our extended model by letting \(pen=0\). We further assume that a new policyholder without any driving history belongs to the BM level \(\ell_0\) in the beginning. Then, \(L_{t}\) is mathematically represented as
\[\scriptsize L_t:= \begin{cases} \min\{L_{t1}+h N_{t}, z\}, & N_{t}>0,\\ \max\{L_{t1}1, 0\}, & N_{t}= \ldots =N_{\max\{tpen^*_{t1}, 1\}} =0,\\ L_{t1}, &\hbox{otherwise}, \end{cases}\tag{7}\]
for \(t\in\{1,2,\ldots\}\), with \(L_0=\ell_0\). It is also assumed that the BM relativity \({\zeta}(\ell)\) is applied in the same manner as in the classical \(1/+h\) system. That is, each BM level \(\ell\) is bestowed with BM relativity \({\zeta}(\ell)\). Compared with the classical \(1/+h\) system, an additional \(pen^*_t\) claimfree years are required to reduce the BM level by one, whereas the penalty of climbing \(h\) BM levels per claim remains the same. Such a system is introduced to motivate policyholders to drive carefully by applying a stricter rule for lowering the BM level (reward case) while keeping the same rule with respect to increasing the BM level (penalty case). Insurers have adopted such transition rules in some countries, such as Singapore and Korea.
While the Markov property in the transition probability appearing in certain BMSs is convenient for the analysis of BMSrelated problems (e.g., Taylor 1997; Pitrebois, Denuit, and Walhin 2003a; Denuit et al. 2007), it is obvious that with the current definition of \(L_t\), the transition rules are not Markovian as \(L_t\) depends not only on \(L_{t1}\) but also on the numbers of claims in the past multiple periods. To resolve this issue, as explained in Lemaire (1995) and Pitrebois, Denuit, and Walhin (2003a), fictitious levels can be included to redefine the state space of the BM levels. Subdividing some of the BM levels in order to include the information of the number of claimfree years results in an increase of the number of states versus the classical case. More precisely, for \(\ell=h, \ldots, z\), the BM level \(\ell\) is augmented into \[(\ell)_0, (\ell)_1, \ldots, (\ell)_{pen},\] where the subscript stands for the number of additional claimfree periods (compared with the classical BMS) required to get rewarded, while \(\ell\in \{0, \ldots, h1\}\) is just translated into \((\ell)_0\) without augmentation.^{[2]} Hence, the original \((z+1)\) BM levels in the \(1/+h/pen\) system have been augmented to \([h+(zh+1)\times (pen+1)]\) BM levels. Let us denote the BM level at time \(t\) under the augmented system as \(L_t^*\). Then, it is convenient to define the possible combinations of \((\ell)_a\) in our proposed model (that is, the state space) as
\[\scriptsize \mathcal{A}_{z, pen}:=\{ (\ell)_0\big\vert \ell=0, \ldots, h1\}\cup \{ (\ell)_0, \ldots, (\ell)_{pen} \big\vert \ell=h, \ldots, z \}.\tag{8}\]
Furthermore, suppose that a policyholder who was at the BM level \(L_{t1}^*=(\ell)_a\in \mathcal{A}_{z, pen}\) at time \(t1\) has reported at time \(t\) that \(N_{t}\) accidents occurred during the \(t\)th policy year. Then the new BM level \(L_{t}^*\) at time \(t\) is determined as
\[\small L_t^*:= \begin{cases} ( \max\{\ell1, 0 \})_0, & \hbox{if}\quad N_{t}=0 \,\,\hbox{and}\,\,a=0,\\ ( \ell)_{a1}, & \hbox{if}\quad N_{t}=0 \,\,\hbox{and}\,\,a\neq 0,\\ (\min\{\ell+h N_{t}, z \})_{pen}, & \hbox{if}\quad N_{t}>0,\\ \end{cases}\tag{9}\]
for \(t\in\{1,2,\ldots\}\), where a new policyholder without driving history starts with \(L_0^*=(\ell_0)_0\). The transitions of \(L_t^*\) are now Markovian, as \(L_t^*\) depends on \(L_{t1}^*=(\ell)_a\) and the latest claim number only. Also, we assume that the BM relativity in state \((\ell)_a\in \mathcal{A}_{z, pen}\) under the augmented system defined via (9) is the same as the one in \(\ell\in\{0, 1, \ldots, z\}\) for the original system defined in (7) in the sense that the relativities depend on the BM level \(\ell\) but not the information \(a\) that is artificially introduced to make the transitions Markovian. In other words, under the proposed model one has the relativities \[ \zeta^*\left( (\ell)_a \right):=\zeta(\ell),\qquad (\ell)_a\in \mathcal{A}_{z, pen}.\tag{10}\] We call the BMS with BM levels (8), transition rules (9), and relativities (10) an augmented \(1/+h/pen\) system. The following example illustrates the transition rules with a certain penalty period.
Example 1. Let us consider the \(1/+2/2\) system with \(z=20\). That is, whereas the BM level is to be increased by two per claim, having \(1+pen^*_t\) consecutive claimfree years is to be rewarded by a decrease of one BM level. We assume that a new policyholder without driving history starts at BM level \(L_0=10\) (so \(\ell=10\) at time 0). Given the BM level \(L_{t}\) at time \(t\) and the claim frequency \(N_{t+1}\) in the \((t+1)\)th year, we summarize the next year’s BM level \(L_{t+1}\) according to (7) and alternatively \(L^\ast_{t+1}\) according to (9) in Table 1. (Note that with \(pen=2\), (6) implies that \(pen_t^*\) stays level at 2 once \(t\) reaches 2.) The first few transitions are explained as follows:

Transition from \(t=0\) to \(t=1\):
At time 0, one has \(pen^*_0=0\) from (6), and therefore \(1+pen^*_0=1\) claimfree year is required to reduce the BM level by one at time 1 (like the classical BMS), and so we set \(a=0\) at time 0. As it turns out that there is no claim reported at \(t=1\) (i.e., \(N_1=0\)), the BM level is reduced from \(L_0=10\) to \(L_1=9\) (and thus \(\ell=9\) at \(t=1\)). Now \(pen^*_1=1\) according to (6), meaning that a total of \(1+pen^*_1=2\) consecutive claimfree years are needed to reduce the BM level by one the next year at time 2. Since there has already been one claimfree year, only one more claimfree year is needed to earn such a reward (again like the classical BMS), and therefore one sets \(a=0\) at time 1. 
Transition from \(t=1\) to \(t=2\):
Since \(N_2=0\) again, the condition of reward is satisfied and the BM level moves down to \(L_2=8\) (and \(\ell=8\)) at time 2. With \(pen^*_2=2\) at time 2, it is known that \(1+pen^*_t=3\) consecutive claimfree years are needed to lower the BM level by one. With two consecutive claimfree years already in hand, we need only one more claimfree year to reduce the BM level in the next year, and therefore \(a=0\) at time 2. 
Transition from \(t=2\) to \(t=3\):
Since \(N_3=2\) in the third year, the BM level moves up to 12 (\(=8+(2\times 2)\)) at time 3 (and so \(\ell=12\)). As \(pen^*_3=2\) at time 3, one needs \(1+pen^*_3=3\) consecutive claimfree years to reduce the BM level by one, which means two claimfree years in addition to the one claimfree year required in the classical BMS. Consequently, we have \(a=2\) at \(t=3\). 
Transition from \(t=3\) to \(t=4\):
Because \(N_4=1\) in the fourth year, the BM level increases to \(L_4=14\) (\(=12+(2\times 1)\)) at time 4. As there is a claim in this period, three consecutive claimfree years are still needed to reduce the BM level by one. So, we have \(a=2\) at \(t=4\). 
Transition from \(t=4\) to \(t=5\):
Although there is no claim in the fifth year as \(N_5=0,\) the condition of having three consecutive claimfree years in order to enjoy a reward is not satisfied, and therefore the BM level stays at the same level as last year, so that \(L_5=14\). Then two more consecutive claimfree years are required to lower the BM level by one. This is one additional year compared with the classical \(1/+2\) system, so \(a=1\) at \(t=5\).
Example 2. Suppose that we have the \(1/+2/1\) system with \(z=7\). We consider Model 1 where a policyholder, say the \(i\)th policyholder, has observed and unobserved risk characteristics \(\mathbf{X}_i=\mathbf{x}_i\) and \({\Theta}_i={\theta}_i\), respectively. Then, based on the augmented form of BM levels defined in (8), one finds a total of \((2+((72+1)\times 2))=14\) BM levels with the transition matrix \(\mathbf{P}\) given by
\[\tiny\begin{align} &\begin{array}{lllllllllllllll} & \quad & (0)_0 & (1)_0 & (2)_0 & (2)_1 & (3)_0 & (3)_1& (4)_0 & (4)_1& (5)_0 & (5)_1& (6)_0 & (6)_1& (7)_0 & (7)_1 \end{array} \\ &\begin{array}{l} (0)_0\\ (1)_0\\(2)_0\\(2)_1\\(3)_0\\(3)_1\\(4)_0\\(4)_1\\(5)_0\\(5)_1\\(6)_0\\(6)_1\\(7)_0\\(7)_1 \end{array} \left(\begin{array}{lllllllllllll} p_0\quad& 0\quad\ & 0\quad\ & p_1\ \ \ & 0\quad\ & 0\quad\ & 0\quad\ & p_2\ \ \ & 0\quad\ & 0\quad\ & 0\quad & p_3\quad & 0\quad & 1p_0p_1p_2p_3 \\ p_0& {0} & {0} & {0} & {0} & p_1& 0 & 0 & 0 & p_2& 0 & 0 & 0 & 1p_0p_1p_2\\ 0 & p_0& 0 & 0 & 0 & 0 & 0 & p_1& 0 & 0 & 0 & p_2& 0 & 1p_0p_1p_2\\ 0 & 0 & p_0& 0 & 0 & 0 & 0 & p_1& 0 & 0 & 0 & p_2& 0 & 1p_0p_1p_2\\ 0 & 0 & p_0& 0 & 0 & 0 & 0 & 0 & 0 & p_1& 0 & 0 & 0 & 1p_0p_1\\ 0 & 0 & 0 & 0 & p_0& 0 & 0 & 0 & 0 & p_1& 0 & 0 & 0 & 1p_0p_1\\ 0 & 0 & 0 & 0 & p_0& 0 & 0 & 0 & 0 & 0 & 0 & p_1& 0 & 1p_0p_1\\ 0 & 0 & 0 & 0 & 0 & 0 & p_0& 0 & 0 & 0 & 0 & p_1& 0 & 1p_0p_1\\ 0 & 0 & 0 & 0 & 0 & 0 & p_0& 0 & 0 & 0 & 0 & 0 & 0 & 1p_0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & p_0& 0 & 0 & 0 & 0 & 1p_0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & p_0& 0 & 0 & 0 & 0 & 1p_0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & p_0& 0 & 0 & 1p_0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & p_0& 0 & 0 & 1p_0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & p_0& 1p_0 \end{array}\right) \end{align} \]
where \(p_n\) in the matrix is defined by (see (B.1)) \[p_n:=\mathbb{P}(N_{it}=n\Theta_i=\theta_i, \mathbf{X}_i=\mathbf{x}_i), \qquad n=0,1,\ldots.\] For simplicity, the dependence of \(p_n\) on \(\theta_i\) and \(\mathbf{x}_i\) is suppressed. Similarly, if we assume Model 2 instead and the \(i\)th policyholder has observed and unobserved risk characteristics given by \((\mathbf{X}^{[1]}_i, \mathbf{X}^{[2]}_i)=(\mathbf{x}^{[1]}_i, \mathbf{x}^{[2]}_i)\) and \((\Theta^{[1]}_i, \Theta^{[2]}_i)=(\theta^{[1]}_i, \theta^{[2]}_i)\), then \(p_n\) in the matrix \(\mathbf{P}\) is (see (C.1)) \[p_n:=\mathbb{P}(N_{it}=n\Theta^{[1]}_i=\theta^{[1]}_i, \mathbf{X}^{[1]}_i=\mathbf{x}^{[1]}_i), \qquad n=0,1,\ldots.\]
3.2. Optimal relativities
In this section, under a \(1/+h/pen\) system, the optimal values of the relativities \(\zeta(\ell)\) for \(\ell=0,1,\ldots,z\) are studied. Following the classical BMS literature, we denote the stationary distribution of \(L_t^*\) by \(L^*\) for a randomly picked policyholder. Then, under the representation in Model 1, one finds in an analogous manner to (3) that
\[\begin{align} \mathbb{P}(L^*=&(\ell)_a) =\sum_{k=1}^\mathcal{K} w_{k} \int \pi_{(\ell)_a}^*(\lambda_k\theta, \psi) g(\theta){\rm d}\theta,\\&(\ell)_a\in \mathcal{A}_{z, pen},\end{align}\tag{11}\]
where \(\pi_{(\ell)_a}^*(\lambda_k\theta, \psi)\) is the stationary distribution for a policyholder to be in BM level \((\ell)_a\) given that his or her a priori claim frequency is \(\lambda_k\) and unobserved risk characteristics are summarized in \(\theta\). (Again, \(\lambda_k\), \(g(\cdot)\) and \(\psi\) are replaced by \(\lambda^{[1]}_k\), \(g_1(\cdot)\) and \(\psi^{[1]}\), respectively, if we consider Model 2 instead.) For example, the 14dimensional row vector of stationary probabilities corresponding to Example 2, namely \[\mathbf{\pi}^*(\lambda_{k}\theta,\psi):=(\pi_{(0)_0}^*(\lambda_{k}\theta,\psi), \pi_{(1)_0}^*(\lambda_{k}\theta,\psi),\ldots, \pi_{(7)_1}^*(\lambda_{k}\theta,\psi)),\] can be obtained as the solution of \[\begin{cases} \mathbf{\pi}^*(\lambda_{k}\theta,\psi) = \mathbf{\pi}^*(\lambda_{k}\theta,\psi)\mathbf{P},\\ \mathbf{\pi}^*(\lambda_{k}\theta,\psi)\mathbf{e}_{14}=1,\\ \end{cases}\] where \(\mathbf{P}\) is the transition matrix in Example 2 (calculated with \(\mathbf{x}_{k}\) and \(\theta\) in place of \(\mathbf{x}_{i}\) and \(\theta_i\), respectively), and \(\mathbf{e}_{14}\) is a 14dimensional column vector of ones.
Under the augmented system, we consider the optimal relativities as the solution of the optimization problem \[\small (\tilde{\zeta}(0), \ldots, \tilde{\zeta}(z)):=\mathop{\mathrm{arg\,min}}_{({\zeta}(0), \ldots, {\zeta}(z))\in \mathbb{R}^{{z+1}}}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}^*(L^*))^2]\tag{12}\] resembling (4), where we aim to predict the frequency under the frequencyonly Model 1. On the other hand, when we are interested in the prediction of the aggregate claim, like (5), the optimal relativities are given by the solution of the optimization problem \[\scriptsize (\tilde{\zeta}(0), \ldots, \tilde{\zeta}(z)):=\mathop{\mathrm{arg\,min}}_{({\zeta}(0), \ldots, {\zeta}(z))\in \mathbb{R}^{{z+1}}} \mathbb{E}[(\Lambda^{[1]}\Lambda^{[2]}\Theta^{[1]}\Theta^{[2]}\Lambda^{[1]}\Lambda^{[2]} \zeta^*(L^*))^2]\tag{13}\] under the frequencyseverity Model 2. Recall that, by definition, \(\zeta^*:\mathcal{A}_{z, pen}\mapsto \mathbb{R}\) in (12) or (13) is completely characterized by \(\zeta(\ell)\) for \(\ell=0,\ldots,z\) (see (10)).
Proposition 1. Consider the \(1/+h/pen\) system under Model 1. The optimal relativities defined as the solution of the optimization problem (12) are given by \[ \tilde{\zeta}(\ell):=\frac{\mathbb{E}[\Lambda^2 \Theta  L^*=(\ell)_0]} {\mathbb{E}[\Lambda^2  L^*=(\ell)_0]}, \qquad \ell= 0, \ldots,{h1},\tag{14}\] and
\[\begin{align} \tilde{\zeta}(\ell):=& \frac{\sum_{a=0}^{pen}\mathbb{E}[\Lambda^2 \Theta L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)} {\sum_{a=0}^{pen}\mathbb{E}[\Lambda^2  L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)}, \\ \ell =& h, \ldots,{z}; a=0, \ldots, pen,\end{align}\tag{15}\]
where the numerator and denominator can be calculated based on \[\begin{align} \mathbb{E}&[\Lambda^2 \Theta  L^*=(\ell)_a]\\ &= \frac{\sum_{k=1}^\mathcal{K} w_{k} \lambda_{k}^2 \int \theta \pi_{(\ell)_a}^*(\lambda_{k}\theta, \psi) g(\theta){\rm d}\theta} {\mathbb{P}(L^*=(\ell)_a)}\end{align}\tag{16}\] and
\[\small \mathbb{E}[\Lambda^2  L^*=(\ell)_a] = \frac{\sum_{k=1}^\mathcal{K} w_{k} \lambda_{k}^2 \int \pi_{(\ell)_a}^*(\lambda_{k}\theta, \psi)g(\theta){\rm d}\theta} {\mathbb{P}(L^*=(\ell)_a)},\tag{17}\] respectively. Note that when \(pen=0\), the optimal relativity \(\tilde{\zeta}(\ell)\) reduces to that in Tan et al. (2015) given in Lemma 1.
Proof. Note that the objective function in (12) can be written as
\[\small\begin{align} \mathbb{E}&[(\Lambda\Theta\Lambda {\zeta}^*(L^*))^2]\\&=\sum_{(\ell)_a\in \mathcal{A}_{z, pen}}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}^*((\ell)_a))^2L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)\nonumber\\ &=\sum_{(\ell)_a\in \mathcal{A}_{z, pen}}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}(\ell))^2L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a),\end{align}\tag{18}\]
where the last line follows from (10). We observe from (8) that

for each \(\ell=0, \ldots, h1\), the relativity \(\zeta(\ell)\) is shared by only one BM level \((\ell)_0\); and

for each \(\ell=h, \ldots,z\), the same relativity \(\zeta(\ell)\) is shared by the augmented BM levels \((\ell)_0, \ldots, (\ell)_{pen}\).
Consequently, (18) becomes \[\small\begin{align} \mathbb{E}&[(\Lambda\Theta\Lambda {\zeta}^*(L^*))^2] \\&=~\sum_{\ell=0}^{h1}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}(\ell))^2L^*=(\ell)_0]\mathbb{P}(L^*=(\ell)_0)\nonumber\\ &+\sum_{\ell=h}^z\sum_{a=0}^{pen}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}(\ell))^2L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a).\end{align}\tag{19}\]
For each fixed \(\ell=0, \ldots, h1\), differentiation of (19) with respect to \({\zeta}(\ell)\) for optimization yields
\[\mathbb{E}[2\Lambda(\Lambda\Theta\Lambda\zeta(\ell))L^*=(\ell)_0]\mathbb{P}(L^*=(\ell)_0)=0,\]
from which (14) follows. On the other hand, when \(\ell=h, \ldots,{z}\), taking the derivative with respect to \({\zeta}(\ell)\) in (19) leads to
\[\sum_{a=0}^{pen}\mathbb{E}[2\Lambda(\Lambda\Theta\Lambda\zeta(\ell))L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)=0,\]
proving (15).
Here, the numerators of (14) and (15) can be calculated using
\[\tiny\begin{aligned} \mathbb{E}&[\Lambda^2\Theta L^*=(\ell)_a] \\&=~\frac{1}{\mathbb{P}(L^*=(\ell)_a)} \mathbb{E}[\Lambda^2\Theta I(L^*=(\ell)_a)] \nonumber\\ &=~\frac{1}{\mathbb{P}(L^*=(\ell)_a)} \sum_{k=1}^\mathcal{K}\int \lambda_k^2\theta\mathbb{P}(\Theta\in {\rm d}\theta, \Lambda=\lambda_k,L^*=(\ell)_a)\nonumber\\ &=~\frac{1}{\mathbb{P}(L^*=(\ell)_a)} \sum_{k=1}^\mathcal{K}\int \lambda_k^2\theta\mathbb{P}(L^*=(\ell)_a\Theta\in {\rm d}\theta, \Lambda=\lambda_k) \mathbb{P}(\Theta\in {\rm d}\theta, \Lambda=\lambda_k).\end{aligned}\tag{20}\]
Since the a priori \(\mathbf{X}\) and the a posteriori \(\Theta\) are independent and \(\Lambda=\eta^{1} (\mathbf{X}\mathbf{\beta})\), one has that \(\mathbb{P}(\Theta\in {\rm d}\theta, \Lambda=\lambda_k)=\mathbb{P}(\Lambda=\lambda_k)\mathbb{P}(\Theta\in {\rm d}\theta)=w_kg(\theta){\rm d}\theta\). Using this together with the fact that \(\mathbb{P}(L^*=(\ell)_a\Theta\in {\rm d}\theta, \Lambda=\lambda_k)=\pi_{(\ell)_a}^*(\lambda_{k}\theta, \psi)\) confirms that (20) reduces to (16). The expectation (17) can be obtained in an almost identical manner, and the details are omitted.
Proposition 2. Consider the \(1/+h/pen\) system under Model 2. The optimal relativities defined as the solution of the optimization problem (13) are given by
\[\begin{align}\tilde{\zeta}(\ell)&:=\frac{\mathbb{E}[(\Lambda^{[1]}\Lambda^{[2]})^2 \Theta^{[1]}\Theta^{[2]}L^*=(\ell)_0]} {\mathbb{E}[(\Lambda^{[1]} \Lambda^{[2]})^2 L^*=(\ell)_0]},\\ \ell&=0, \ldots,{h1},\end{align}\]
and
\[\begin{align}\tilde{\zeta}(\ell)&:=\frac{\sum_{a=0}^{pen}\mathbb{E}[( \Lambda^{[1]}\Lambda^{[2]})^2 \Theta^{[1]}\Theta^{[2]} L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)} {\sum_{a=0}^{pen}\mathbb{E}[(\Lambda^{[1]} \Lambda^{[2]})^2 L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)}, \\ \ell&=h, \ldots,{z}; a=0, \ldots, pen,\end{align}\]
where the numerator and denominator can be calculated based on
\[\scriptsize\begin{align}\mathbb{E}&[( \Lambda^{[1]}\Lambda^{[2]})^2 \Theta^{[1]}\Theta^{[2]} L^*=(\ell)_a]\\&=\frac{\sum_{k=1}^\mathcal{K} w_{k} (\lambda_{k}^{[1]}\lambda_{k}^{[2]})^2 \int\int \theta^{[1]}\theta^{[2]} \pi_{(\ell)_a}^*(\lambda_{k}^{[1]}\theta^{[1]},\psi^{[1]}) h(\theta^{[1]}, \theta^{[2]})\rm d\theta^{[1]}\rm d \theta^{[2]}} {\mathbb{P}(L^*=(\ell)_a)}\end{align}\]
and \[\scriptsize\mathbb{E}[( \Lambda^{[1]}\Lambda^{[2]})^2 L^*=(\ell)_a] = \frac{\sum_{k=1}^\mathcal{K} w_{k} (\lambda_{k}^{[1]}\lambda_{k}^{[2]})^2 \int \pi_{(\ell)_a}^*(\lambda_{k}^{[1]}\theta^{[1]},\psi^{[1]})g_1(\theta^{[1]}){\rm d}\theta^{[1]}} {\mathbb{P}(L^*=(\ell)_a)},\] respectively. Note that when \(pen=0\), the optimal relativity \(\tilde{\zeta}(\ell)\) reduces to that in Oh, Shi, and Ahn (2020) given in Lemma 2).
Proof. The proof of Proposition 2 is similar to that of Proposition 1, and thus we omit the details.
3.3. Numerical analysis
Here we provide a simple example to examine the effect of the period of penalty on the optimal BM relativity \(\tilde{\zeta}(\ell)\) and the stationary probability \(\mathbb{P}(L=\ell)\). First, it is convenient to introduce the relation
\[ \mathbb{P}(L=\ell)=\sum_{\{a(\ell)_a\in \mathcal{A}_{z, pen}\}} \mathbb{P}(L^*=(\ell)_a),\tag{21}\]
which follows from the definition of \(L^*\). The performances of our modified BM transition rules under a \(1/+1/pen\) system as well as a \(1/+2/pen\) system for \(pen=0,1,2,3\) are compared by computing the hypothetical mean squared error (HMSE). For Model 1 (the frequency random effects model) with the set of BM relativities \(\mathbf{\zeta}=\{\zeta(\ell)\}_{\ell=0}^z\), the HMSE is expressed as
\[\scriptsize\begin{align} {\rm HMSE}(\mathbf{\zeta}) &:=\mathbb{E}[(\Lambda\Theta\Lambda \zeta^*(L^*))^2]\nonumber\\ &=\sum_{k=1}^\mathcal{K}\sum_{(\ell)_a\in \mathcal{A}_{z, pen}} w_k \int (\lambda_k \theta\lambda_k \zeta(\ell))^2 \pi_{(\ell)_a}^*(\lambda_k\theta,\psi) g(\theta) {\rm d}\theta.\end{align}\tag{22}\]
If \(\mathbf{\zeta}\) is replaced by the vector of optimal relativities \(\tilde{\mathbf{\zeta}}=\{\tilde{\zeta}(\ell)\}_{\ell=0}^z\) calculated using Proposition 1, then \({\rm HMSE}(\tilde{\mathbf{\zeta}})\) is the minimized value corresponding to the righthand side of the optimization problem (12).
Similarly, for Model 2 with the set of BM relativities \(\mathbf{\zeta}=\{\zeta(\ell)\}_{\ell=0}^z\), following the logic in Oh, Shi, and Ahn (2020), the HMSE is expressed as
\[\tiny\begin{align}{\rm HMSE}(\mathbf{\zeta}) &:=\mathbb{E}[(\Lambda^{[1]}\Lambda^{[2]}\Theta^{[1]}\Theta^{[2]}\Lambda^{[1]}\Lambda^{[2]} \zeta^*(L^*))^2]\\ &=\sum_{k=1}^\mathcal{K}\sum_{(\ell)_a\in \mathcal{A}_{z, pen}} w_k \int \int \big(\lambda_k^{[1]}\lambda_k^{[2]} \theta^{[1]}\theta^{[2]} \lambda_k^{[1]}\lambda_k^{[2]} \zeta(\ell) \big)^2 \pi_{(\ell)_a}^*(\lambda_k\theta,\psi^{[1]}) h(\theta^{[1]}, \theta^{[2]}) {\rm d}\theta^{[1]} \rm d \theta^{[2]}.\end{align}\]
Again, \({\rm HMSE}(\tilde{\mathbf{\zeta}})\) is the minimum on the righthand side of the optimization problem (13) with \(\tilde{\mathbf{\zeta}}=\{\tilde{\zeta}(\ell)\}_{\ell=0}^z\) calculated using Proposition 2.
Example 3. Under the frequency model with random effects described as Model 1, it is assumed that there is only one risk class, where the frequency \(N\) follows a Poisson distribution with a priori frequency \(\lambda=\lambda_0\) (so that the distribution function \(F\) can be put in the form of (B.1)) and the random effect \(\Theta\) in (B.2) has a lognormal distribution with mean 1 and \(\sigma^2=0.99\), that is,
\[\small\begin{aligned} N(\Lambda=\lambda,\Theta=\theta) &\sim {\rm Poisson} (\lambda\theta) \qquad {\rm with} \quad \lambda=\lambda_0,\\ \Theta &\sim {\rm Lognormal}(\sigma^2/2, \sigma^2) \qquad {\rm with} \quad \sigma^2=0.99.\end{aligned}\]
With \(z=9\) and \(pen=0,1,2,3\), the stationary probabilities \(\mathbb{P}(L=\ell)\) and the optimal BM relativities \(\tilde{\zeta}(\ell)\) for all \(\ell\)’s and the resulting HMSE (under the optimal relativities) are calculated under a \(1/+1/pen\) system and a \(1/+2/pen\) system. The results when \(\lambda_0=0.05\) are first summarized in Tables 2(a) and (b).
In obtaining the tables, we first construct the transition probability matrix according to the transition rules of the BMS and the claim frequency distribution with the assumed parameters (see Example 2) and obtain the stationary probability \(\pi_{(\ell)_a}^*\) as explained at the beginning of Section 3.2. Then, \(\mathbb{P}(L=\ell)\) can be calculated from (11) and (21). Note that integration with respect to \(\theta\) under the distributional assumption on the density \(g(\theta)\) of \(\Theta\) is needed in (11). While explicit evaluation of such a onedimensional integral is not possible in general, numerical integration can be implemented easily in most computing software. In turn, the optimal relativities \(\tilde{\mathbf{\zeta}}=\{\tilde{\zeta}(\ell)\}_{\ell=0}^z\) can be obtained from Proposition 1, where numerical integration is again performed based on (16) and (17) (but the integral appearing in (17) is the same as that in (11)). HMSE can be similarly calculated via (22) once one has calculated the optimal relativities \(\tilde{\mathbf{\zeta}}\).
We first look at the \(1/+1/pen\) system in Table 2(a). For each fixed value of \(pen=0, 1, 2, 3\), most of the population is in the lowest BM level 0 (as \(\mathbb{P}(L=0)\) is at least \(82\%\)), which can be explained by the relatively low mean claim frequency of \(\lambda_0=0.05\). By inspecting \(\mathbb{P}(L=\ell)\) for \(pen=0, 1, 2, 3\), we observe that an increase in \(pen\) tends to move some of the population toward higher BM levels, leading to a diversification of the stationary BM levels. This is because a higher \(pen\) means that more claimfree years are needed to reduce the BM level by one, rendering it more difficult for drivers at a high BM level to move downward. As mentioned in Proposition 1, the optimal relativities under \(pen=0\) are the same as those in Tan et al. (2015). As \(pen\) increases, it is important to observe that the optimal relativity \(\tilde{\zeta}(\ell)\) for each fixed BM level \(\ell\) decreases. Although a lower BM relativity means that a driver occupying a given BM level pays a less expensive premium in a \(1/+1/pen\) system with a higher \(pen\), the insurer’s premium income is in turn compensated by an increased portion of drivers occupying higher BM levels as \(pen\) increases. A higher \(pen\) also results in an improvement of the HMSE in this example. Similar patterns can be found in the \(1/+2/pen\) system as shown in Table 2(b). However, for a fixed \(pen = 1, 2, 3\), the optimal relativity \(\tilde{\zeta}(\ell)\) in Table 2(b) is not necessarily increasing in \(\ell\) especially for low BM levels. For example, it is observed that \(\tilde{\zeta}(2)\) is slightly smaller than \(\tilde{\zeta}(1)\). With \(h=2\), a possible explanation for \(\tilde{\zeta}(2)<\tilde{\zeta}(1)\) is that certain policyholders occupying BM level 2 could be those who were in the lowest BM level 0 (and had one claim) in the previous year, and these are still good drivers. On the other hand, those in BM level 1 could not have come directly from BM level 0. It is interesting to note that although \(\tilde{\zeta}(\ell)\) is not always increasing in \(\ell\) in Table 2(b), reporting \(n\ge 1\) claims moves a policyholder upward by \(2n\) BM levels and this always leads to an increase in \(\tilde{\zeta}(\ell)\). Nevertheless, potential problems arise when a policyholder moves down from BM level 2 to BM level 1 after a number of claimfree years and then suffers an increase in the optimal relativity (and hence premium). For future work, constrained optimization may be considered by requiring \(\tilde{\zeta}(\ell)\) to be increasing in \(\ell\), but it is unlikely that this will lead to optimal relativities that admit expressions as explicit as those in Propositions 1 and 2.
When \(\lambda_0=1\), we consider BM levels up to \(z=14\) because the assumption of a higher mean claim frequency will result in a larger proportion of policyholders occupying higher BM levels. The \(1/+1/pen\) system in Table 2(c) shows some similar features: as \(pen\) increases, \(\tilde{\zeta}(\ell)\) decreases and the distribution (at equilibrium) of policyholders shifts from lower BM levels to higher ones. However, since a significant fraction of the population is already in the highest BM level 14 when \(pen=0\), increasing \(pen\) antidiversifies the stationary distribution of BM levels. We observe that HMSE is increased with higher \(pen\).
We also notice one interesting phenomenon when comparing the cases where the claim frequencies are of different magnitude. In Tables 2(a) and (b) for which \(\lambda_0=0.05\), the optimal relativity \(\tilde{\zeta}(0)\) for BM level 0 is still at least 69% while that for the highest BM level can be as high as 17. An intuitive explanation is that given the low claim frequency \(\lambda_0=0.05\), it is not unusual that drivers do not file claims. As a result, claimfree drivers (those occupying the lowest BM level) are not necessarily much better drivers than others, and therefore they are not rewarded with much premium discount. On the other hand, drivers occupying the highest BM level must have reported some claims in the past years, and they are more likely to be worse drivers with more risky unobserved risk characteristics (that is, higher \(\Theta\)) and thus are more severely penalized. Moving to Table 2(c) for which \(\lambda_0=1\), it becomes much more common for drivers to report claims and occupy higher BM levels, so they are not penalized too much with the highest BM relativity \(\tilde{\zeta}(14)\) around 2. But it is those at lower BM levels who have claimfree years that should be regarded as significantly better drivers, and they are given a great discount with the BM relativity \(\tilde{\zeta}(0)\) no larger than 24.6%.
From this example, we conclude that while the extension of the classical \(1/+h\) system to the \(1/+h/pen\) system provides room for improvement of the predictive power via lowering the HMSE, such an improvement is not always guaranteed.
Example 4. Under the collective risk model with random effects described as Model 2, we assume one risk class only, where the distribution function \(F_1\) in (C.1) for the frequency part follows a Poisson distribution with \(\lambda^{[1]}=\lambda_0^{[1]}\) and the distribution function \(F_2\) in (C.1) for the severity part follows a gamma distribution with \(\lambda^{[2]}=\lambda_0^{[2]}\) and \(\psi^{[2]}=\psi_0^{[2]}\). The marginal distributions of the random effects \((\Theta^{[1]}, \Theta^{[2]})\) are specified as
\[\begin{cases} \Theta^{\lbrack1\rbrack} ~\sim &{\rm Lognormal}(\sigma_{1}^{2}/2, \sigma_{1}^{2}) \qquad {\rm with} \quad \sigma_{1}^{2}=0.99,\\ \Theta^{\lbrack2\rbrack} ~\sim &{\rm Lognormal}(\sigma_{2}^{2}/2, \sigma_{2}^{2}) \qquad {\rm with} \quad \sigma_{2}^{2}=0.29, \end{cases}\]
with distribution functions \(G_1\) and \(G_2\), respectively, and the dependence is described by a Gaussian copula \(C\) with correlation coefficient \(\rho=0.45\) (so that the joint distribution function is given by \(H=C(G_1, G_2)\) in (C.4)). With \(pen=0,1,2,3\), we have calculated the values of the stationary probabilities \(\mathbb{P}(L=\ell)\) and the optimal BM relativities \(\tilde{\zeta}(\ell)\) for all \(\ell\)’s together with the HMSE under a \(1/+1/pen\) system for the cases \(z=9\) and \(\lambda_0^{[1]}=0.05\) (Table 3(a)) as well as \(z=14\) and \(\lambda_0^{[1]}=1\) (Table 3(c)) and under a \(1/+2/pen\) system for the case \(z=9\) and \(\lambda_0^{[1]}=0.05\) (Table 3(b)). In constructing Table 3, the parameters \(\lambda_0^{[2]}=\exp(8.00)\) and \(\psi_0^{[2]}=1/0.670\) are kept fixed. Note that the values of HMSE in Table 3 are much higher than those in Table 2 concerning Example 3, and that is simply because the HMSE in the present example measures the error concerning the aggregate claim (and the individual severity has large magnitude as \(\lambda_0^{[2]}=\exp(8.00)\)) whereas the HMSE in Example 3 concerns the error in the claim number. Moreover, some of the optimal relativities in Table 3 are quite different from those in Table 2, showing the importance of taking into account the dependence between claim frequency and claim severity for modeling purposes. Nonetheless, similar patterns to those in Table 2 are observed in Table 3. Note also that the stationary probabilities in Table 3 are identical to their counterparts in Table 2 because the probabilities depend only on the distribution of the claim frequency.
4. Data analysis
4.1. Summary of data estimation results
In this subsection, we summarize the data estimation results using real data. To examine the effect of dependence on ratemaking, we employ a data set concerning collision coverage for new and old vehicles from the Wisconsin Local Government Property Insurance Fund (LGPIF) (Frees, Lee, and Yang 2016)—detailed information on the project can be found at the LGPIF project website. Such collision coverage provides coverage for impact of vehicle with an object, impact of vehicle with an attached vehicle, and overturn of a vehicle. The observations include policyholders who have either new collision coverage or old collision coverage, or both. In our data analysis, longitudinal data over the policy years from 2006 to 2010 with 497 governmental entities are used. There are two categorical variables: the entity type with six levels, and the coverage with three levels as shown in Table 4 (which is the same as Table C.6 in Oh, Shi, and Ahn (2020)).
Specifically, for Model 1 in the data analysis, as in (B.1) we consider \[\small N_{it}(\Theta_i=\theta_i,\mathbf{X}_{i}=\mathbf{x}_{i})\sim {\rm Poisson}(\lambda_i\theta_i)\quad\hbox{with}\quad \lambda_i:=\exp(\mathbf{x}_i\mathbf{\beta}),\] where \(\mathbf{x}_i\) is a row vector containing the \(i\)th policyholder’s information of the categorical variables in Table 4 for the frequency, and the column vector \(\mathbf{\beta}\) contains the corresponding parameters. For the random effect \(\Theta\), we assume a lognormal distribution with mean 1 and variance of \(\exp(\sigma^2)1\), that is, \[\Theta \sim {\rm Lognormal}(\sigma^2/2, \sigma^2).\] Summary statistics of the posterior samples for the parameters in Model 1 using the Bayesian approach are presented in Table 5. The table includes the posterior median (Est), the posterior standard deviation (Std.dev), and the 95% highest posterior density Bayesian credible interval (95\(\%\) CI). Note that an asterisk, \(*\), indicates that the parameters are significant at the 0.05 level. In estimating the parameters in Table 5, we have run 30,000 Markov chain Monte Carlo (MCMC) iterations saving every fifth sample after a burnin of 15,000 iterations by using the JAGS program. Standard MCMC diagnostics gave no indication of lack of convergence. For Model 2, in line with (C.1) and (C.2), we assume \[N_{it}(\Theta_i^{[1]}=\theta_i^{[1]},\mathbf{X}_{i}^{[1]}=\mathbf{x}_{i}^{[1]})\sim {\rm Poisson}(\lambda_i^{[1]}\theta_i^{[1]})\] for the frequency, and \[Y_{itj}(\Theta_i^{[2]}=\theta_i^{[2]}, \mathbf{X}_{i}^{[2]}=\mathbf{x}_{i}^{[2]}) \sim {\rm Gamma}(\lambda_i^{[2]}\theta_i^{[2]},1/\psi^{[2]})\] for the individual claim severity, where \(\lambda_i^{[2]}\theta_i^{[2]}\) is the mean and \(1/\psi^{[2]}\) is the shape parameter of the gamma distribution. With log link, we assume \[\lambda_i^{[1]}=\exp(\mathbf{x}_i^{[1]}\mathbf{\beta}^{[1]}) \quad \hbox{and} \quad \lambda_i^{[2]}=\exp(\mathbf{x}_i^{[2]}\mathbf{\beta}^{[2]}),\] where we take \(\mathbf{x}_i^{[1]}=\mathbf{x}_i^{[2]}\), and \(\mathbf{\beta}^{[1]}\) and \(\mathbf{\beta}^{[2]}\) are the corresponding parameters.
For the bivariate random effects \((\Theta^{[1]},\Theta^{[2]})\), we assume a Gaussian copula \(C\) in (C.4) with correlation coefficient \(\rho,\) and the marginal distributions are assumed to follow lognormal distributions with different parameters specified as
\[\begin{cases} \Theta^{\lbrack1\rbrack} ~\sim& {\rm Lognormal}(\sigma_{1}^{2}/2, \sigma_{1}^2),\\ \Theta^{\lbrack2\rbrack} ~\sim& {\rm Lognormal}(\sigma_{2}^{2}/2, \sigma_{2}^2). \end{cases}\]
The estimation results from Oh, Shi, and Ahn (2020) are summarized in Table 7, and we refer interested readers to Oh, Shi, and Ahn (2020) for the details of the estimation procedure and further results. In particular, the dependence parameter of the Gaussian copula is estimated to be \(\rho = 0.447,\) suggesting a significant negative dependence between the random effects \(\Theta^{[1]}\) and \(\Theta^{[2]}\) of the frequency and the severity, respectively.
4.2. Analysis of optimal relativities in a modified BMS
Using the posterior median as estimates of the parameters in Models 1 and 2 when \(z=14\), we calculate the optimal relativity and stationary probability for each BM level as well as the values of HMSE under the modified BM transition rules for a \(1/+1/pen\) system and a \(1/+2/pen\) system when \(pen=0,1,2,3\). The results are summarized in Tables 6 and 8. We remark that unlike in Examples 3 and 4, where the stationary probabilities \(\mathbb{P}(L=\ell)\) are identical across Tables 2 and 3, the stationary probabilities in Tables 6 and 8 are close but not identical. The reason is that the frequency parameters for Model 2 in Table 7 are estimated jointly with the severity parameters and therefore the results are slightly different from those for Model 1 given in Table 5, where only frequency parameters are estimated. Note also that the parameters are estimated using MCMC sampling, which has also possibly contributed to some differences.