1. Introduction
As Lemaire (1998) notes, with respect to automobile insurance, insurers in the United States and Canada tend to employ many a priori variables (observable factors) to classify risks associated with policyholders, whereas in other parts of the world, such as Europe, Asia, Latin America, and Africa, insurers commonly employ a posteriori ratemaking. The common a posteriori ratemaking process can be accomplished via a bonusmalus system (BMS), and such a posteriori ratemaking holds the promise of resolving residual heterogeneity still remaining within risk classes. As noted in Section 1 of Lemaire (1998), although BMSs are not commonly used in North America, that situation could change, as regulatory authorities may prohibit insurers from using certain classification variables and, moreover, a posteriori ratemaking is known to be an efficient way to assess individual risks. Based on the policyholder’s claims experience, such a system discounts the premium as a reward for a claimfree case (bonus) and charges an additional premium in the presence of accidents (malus). Such an a posteriori ratemaking mechanism takes on various forms across countries, but it generally consists of three major components: bonusmalus (BM) levels; transition rules; and BM relativities. Each new policyholder is classified into one of the finite number of risk classes depending on his or her observable risk characteristics, and he or she is also assigned an initial BM level. After each policy period, the BM level moves up or down according to the prespecified transition rules that penalize policyholders based on the number of reported claims and reward those without claims. In a typical BMS, claims experience in the previous policy period is used to determine the transition rule. However, for better separation of consistently and temporarily good (or bad) risks, we can consider capturing claims experience in multiple previous policy periods in the design of the transition rule, as is done in Belgium. Then, the premium is renewed at a rate that is the product of the BM relativity and the base premium. Here the BM relativity is determined by the BM level, which plays a role in the premium adjustment coefficient. In this way, the actual premium charged to policyholders in each BM level can be more closely matched to their risks reflected in the claim number record for the past few policy periods and consequently the claims costs are fairly shared. In the classical BMS, the base premium is often calculated as a product of the mean frequency and the mean severity due to the assumed independence between those two random variables (e.g., Pitrebois, Denuit, and Walhin 2003a; Denuit et al. 2007; Tan et al. 2015). However, the independence assumption is often criticized as some recent insurance reports demonstrate the existence of a significant dependence between claim frequency and severity. In the actuarial literature, various statistical models have been developed in an attempt to capture certain dependency structures between frequency and severity of claims. These include, for example, copula models (Czado et al. 2012; Frees, Lee, and Yang 2016), shared or bivariate random effects models (HernandezBastida, FernandezSanchez, and GomezDeniz 2009; Baumgartner, Gruber, and Czado 2015; Oh, Shi, and Ahn 2020), and twopart models (Shi, Feng, and Ivantsova 2015; Garrido, Genest, and Schulz 2016; Park, Kim, and Ahn 2018; Jeong et al. 2021).
The focus of the paper is twofold. First, we revisit the extension of the BM levels (Lemaire 1995; Pitrebois, Denuit, and Walhin 2003b) and define our longmemory transition rules in a more formal setting. Second, under the proposed transition rules, we consider the optimal relativities not only in a frequencyonly model but also in a dependent collective risk model that allows for dependence between frequency and severity. As an example of the dependent collective model, we consider the copulabased bivariate random effects model as in Oh, Shi, and Ahn (2020), among many choices of shared or bivariate random effects models. Whereas in the classical BMS the policyholder’s BM level is reduced if there is no claim in the past year, in our modified BMS the BM level moves to a lower level when the policyholder has no claim consecutively for years equivalent to the “period of penalty”. A similar case is discussed in Chapter 17 of Lemaire (1985) and Section 5 of Pitrebois, Denuit, and Walhin (2003b), where the transition is made based on the number of consecutive claimfree years. However, the optimal relativities in the present paper are obtained by solving different forms of optimization problems. Under two models—namely, the frequency random effects model and the bivariate random effects model—the optimal relativities and the hypothetical mean squared error are studied. The rest of the paper is organized as follows. In Section 2, we describe two random effects models and give the optimal relativity results under the classical BMS rules. In Section 3, a period of penalty (denoted as “ ”) is introduced in the transition rules such that the BM level moves downward only when there is no claim for the last consecutive years. Under the modified transition rules, an “augmented” BM level is newly defined so that its transition process has the Markov property. In turn, the optimal relativities are provided under the two different models. Finally, in Section 4, we turn to a data set from the Wisconsin Local Government Property Insurance Fund (e.g., Frees, Lee, and Yang (2016)) to illustrate the impact of the longmemory transition rules on our BMS versus the classical BMS.
2. Model descriptions and basic results
For the
th policyholder in the th policy year, we let be the number of claims and be the vector of associated claim amounts, where is the th claim amount. (Note that is undefined when Also, the aggregate claim amount and average claim amount are defined asSit:={∑Nitj=1Yitj,Nit>00,Nit=0andMit:={1Nit∑Nitj=1Yitj,Nit>0,0,Nit=0,
respectively. Clearly, these two quantities are directly linked as
In the insurance ratemarking process, a priori ratemaking results in risk classification of the (new) policyholder whose past claims history is insufficient for the ratemaking purposes. Using the observed risk characteristics of the policyholders (collected in the row vector Oh, Shi, and Ahn (2020) reveals that the dependence between frequency and severity plays a critical role in the determination of optimal relativities in the case of significant dependence. In this regard, the superscripts [1] and [2], representing the frequency and severity components, respectively, are introduced to variables and their corresponding realizations. When we are looking only at the frequency component, we drop the notation [1] for convenience. To model the frequency and the severity of claims, we employ a generalized linear model as described in de Jong and Heller (2008) along with the assumption of an exponential dispersion family for the random components (see Appendix A for a description).
for the th policyholder), their risk classes are determined and the a priori premiums are also obtained. Then, the residual heterogeneity is explained by capturing the unobserved risk characteristics of policyholders (denoted as for the th policyholder) in an a posteriori ratemaking process, and consequently their premium amounts are adjusted based on the claims history. Here, and do not depend on time. In general, the frequency part may be considered enough to construct the BMS under the assumption of independence between frequency and severity. However, recent BMS research in2.1. Random effects models
For the unobserved risk characteristics in a posteriori ratemating, let us describe two different models that take into account, first, the randomness of frequency only and, second, both frequency and severity. For convenience, we assume that there are
risk classes predetermined at the moment of when the contracts begin. We start with the description of the model for the frequency component only.Model 1 (frequency model with random effects). First, the weight of the wk:=P(X=xk),k=1,…,K, where is the row vector of the observed risk characteristics of a randomly picked person from the population of policyholders and is a vector of the observed risk characteristics for the th risk class. Note that can be regarded as the fraction of policyholders with observed risk characteristics The a priori premium for the th policyholder is determined as where is a link function, is the row vector of observed risk characteristics of this th policyholder, and is a column vector of regression coefficients in the count regression model estimated from the past data concerning the number of reported claims. Incorporating the unobserved risk component for the frequency,^{[1]} the conditional distribution of the number of claims for the th policyholder in the th policy year given the observed risk characteristics and unobserved risk characteristics along with the distribution of is provided in Appendix B.
th risk class is defined asThe following model extends the twopart model (Frees and Valdez 1998; Garrido, Genest, and Schulz 2016) by introducing bivariate random effects to model various types of dependence, such as

dependence among frequencies;

dependence among severities; and

dependence between frequency and severity.
Model 2 (collective risk model with bivariate random effects (Oh, Shi, and Ahn 2020)). Recall that the superscripts [1] and [2] represent, respectively, the frequency and the severity components. As an extension of Model 1, this model takes into account the random component of the claim severity as well for the a priori and a posteriori ratemaking processes. In contrast to (2), we define the weight of the th risk class as
wk:=P(X[1]=x[1]k,X[2]=x[2]k),k=1,…,K,
where Λ[1]i:=η−11(X[1]iβ[1])andΛ[2]i:=η−12(X[2]iβ[2]), where and are link functions, and are the th policyholder’s row vectors of observed risk characteristics for frequency and severity, and and are column vectors of parameters to be estimated. More importantly, the dependence between claim frequency and severity is also considered as the corresponding unobserved risk characteristics are modeled via a bivariate copula. More specific distributional descriptions of this model are provided in Appendix C.
is the pair of row vectors containing the observed risk characteristics relevant to the frequency and the severity of claims for a randomly picked person from the population of policyholders, and is the corresponding pair for the th risk class. The a priori premium for the th policyholder can be obtained from the information of the vector which is in turn determined by the observed risks characteristics asIt is instructive to note that when Model 2 is restricted to the claim frequency only, it is conceptually equivalent to Model 1.
2.2. Optimal relativities
In the classical BMS framework, common transition rules are such that the BM level is lowered by one for a claimfree year and increased by
levels per claim, leading to what is referred to as the system. Based on such transition rules, the BM level for each policyholder evolves as a (discretetime) Markov chain. Let us denote as a random variable representing the BM level (from 0 to for a randomly picked policyholder from the population in a stationary state (independent of time). Its distribution (that is, the proportion of policyholders in each level) is determined by the frequency component asP(L=ℓ)=K∑k=1wk∫πℓ(λkθ,ψ)g(θ)dθ,ℓ=0,…,z,
where
is the stationary probability that a policyholder with expected frequency is in level (It is understood that and in (3) are replaced by and respectively, if one considers Model 2.) We shall provide explanations on how to obtain the stationary probabilities at the beginning of Section 3.2 for our more general model in relation to Example 2. The relativity associated with the BM level is denoted by The following lemmas review how to determine the optimal relativity in two optimization settings under the model assumptions in Section 2.1.Lemma 1. (Tan et al. 2015) Consider the optimization problem
(˜ζ(0),…,˜ζ(z)):=argmin(ζ(0),…,ζ(z))∈Rz+1E[(ΛΘ−Λζ(L))2] under the frequencyonly Model 1. Because (see (B.1)) is the “correct” premium for the th policyholder if we did know one can regard as the “correct” premium for a policyholder randomly picked from the population having observed risk characteristics and unobserved risk characteristics The optimization (4) amounts to choosing the relativities to minimize the mean squared difference between and the actual premium charged when a policyholder is in BM level Then, the optimal relativities can be analytically calculated as
˜ζ(ℓ):=E[Λ2ΘL=ℓ]E[Λ2L=ℓ]=∑Kk=1wkλ2k∫θπℓ(λkθ,ψ)g(θ)dθ∑Kk=1wkλ2k∫πℓ(λkθ,ψ)g(θ)dθ,ℓ=0,…,z.
Lemma 2. (Oh, Shi, and Ahn 2020) Consider the optimization problem
(˜ζ(0),…,˜ζ(z)):=argmin(ζ(0),…,ζ(z))∈Rz+1E[(Λ[1]Λ[2]Θ[1]Θ[2]−Λ[1]Λ[2]ζ(L))2]
under the frequencyseverity Model 2. As
(see Appendix C) is the “correct” premium for the th policyholder, is the “correct” premium for a policyholder randomly picked from the population having observed risk characteristics and unobserved risk characteristics The optimization (5) is concerned with choosing the relativities to minimize the mean squared difference between and the actual premium charged when a policyholder is in BM level Then, the optimal relativities can be analytically calculated as˜ζ(ℓ):= E[(Λ[1]Λ[2])2Θ[1]Θ[2]L=ℓ]E[(Λ[1]Λ[2])2L=ℓ]= ∑Kk=1wk(λ[1]kλ[2]k)2∫∫θ[1]θ[2]πℓ(λ[1]kθ[1],ψ[1])h(θ[1],θ[2])dθ[1]dθ[2]∑Kk=1wk(λ[1]kλ[2]k)2∫πℓ(λ[1]kθ[1],ψ[1])g1(θ[1])dθ[1],ℓ=0,…,z.
The optimal relativities in Lemmas 1 and 2 can be applied to various transition rules and BM levels (Pitrebois, Denuit, and Walhin 2003a; Denuit et al. 2007; Tan et al. 2015; Oh, Shi, and Ahn 2020). The most common BMS (and also one of the simplest of the BMSs) is the system, although other variations are available, such as, for example, in Tan et al. (2015). However, insurers in many countries, including Belgium, Korea, and Singapore, adapt transition rules with a long memory. While such transition rules do not directly possess the Markov property, examples in the literature show how that can be dealt with (Lemaire 1995; Pitrebois, Denuit, and Walhin 2003a). A key technique involves augmentation of BM levels, and the following section generalizes these examples.
To calculate optimal relativity in the BMS literature, the Markov property of the transition rules is typically used, and the BM level is often assumed to be stationaryWe remark that the optimization in (4) or (5) is not the only optimization criterion one can use. Various criteria exist to define the optimality of BM relativities—see, e.g., Denuit et al. (2007) and Tan et al. (2015). More importantly, BM relativities obtained from (4), (5), Denuit et al. (2007), and Tan et al. (2015) are known to create some systematic bias in the prediction of premium. Such systematic bias is called the doublecounting problem, and we refer interested readers to Lemaire (1995), Taylor (1997), and Oh et al. (2020) for the details of the doublecounting problem and its solution.
3. A modified BMS with augmented BM levels
In this section, the number of consecutive claimfree years for a policyholder is taken into account in the transition rules. Assuming there are
BM levels, we let be the BM level at time so that is the BM level applicable for the th policy year, that is, from time to time Similar to previous notation, for we use to denote the number of claims in the th policy year.3.1. Modified transition rules of
While the classical BMS usually adopts the socalled pen∗t:=min for This implies that the BM level at time starts with no penalty, and indeed a new policyholder can have his or her BM level reduced by one every year as long as he or she maintains a noclaim record. (This is also equivalent to letting and saying that one requires consecutive claimfree years to decrease the BM level by one.) Note that the classical BMS is retrieved from our extended model by letting We further assume that a new policyholder without any driving history belongs to the BM level in the beginning. Then, is mathematically represented as
system for as described in Section 2.2, we shall introduce an additional component, namely “ ” to this system. Specifically, under such a system, each reported claim increases the BM level by while consecutive claimfree years will result in a decrease of one BM level at time when moving from the th policy year to the th. Here is defined as\scriptsize L_t:= \begin{cases} \min\{L_{t1}+h N_{t}, z\}, & N_{t}>0,\\ \max\{L_{t1}1, 0\}, & N_{t}= \ldots =N_{\max\{tpen^*_{t1}, 1\}} =0,\\ L_{t1}, &\hbox{otherwise}, \end{cases}\tag{7}
for
with It is also assumed that the BM relativity is applied in the same manner as in the classical system. That is, each BM level is bestowed with BM relativity Compared with the classical system, an additional claimfree years are required to reduce the BM level by one, whereas the penalty of climbing BM levels per claim remains the same. Such a system is introduced to motivate policyholders to drive carefully by applying a stricter rule for lowering the BM level (reward case) while keeping the same rule with respect to increasing the BM level (penalty case). Insurers have adopted such transition rules in some countries, such as Singapore and Korea.While the Markov property in the transition probability appearing in certain BMSs is convenient for the analysis of BMSrelated problems (e.g., Taylor 1997; Pitrebois, Denuit, and Walhin 2003a; Denuit et al. 2007), it is obvious that with the current definition of the transition rules are not Markovian as depends not only on but also on the numbers of claims in the past multiple periods. To resolve this issue, as explained in Lemaire (1995) and Pitrebois, Denuit, and Walhin (2003a), fictitious levels can be included to redefine the state space of the BM levels. Subdividing some of the BM levels in order to include the information of the number of claimfree years results in an increase of the number of states versus the classical case. More precisely, for the BM level is augmented into (\ell)_0, (\ell)_1, \ldots, (\ell)_{pen}, where the subscript stands for the number of additional claimfree periods (compared with the classical BMS) required to get rewarded, while is just translated into without augmentation.^{[2]} Hence, the original BM levels in the system have been augmented to BM levels. Let us denote the BM level at time under the augmented system as Then, it is convenient to define the possible combinations of in our proposed model (that is, the state space) as
\scriptsize \mathcal{A}_{z, pen}:=\{ (\ell)_0\big\vert \ell=0, \ldots, h1\}\cup \{ (\ell)_0, \ldots, (\ell)_{pen} \big\vert \ell=h, \ldots, z \}.\tag{8}
Furthermore, suppose that a policyholder who was at the BM level
at time has reported at time that accidents occurred during the th policy year. Then the new BM level at time is determined as\small L_t^*:= \begin{cases} ( \max\{\ell1, 0 \})_0, & \hbox{if}\quad N_{t}=0 \,\,\hbox{and}\,\,a=0,\\ ( \ell)_{a1}, & \hbox{if}\quad N_{t}=0 \,\,\hbox{and}\,\,a\neq 0,\\ (\min\{\ell+h N_{t}, z \})_{pen}, & \hbox{if}\quad N_{t}>0,\\ \end{cases}\tag{9}
for \zeta^*\left( (\ell)_a \right):=\zeta(\ell),\qquad (\ell)_a\in \mathcal{A}_{z, pen}.\tag{10} We call the BMS with BM levels (8), transition rules (9), and relativities (10) an augmented system. The following example illustrates the transition rules with a certain penalty period.
where a new policyholder without driving history starts with The transitions of are now Markovian, as depends on and the latest claim number only. Also, we assume that the BM relativity in state under the augmented system defined via (9) is the same as the one in for the original system defined in (7) in the sense that the relativities depend on the BM level but not the information that is artificially introduced to make the transitions Markovian. In other words, under the proposed model one has the relativitiesExample 1. Let us consider the Table 1. (Note that with (6) implies that stays level at 2 once reaches 2.) The first few transitions are explained as follows:
system with That is, whereas the BM level is to be increased by two per claim, having consecutive claimfree years is to be rewarded by a decrease of one BM level. We assume that a new policyholder without driving history starts at BM level (so at time 0). Given the BM level at time and the claim frequency in the th year, we summarize the next year’s BM level according to (7) and alternatively according to (9) in
Transition from
to :
At time 0, one has from (6), and therefore claimfree year is required to reduce the BM level by one at time 1 (like the classical BMS), and so we set at time 0. As it turns out that there is no claim reported at (i.e., the BM level is reduced from to (and thus at Now according to (6), meaning that a total of consecutive claimfree years are needed to reduce the BM level by one the next year at time 2. Since there has already been one claimfree year, only one more claimfree year is needed to earn such a reward (again like the classical BMS), and therefore one sets at time 1. 
Transition from
to :
Since again, the condition of reward is satisfied and the BM level moves down to (and at time 2. With at time 2, it is known that consecutive claimfree years are needed to lower the BM level by one. With two consecutive claimfree years already in hand, we need only one more claimfree year to reduce the BM level in the next year, and therefore at time 2. 
Transition from
to :
Since in the third year, the BM level moves up to 12 at time 3 (and so As at time 3, one needs consecutive claimfree years to reduce the BM level by one, which means two claimfree years in addition to the one claimfree year required in the classical BMS. Consequently, we have at 
Transition from
to :
Because in the fourth year, the BM level increases to at time 4. As there is a claim in this period, three consecutive claimfree years are still needed to reduce the BM level by one. So, we have at 
Transition from
to :
Although there is no claim in the fifth year as the condition of having three consecutive claimfree years in order to enjoy a reward is not satisfied, and therefore the BM level stays at the same level as last year, so that Then two more consecutive claimfree years are required to lower the BM level by one. This is one additional year compared with the classical system, so at
Example 2. Suppose that we have the
system with We consider Model 1 where a policyholder, say the th policyholder, has observed and unobserved risk characteristics and respectively. Then, based on the augmented form of BM levels defined in (8), one finds a total of BM levels with the transition matrix given by\[\tiny\begin{align} &\begin{array}{lllllllllllllll} & \quad & (0)_0 & (1)_0 & (2)_0 & (2)_1 & (3)_0 & (3)_1& (4)_0 & (4)_1& (5)_0 & (5)_1& (6)_0 & (6)_1& (7)_0 & (7)_1 \end{array} \\ &\begin{array}{l} (0)_0\\ (1)_0\
where p_n in the matrix is defined by (see (B.1)) p_n:=\mathbb{P}(N_{it}=n\Theta_i=\theta_i, \mathbf{X}_i=\mathbf{x}_i), \qquad n=0,1,\ldots. For simplicity, the dependence of on and is suppressed. Similarly, if we assume Model 2 instead and the th policyholder has observed and unobserved risk characteristics given by and then in the matrix is (see (C.1)) p_n:=\mathbb{P}(N_{it}=n\Theta^{[1]}_i=\theta^{[1]}_i, \mathbf{X}^{[1]}_i=\mathbf{x}^{[1]}_i), \qquad n=0,1,\ldots.
3.2. Optimal relativities
In this section, under a
system, the optimal values of the relativities for are studied. Following the classical BMS literature, we denote the stationary distribution of by for a randomly picked policyholder. Then, under the representation in Model 1, one finds in an analogous manner to (3) that\begin{align} \mathbb{P}(L^*=&(\ell)_a) =\sum_{k=1}^\mathcal{K} w_{k} \int \pi_{(\ell)_a}^*(\lambda_k\theta, \psi) g(\theta){\rm d}\theta,\\&(\ell)_a\in \mathcal{A}_{z, pen},\end{align}\tag{11}
where \mathbf{\pi}^*(\lambda_{k}\theta,\psi):=(\pi_{(0)_0}^*(\lambda_{k}\theta,\psi), \pi_{(1)_0}^*(\lambda_{k}\theta,\psi),\ldots, \pi_{(7)_1}^*(\lambda_{k}\theta,\psi)), can be obtained as the solution of \begin{cases} \mathbf{\pi}^*(\lambda_{k}\theta,\psi) = \mathbf{\pi}^*(\lambda_{k}\theta,\psi)\mathbf{P},\\ \mathbf{\pi}^*(\lambda_{k}\theta,\psi)\mathbf{e}_{14}=1,\\ \end{cases} where is the transition matrix in Example 2 (calculated with and in place of and respectively), and is a 14dimensional column vector of ones.
is the stationary distribution for a policyholder to be in BM level given that his or her a priori claim frequency is and unobserved risk characteristics are summarized in (Again, and are replaced by and respectively, if we consider Model 2 instead.) For example, the 14dimensional row vector of stationary probabilities corresponding to Example 2, namelyUnder the augmented system, we consider the optimal relativities as the solution of the optimization problem \small (\tilde{\zeta}(0), \ldots, \tilde{\zeta}(z)):=\mathop{\mathrm{arg\,min}}_{({\zeta}(0), \ldots, {\zeta}(z))\in \mathbb{R}^{{z+1}}}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}^*(L^*))^2]\tag{12} resembling (4), where we aim to predict the frequency under the frequencyonly Model 1. On the other hand, when we are interested in the prediction of the aggregate claim, like (5), the optimal relativities are given by the solution of the optimization problem \scriptsize (\tilde{\zeta}(0), \ldots, \tilde{\zeta}(z)):=\mathop{\mathrm{arg\,min}}_{({\zeta}(0), \ldots, {\zeta}(z))\in \mathbb{R}^{{z+1}}} \mathbb{E}[(\Lambda^{[1]}\Lambda^{[2]}\Theta^{[1]}\Theta^{[2]}\Lambda^{[1]}\Lambda^{[2]} \zeta^*(L^*))^2]\tag{13} under the frequencyseverity Model 2. Recall that, by definition, in (12) or (13) is completely characterized by for (see (10)).
Proposition 1. Consider the \tilde{\zeta}(\ell):=\frac{\mathbb{E}[\Lambda^2 \Theta  L^*=(\ell)_0]} {\mathbb{E}[\Lambda^2  L^*=(\ell)_0]}, \qquad \ell= 0, \ldots,{h1},\tag{14} and
system under Model 1. The optimal relativities defined as the solution of the optimization problem (12) are given by\begin{align} \tilde{\zeta}(\ell):=& \frac{\sum_{a=0}^{pen}\mathbb{E}[\Lambda^2 \Theta L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)} {\sum_{a=0}^{pen}\mathbb{E}[\Lambda^2  L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)}, \\ \ell =& h, \ldots,{z}; a=0, \ldots, pen,\end{align}\tag{15}
where the numerator and denominator can be calculated based on \begin{align} \mathbb{E}&[\Lambda^2 \Theta  L^*=(\ell)_a]\\ &= \frac{\sum_{k=1}^\mathcal{K} w_{k} \lambda_{k}^2 \int \theta \pi_{(\ell)_a}^*(\lambda_{k}\theta, \psi) g(\theta){\rm d}\theta} {\mathbb{P}(L^*=(\ell)_a)}\end{align}\tag{16} and
\small \mathbb{E}[\Lambda^2  L^*=(\ell)_a] = \frac{\sum_{k=1}^\mathcal{K} w_{k} \lambda_{k}^2 \int \pi_{(\ell)_a}^*(\lambda_{k}\theta, \psi)g(\theta){\rm d}\theta} {\mathbb{P}(L^*=(\ell)_a)},\tag{17} respectively. Note that when the optimal relativity reduces to that in Tan et al. (2015) given in Lemma 1.
Proof. Note that the objective function in (12) can be written as
\small\begin{align} \mathbb{E}&[(\Lambda\Theta\Lambda {\zeta}^*(L^*))^2]\\&=\sum_{(\ell)_a\in \mathcal{A}_{z, pen}}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}^*((\ell)_a))^2L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)\nonumber\\ &=\sum_{(\ell)_a\in \mathcal{A}_{z, pen}}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}(\ell))^2L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a),\end{align}\tag{18}
where the last line follows from (10). We observe from (8) that

for each
the relativity is shared by only one BM level and 
for each
the same relativity is shared by the augmented BM levels
Consequently, (18) becomes \small\begin{align} \mathbb{E}&[(\Lambda\Theta\Lambda {\zeta}^*(L^*))^2] \\&=~\sum_{\ell=0}^{h1}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}(\ell))^2L^*=(\ell)_0]\mathbb{P}(L^*=(\ell)_0)\nonumber\\ &+\sum_{\ell=h}^z\sum_{a=0}^{pen}\mathbb{E}[(\Lambda\Theta\Lambda {\zeta}(\ell))^2L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a).\end{align}\tag{19}
For each fixed
differentiation of (19) with respect to for optimization yields\mathbb{E}[2\Lambda(\Lambda\Theta\Lambda\zeta(\ell))L^*=(\ell)_0]\mathbb{P}(L^*=(\ell)_0)=0,
from which (14) follows. On the other hand, when
taking the derivative with respect to in (19) leads to\sum_{a=0}^{pen}\mathbb{E}[2\Lambda(\Lambda\Theta\Lambda\zeta(\ell))L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)=0,
proving (15).
Here, the numerators of (14) and (15) can be calculated using
\tiny\begin{aligned} \mathbb{E}&[\Lambda^2\Theta L^*=(\ell)_a] \\&=~\frac{1}{\mathbb{P}(L^*=(\ell)_a)} \mathbb{E}[\Lambda^2\Theta I(L^*=(\ell)_a)] \nonumber\\ &=~\frac{1}{\mathbb{P}(L^*=(\ell)_a)} \sum_{k=1}^\mathcal{K}\int \lambda_k^2\theta\mathbb{P}(\Theta\in {\rm d}\theta, \Lambda=\lambda_k,L^*=(\ell)_a)\nonumber\\ &=~\frac{1}{\mathbb{P}(L^*=(\ell)_a)} \sum_{k=1}^\mathcal{K}\int \lambda_k^2\theta\mathbb{P}(L^*=(\ell)_a\Theta\in {\rm d}\theta, \Lambda=\lambda_k) \mathbb{P}(\Theta\in {\rm d}\theta, \Lambda=\lambda_k).\end{aligned}\tag{20}
Since the a priori
and the a posteriori are independent and one has that Using this together with the fact that confirms that (20) reduces to (16). The expectation (17) can be obtained in an almost identical manner, and the details are omitted.Proposition 2. Consider the
system under Model 2. The optimal relativities defined as the solution of the optimization problem (13) are given by\begin{align}\tilde{\zeta}(\ell)&:=\frac{\mathbb{E}[(\Lambda^{[1]}\Lambda^{[2]})^2 \Theta^{[1]}\Theta^{[2]}L^*=(\ell)_0]} {\mathbb{E}[(\Lambda^{[1]} \Lambda^{[2]})^2 L^*=(\ell)_0]},\\ \ell&=0, \ldots,{h1},\end{align}
and
\begin{align}\tilde{\zeta}(\ell)&:=\frac{\sum_{a=0}^{pen}\mathbb{E}[( \Lambda^{[1]}\Lambda^{[2]})^2 \Theta^{[1]}\Theta^{[2]} L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)} {\sum_{a=0}^{pen}\mathbb{E}[(\Lambda^{[1]} \Lambda^{[2]})^2 L^*=(\ell)_a]\mathbb{P}(L^*=(\ell)_a)}, \\ \ell&=h, \ldots,{z}; a=0, \ldots, pen,\end{align}
where the numerator and denominator can be calculated based on
\scriptsize\begin{align}\mathbb{E}&[( \Lambda^{[1]}\Lambda^{[2]})^2 \Theta^{[1]}\Theta^{[2]} L^*=(\ell)_a]\\&=\frac{\sum_{k=1}^\mathcal{K} w_{k} (\lambda_{k}^{[1]}\lambda_{k}^{[2]})^2 \int\int \theta^{[1]}\theta^{[2]} \pi_{(\ell)_a}^*(\lambda_{k}^{[1]}\theta^{[1]},\psi^{[1]}) h(\theta^{[1]}, \theta^{[2]})\rm d\theta^{[1]}\rm d \theta^{[2]}} {\mathbb{P}(L^*=(\ell)_a)}\end{align}
and \scriptsize\mathbb{E}[( \Lambda^{[1]}\Lambda^{[2]})^2 L^*=(\ell)_a] = \frac{\sum_{k=1}^\mathcal{K} w_{k} (\lambda_{k}^{[1]}\lambda_{k}^{[2]})^2 \int \pi_{(\ell)_a}^*(\lambda_{k}^{[1]}\theta^{[1]},\psi^{[1]})g_1(\theta^{[1]}){\rm d}\theta^{[1]}} {\mathbb{P}(L^*=(\ell)_a)}, respectively. Note that when the optimal relativity reduces to that in Oh, Shi, and Ahn (2020) given in Lemma 2).
Proof. The proof of Proposition 2 is similar to that of Proposition 1, and thus we omit the details.
3.3. Numerical analysis
Here we provide a simple example to examine the effect of the period of penalty on the optimal BM relativity
and the stationary probability First, it is convenient to introduce the relation\mathbb{P}(L=\ell)=\sum_{\{a(\ell)_a\in \mathcal{A}_{z, pen}\}} \mathbb{P}(L^*=(\ell)_a),\tag{21}
which follows from the definition of
The performances of our modified BM transition rules under a system as well as a system for are compared by computing the hypothetical mean squared error (HMSE). For Model 1 (the frequency random effects model) with the set of BM relativities the HMSE is expressed as\scriptsize\begin{align} {\rm HMSE}(\mathbf{\zeta}) &:=\mathbb{E}[(\Lambda\Theta\Lambda \zeta^*(L^*))^2]\nonumber\\ &=\sum_{k=1}^\mathcal{K}\sum_{(\ell)_a\in \mathcal{A}_{z, pen}} w_k \int (\lambda_k \theta\lambda_k \zeta(\ell))^2 \pi_{(\ell)_a}^*(\lambda_k\theta,\psi) g(\theta) {\rm d}\theta.\end{align}\tag{22}
If
is replaced by the vector of optimal relativities calculated using Proposition 1, then is the minimized value corresponding to the righthand side of the optimization problem (12).Similarly, for Model 2 with the set of BM relativities Oh, Shi, and Ahn (2020), the HMSE is expressed as
following the logic in\tiny\begin{align}{\rm HMSE}(\mathbf{\zeta}) &:=\mathbb{E}[(\Lambda^{[1]}\Lambda^{[2]}\Theta^{[1]}\Theta^{[2]}\Lambda^{[1]}\Lambda^{[2]} \zeta^*(L^*))^2]\\ &=\sum_{k=1}^\mathcal{K}\sum_{(\ell)_a\in \mathcal{A}_{z, pen}} w_k \int \int \big(\lambda_k^{[1]}\lambda_k^{[2]} \theta^{[1]}\theta^{[2]} \lambda_k^{[1]}\lambda_k^{[2]} \zeta(\ell) \big)^2 \pi_{(\ell)_a}^*(\lambda_k\theta,\psi^{[1]}) h(\theta^{[1]}, \theta^{[2]}) {\rm d}\theta^{[1]} \rm d \theta^{[2]}.\end{align}
Again,
is the minimum on the righthand side of the optimization problem (13) with calculated using Proposition 2.Example 3. Under the frequency model with random effects described as Model 1, it is assumed that there is only one risk class, where the frequency
follows a Poisson distribution with a priori frequency (so that the distribution function can be put in the form of (B.1)) and the random effect in (B.2) has a lognormal distribution with mean 1 and that is,\small\begin{aligned} N(\Lambda=\lambda,\Theta=\theta) &\sim {\rm Poisson} (\lambda\theta) \qquad {\rm with} \quad \lambda=\lambda_0,\\ \Theta &\sim {\rm Lognormal}(\sigma^2/2, \sigma^2) \qquad {\rm with} \quad \sigma^2=0.99.\end{aligned}
With Tables 2(a) and (b).
and the stationary probabilities and the optimal BM relativities for all ’s and the resulting HMSE (under the optimal relativities) are calculated under a system and a system. The results when are first summarized inIn obtaining the tables, we first construct the transition probability matrix according to the transition rules of the BMS and the claim frequency distribution with the assumed parameters (see Example 2) and obtain the stationary probability
as explained at the beginning of Section 3.2. Then, can be calculated from (11) and (21). Note that integration with respect to under the distributional assumption on the density of is needed in (11). While explicit evaluation of such a onedimensional integral is not possible in general, numerical integration can be implemented easily in most computing software. In turn, the optimal relativities can be obtained from Proposition 1, where numerical integration is again performed based on (16) and (17) (but the integral appearing in (17) is the same as that in (11)). HMSE can be similarly calculated via (22) once one has calculated the optimal relativitiesWe first look at the Table 2(a). For each fixed value of most of the population is in the lowest BM level 0 (as is at least which can be explained by the relatively low mean claim frequency of By inspecting for we observe that an increase in tends to move some of the population toward higher BM levels, leading to a diversification of the stationary BM levels. This is because a higher means that more claimfree years are needed to reduce the BM level by one, rendering it more difficult for drivers at a high BM level to move downward. As mentioned in Proposition 1, the optimal relativities under are the same as those in Tan et al. (2015). As increases, it is important to observe that the optimal relativity for each fixed BM level decreases. Although a lower BM relativity means that a driver occupying a given BM level pays a less expensive premium in a system with a higher the insurer’s premium income is in turn compensated by an increased portion of drivers occupying higher BM levels as increases. A higher also results in an improvement of the HMSE in this example. Similar patterns can be found in the system as shown in Table 2(b). However, for a fixed the optimal relativity in Table 2(b) is not necessarily increasing in especially for low BM levels. For example, it is observed that is slightly smaller than With a possible explanation for is that certain policyholders occupying BM level 2 could be those who were in the lowest BM level 0 (and had one claim) in the previous year, and these are still good drivers. On the other hand, those in BM level 1 could not have come directly from BM level 0. It is interesting to note that although is not always increasing in in Table 2(b), reporting claims moves a policyholder upward by BM levels and this always leads to an increase in Nevertheless, potential problems arise when a policyholder moves down from BM level 2 to BM level 1 after a number of claimfree years and then suffers an increase in the optimal relativity (and hence premium). For future work, constrained optimization may be considered by requiring to be increasing in but it is unlikely that this will lead to optimal relativities that admit expressions as explicit as those in Propositions 1 and 2.
system inWhen Table 2(c) shows some similar features: as increases, decreases and the distribution (at equilibrium) of policyholders shifts from lower BM levels to higher ones. However, since a significant fraction of the population is already in the highest BM level 14 when increasing antidiversifies the stationary distribution of BM levels. We observe that HMSE is increased with higher
we consider BM levels up to because the assumption of a higher mean claim frequency will result in a larger proportion of policyholders occupying higher BM levels. The system inWe also notice one interesting phenomenon when comparing the cases where the claim frequencies are of different magnitude. In Tables 2(a) and (b) for which the optimal relativity for BM level 0 is still at least 69% while that for the highest BM level can be as high as 17. An intuitive explanation is that given the low claim frequency it is not unusual that drivers do not file claims. As a result, claimfree drivers (those occupying the lowest BM level) are not necessarily much better drivers than others, and therefore they are not rewarded with much premium discount. On the other hand, drivers occupying the highest BM level must have reported some claims in the past years, and they are more likely to be worse drivers with more risky unobserved risk characteristics (that is, higher and thus are more severely penalized. Moving to Table 2(c) for which it becomes much more common for drivers to report claims and occupy higher BM levels, so they are not penalized too much with the highest BM relativity around 2. But it is those at lower BM levels who have claimfree years that should be regarded as significantly better drivers, and they are given a great discount with the BM relativity no larger than 24.6%.
From this example, we conclude that while the extension of the classical
system to the system provides room for improvement of the predictive power via lowering the HMSE, such an improvement is not always guaranteed.Example 4. Under the collective risk model with random effects described as Model 2, we assume one risk class only, where the distribution function
in (C.1) for the frequency part follows a Poisson distribution with and the distribution function in (C.1) for the severity part follows a gamma distribution with and The marginal distributions of the random effects are specified as\begin{cases} \Theta^{\lbrack1\rbrack} ~\sim &{\rm Lognormal}(\sigma_{1}^{2}/2, \sigma_{1}^{2}) \qquad {\rm with} \quad \sigma_{1}^{2}=0.99,\\ \Theta^{\lbrack2\rbrack} ~\sim &{\rm Lognormal}(\sigma_{2}^{2}/2, \sigma_{2}^{2}) \qquad {\rm with} \quad \sigma_{2}^{2}=0.29, \end{cases}
with distribution functions Table 3(a)) as well as and (Table 3(c)) and under a system for the case and (Table 3(b)). In constructing Table 3, the parameters and are kept fixed. Note that the values of HMSE in Table 3 are much higher than those in Table 2 concerning Example 3, and that is simply because the HMSE in the present example measures the error concerning the aggregate claim (and the individual severity has large magnitude as whereas the HMSE in Example 3 concerns the error in the claim number. Moreover, some of the optimal relativities in Table 3 are quite different from those in Table 2, showing the importance of taking into account the dependence between claim frequency and claim severity for modeling purposes. Nonetheless, similar patterns to those in Table 2 are observed in Table 3. Note also that the stationary probabilities in Table 3 are identical to their counterparts in Table 2 because the probabilities depend only on the distribution of the claim frequency.
and respectively, and the dependence is described by a Gaussian copula with correlation coefficient (so that the joint distribution function is given by in (C.4)). With we have calculated the values of the stationary probabilities and the optimal BM relativities for all ’s together with the HMSE under a system for the cases and (4. Data analysis
4.1. Summary of data estimation results
In this subsection, we summarize the data estimation results using real data. To examine the effect of dependence on ratemaking, we employ a data set concerning collision coverage for new and old vehicles from the Wisconsin Local Government Property Insurance Fund (LGPIF) (Frees, Lee, and Yang 2016)—detailed information on the project can be found at the LGPIF project website. Such collision coverage provides coverage for impact of vehicle with an object, impact of vehicle with an attached vehicle, and overturn of a vehicle. The observations include policyholders who have either new collision coverage or old collision coverage, or both. In our data analysis, longitudinal data over the policy years from 2006 to 2010 with 497 governmental entities are used. There are two categorical variables: the entity type with six levels, and the coverage with three levels as shown in Table 4 (which is the same as Table C.6 in Oh, Shi, and Ahn (2020)).
Specifically, for Model 1 in the data analysis, as in (B.1) we consider \small N_{it}(\Theta_i=\theta_i,\mathbf{X}_{i}=\mathbf{x}_{i})\sim {\rm Poisson}(\lambda_i\theta_i)\quad\hbox{with}\quad \lambda_i:=\exp(\mathbf{x}_i\mathbf{\beta}), where is a row vector containing the th policyholder’s information of the categorical variables in Table 4 for the frequency, and the column vector contains the corresponding parameters. For the random effect we assume a lognormal distribution with mean 1 and variance of that is, \Theta \sim {\rm Lognormal}(\sigma^2/2, \sigma^2). Summary statistics of the posterior samples for the parameters in Model 1 using the Bayesian approach are presented in Table 5. The table includes the posterior median (Est), the posterior standard deviation (Std.dev), and the 95% highest posterior density Bayesian credible interval (95 CI). Note that an asterisk, indicates that the parameters are significant at the 0.05 level. In estimating the parameters in Table 5, we have run 30,000 Markov chain Monte Carlo (MCMC) iterations saving every fifth sample after a burnin of 15,000 iterations by using the JAGS program. Standard MCMC diagnostics gave no indication of lack of convergence. For Model 2, in line with (C.1) and (C.2), we assume N_{it}(\Theta_i^{[1]}=\theta_i^{[1]},\mathbf{X}_{i}^{[1]}=\mathbf{x}_{i}^{[1]})\sim {\rm Poisson}(\lambda_i^{[1]}\theta_i^{[1]}) for the frequency, and Y_{itj}(\Theta_i^{[2]}=\theta_i^{[2]}, \mathbf{X}_{i}^{[2]}=\mathbf{x}_{i}^{[2]}) \sim {\rm Gamma}(\lambda_i^{[2]}\theta_i^{[2]},1/\psi^{[2]}) for the individual claim severity, where is the mean and is the shape parameter of the gamma distribution. With log link, we assume \lambda_i^{[1]}=\exp(\mathbf{x}_i^{[1]}\mathbf{\beta}^{[1]}) \quad \hbox{and} \quad \lambda_i^{[2]}=\exp(\mathbf{x}_i^{[2]}\mathbf{\beta}^{[2]}), where we take and and are the corresponding parameters.
For the bivariate random effects
we assume a Gaussian copula in (C.4) with correlation coefficient and the marginal distributions are assumed to follow lognormal distributions with different parameters specified as\begin{cases} \Theta^{\lbrack1\rbrack} ~\sim& {\rm Lognormal}(\sigma_{1}^{2}/2, \sigma_{1}^2),\\ \Theta^{\lbrack2\rbrack} ~\sim& {\rm Lognormal}(\sigma_{2}^{2}/2, \sigma_{2}^2). \end{cases}
The estimation results from Oh, Shi, and Ahn (2020) are summarized in Table 7, and we refer interested readers to Oh, Shi, and Ahn (2020) for the details of the estimation procedure and further results. In particular, the dependence parameter of the Gaussian copula is estimated to be suggesting a significant negative dependence between the random effects and of the frequency and the severity, respectively.
4.2. Analysis of optimal relativities in a modified BMS
Using the posterior median as estimates of the parameters in Models 1 and 2 when 6 and 8. We remark that unlike in Examples 3 and 4, where the stationary probabilities are identical across Tables 2 and 3, the stationary probabilities in Tables 6 and 8 are close but not identical. The reason is that the frequency parameters for Model 2 in Table 7 are estimated jointly with the severity parameters and therefore the results are slightly different from those for Model 1 given in Table 5, where only frequency parameters are estimated. Note also that the parameters are estimated using MCMC sampling, which has also possibly contributed to some differences.
we calculate the optimal relativity and stationary probability for each BM level as well as the values of HMSE under the modified BM transition rules for a system and a system when The results are summarized in TablesComparing across Tables 6 and 8, one can observe the impact of frequencyseverity dependence on the optimal relativities. In particular, under negative dependence between frequency and severity, which is implied from the negative dependence between the two random effects with the optimal relativities tend to decrease for higher BM levels but increase for lower BM levels. Consequently, the optimal relativities in Table 8 are less spread out than those in Table 6. It is noted that all optimal BM relativities in Table 8 are less than 1, which may be counterintuitive at first glance. However, it is noted that, roughly speaking, may be regarded as an estimate of when for some (see (10) and the optimization (13) in Model 2). Since and are negatively dependent (with both having mean 1), when one of them is large the other is more likely to be small, making the product unlikely to be larger than 1.
Both Tables 6 and 8 confirm that as increases, each optimal relativity (for fixed decreases and the proportion of policyholders in higher BM levels increases. Explanations similar to those in Examples 3 and 4 are applicable. A certain degree of diversification effects on the BM levels as increases resembling Tables 2(a) and (b) in Example 3 and Tables 3(a) and (b) in Example 4 can be observed from Tables 6 and 8. Meanwhile, the HMSE values show that a higher leads to a decrease in predictive power in this example. The higher HMSE in Table 8 compared with Table 6 is again attributed to the fact that Model 2 is concerned with the aggregate claim while Model 1 is about claim number, and interested readers are referred to Tables A4 and A5 in Oh, Shi, and Ahn (2020) for the magnitude of the average claim severity in Model 2.
Lastly, we can compare the stationary distribution Tables 6(a) and (b) concerning Model 1, the presence of a period of penalty leads to a smaller proportion of policyholders in the lowest level 0 (with in the former model and in the latter) but a larger proportion in the highest level 14 (with in the former model and in the latter). For a given BM level the optimal relativity in the former model is always lower than the latter, suggesting that a BMS with a period of penalty instead of a higher increase of the BM level per claim can look more attractive to the market if the relativities are available to potential customers. Such an effect is even more pronounced if one assumes instead of Hence, this numerical example clearly illustrates that we can expect different outcomes in terms of the distribution of the BM level as well as the optimal relativities depending on how more rigid the BMS is designed. Instead of (or in addition to) imposing a higher increase of the BM level per claim, our model makes it harder for a policyholder to transit to lower BM levels once he or she has reported a claim by requiring consecutive and multiple claimfree years to enjoy a bonus. We also remark that the same conclusion can be drawn from Tables 8(a) and (b) by comparing the (or system and the system under Model 2.
of the BM level and the optimal relativities under two different BM systems: (i) a system with a small increase of BM level per claim but a period of penalty and (ii) a system with a larger increase of BM level per reported claim but without a period of penalty From5. Conclusion
In automobile thirdparty liability insurance, the BMS has been broadly used as an a posteriori ratemaking mechanism to help insurers rate a policyholder’s risk more accurately and thus calculate premium more fairly to reflect policyholder risk. The BMS is also designed to stimulate drivers to practice safer driving by adjusting the premium on policy renewal based on the claim history in the previous year. However, a typical BMS that immediately offers a reward to policyholders without a claim in the previous year may tend to move policyholders easily toward lower BM levels. Hence, to resolve the imbalance due to a concentration of policyholders in the lowest BM level, prevent a quick recovery (in terms of paying lower premium) for those drivers who have a good claim history for only a single period, and better distinguish between drivers who are consistently good and those who are only temporarily good, in this paper we introduce a “period of penalty” to count the number of consecutive claimfree years required to lower the BM level. Although other more rigid BMS transition rules such as a
system with or to severely penalize claims in the previous year may put less pressure on the premium income of the insurer, they may weaken the product’s competitiveness in the market and insurers may lose better customers. Through numerical illustrations with a data set, we demonstrate that compared with a system, the introduction of a penalty period to a system can result in (i) lower values of optimal BM relativities, which can potentially improve marketability of the product; and (ii) better separation of risks because the BM level 0 is less concentrated as drivers who are not consistently good are moved to higher BM levels. Once policyholders are aware of the requirement of consecutive claimfree years to enjoy lower BM relativities at lower BM levels, they are likely better motivated to drive more safely.Moreover, in the aforementioned modified BMS with a penalty period, we take into account the randomness of (i) frequency only or (ii) both frequency and severity when modeling the unobserved risk characteristics and constructing the optimal relativities associated with the BM levels. Those relativities are then used to determine the premium actually charged to the policyholders staying in the corresponding BM levels. We hope that our extended BMS allowing for dependency between the frequency and the severity of claims via dependency of the unobserved risk characteristics provides some insights for improving the classical BMS, in particular, when the insurers have the freedom to alter the features of the BMS to reflect claim experience more accurately based on the claim history of policyholders in multiple years with product competitiveness in mind.
Acknowledgments
The authors would like to thank the anonymous reviewers for helpful comments and suggestions that improved earlier versions of the manuscript. This research was supported by a Committee on Knowledge Extension Research grant funded by the Casualty Actuarial Society (CAS). The views expressed herein are those of the authors and are not necessarily those of the CAS.