Xia, Michelle, Lei Hua, and Gary Vadnais. 2018. “Embedded Predictive Analysis of Misrepresentation Risk in GLM Ratemaking Models.” Variance 12 (1): 39–58.
  • Figure 1. Logarithm of loss amount log(Y) by reported smoking status V* under lognormal ratemaking models, when comparing individuals with the same values of the other risk characteristics x.
  • Figure 2. Credible intervals for the risk effects of V (top) and X (bottom) for the gamma loss severity model. The dashed line marks the true value.
  • Figure 3. 95% credible intervals for the prevalence of misrepresentation q for the gamma loss severity model.
  • Figure 4. Credible intervals for the risk effects of V1 (top) and V2 (bottom) for the negative binomial loss frequency model. The dashed line marks the true value.
  • Figure 5. Proposed 95% credible intervals for the probabilities p1 and p2 for the negative binomial loss frequency model.
  • Figure 6. Credible intervals for the risk effects of V (top) and X (bottom) for the Poisson loss frequency model. The dashed line marks the true value.
  • Figure 7. Proposed 95% credible intervals for the risk effect β1 on the prevalence of misrepresentation for the Poisson loss frequency model.
  • Figure 8. Credible intervals for the relativity of smoking and age, exp(α1) and exp(α2), for the negative binomial model on the office-based visits (left column) and the gamma model on total medical charges (right column). The age effect corresponds to the increase of age by one standard deviation (i.e., 12 years).
  • Figure 9. Adjusted credible intervals for the age effect exp(β1) on the odds of misrepresentation, the predicted misrepresentation probability p(x̄), and the prevalence of misrepresentation q(x̄) for individuals at the average age of 42. In each panel, the left column corresponds to the negative binomial model on office-based visits, and the right column corresponds to the gamma model on total medical charges.
  • Figure 10. Predicted misrepresentation probability q(x) by age for individuals who reported nonsmoking.
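The captions express the age effect per one standard deviation of age (12 years) and report exp(β1) as a multiplier on the odds of misrepresentation. The sketch below shows that arithmetic under stated assumptions that are not taken from the paper: logit q is linear in age centered at the average age of 42 and scaled by 12 years, consistent with the abstract's latent logistic regression on the prevalence of misrepresentation. The coefficients beta0 and beta1 and the function names are hypothetical.

```python
import math

MEAN_AGE = 42.0  # average age quoted in the Figure 9 caption
SD_AGE = 12.0    # one standard deviation of age, from the Figure 8 caption

def expit(z):
    """Logistic (inverse-logit) function."""
    return 1.0 / (1.0 + math.exp(-z))

def prevalence(age, beta0, beta1):
    """Hypothetical q(age) = expit(beta0 + beta1 * standardized age).

    Assumption: age is centered at its mean and scaled by one standard deviation.
    """
    return expit(beta0 + beta1 * (age - MEAN_AGE) / SD_AGE)

def odds(p):
    """Odds p / (1 - p) of a probability p."""
    return p / (1.0 - p)

# Illustrative coefficients only, not estimates reported in the paper.
beta0, beta1 = -2.0, -0.5

# Odds ratio for a one-SD (12-year) increase in age; equals exp(beta1).
odds_ratio_per_sd = (odds(prevalence(MEAN_AGE + SD_AGE, beta0, beta1))
                     / odds(prevalence(MEAN_AGE, beta0, beta1)))
# The same effect expressed per single year of age.
odds_ratio_per_year = math.exp(beta1 / SD_AGE)

print(f"q at average age: {prevalence(MEAN_AGE, beta0, beta1):.4f}")
print(f"odds ratio per 12 years: {odds_ratio_per_sd:.4f}")
print(f"odds ratio per year: {odds_ratio_per_year:.4f}")
```

Under this assumed parameterization, q(x̄) reduces to expit(β0), and a per-year odds multiplier is exp(β1/12).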

Abstract

Misrepresentation is a type of insurance fraud that happens frequently in policy applications. Because the relevant data are usually unavailable, such fraud is typically expensive or difficult to detect. Based on the distributional structure of regular ratemaking data, we propose a generalized linear model (GLM) framework that allows for an embedded predictive analysis of misrepresentation risk. In particular, we treat binary misrepresentation indicators as latent variables under GLM ratemaking models for rating factors that are subject to misrepresentation. Using a latent logistic regression model on the prevalence of misrepresentation, the model identifies characteristics of policies that carry a high risk of misrepresentation. The method allows for multiple factors that are subject to misrepresentation while accounting for other, correctly measured risk factors. From the observed variables on the claim outcome and rating factors, we derive a mixture regression model structure that is identifiable. Identifiability ensures valid inference on the parameters of interest, including the rating relativities and the prevalence of misrepresentation. The usefulness of the method is demonstrated through simulation studies and a case study using the Medical Expenditure Panel Survey data.
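The mixture structure described in the abstract can be illustrated with a single policy's likelihood. The following is a minimal sketch under assumptions of our own rather than the paper's exact specification: one binary rating factor (e.g., smoking) whose reported value V* may differ from the true value V; misrepresentation only in the direction V = 1 to V* = 0; a lognormal severity GLM with log Y ~ N(α0 + α1 V + α2 x, σ²), in the spirit of Figure 1; and logit q(x) = β0 + β1 x for the prevalence of misrepresentation. For a reported V* = 0, the density is then (1 - q(x)) f(y | V = 0, x) + q(x) f(y | V = 1, x). All function names and parameter values are hypothetical.

```python
import math

def expit(z):
    """Logistic (inverse-logit) function."""
    return 1.0 / (1.0 + math.exp(-z))

def lognormal_pdf(y, mu, sigma):
    """Lognormal density of loss y with log-scale mean mu and s.d. sigma."""
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))

def component_densities(y, x, alpha, sigma):
    """Densities of y if the true status is V = 0 and V = 1 (hypothetical lognormal GLM)."""
    a0, a1, a2 = alpha
    f0 = lognormal_pdf(y, a0 + a2 * x, sigma)
    f1 = lognormal_pdf(y, a0 + a1 + a2 * x, sigma)
    return f0, f1

def policy_loglik(y, v_star, x, alpha, beta, sigma):
    """Log-likelihood of one policy under assumed unidirectional misrepresentation."""
    f0, f1 = component_densities(y, x, alpha, sigma)
    if v_star == 1:
        # Assumption: a reported V* = 1 (e.g., smoker) is always truthful.
        return math.log(f1)
    # Reported V* = 0: mixture over the latent true status, weighted by the
    # prevalence of misrepresentation q(x) from a latent logistic regression.
    q = expit(beta[0] + beta[1] * x)
    return math.log((1.0 - q) * f0 + q * f1)

def posterior_misrep(y, x, alpha, beta, sigma):
    """P(true V = 1 | y, x, reported V* = 0), by Bayes' rule on the mixture."""
    f0, f1 = component_densities(y, x, alpha, sigma)
    q = expit(beta[0] + beta[1] * x)
    return q * f1 / ((1.0 - q) * f0 + q * f1)

# Illustrative parameter values and records only; not estimates from the paper.
alpha = (7.0, 0.8, 0.3)   # (alpha0, alpha1, alpha2) on the log-loss scale
beta = (-1.5, 0.4)        # (beta0, beta1) for logit q(x)
sigma = 1.0
records = [(1500.0, 0, 0.2), (4200.0, 0, -1.0), (900.0, 1, 0.5)]  # (y, v_star, x)

total = sum(policy_loglik(y, v, x, alpha, beta, sigma) for y, v, x in records)
print(f"total log-likelihood: {total:.3f}")
```

If α1 = 0, the two components coincide and q(x) cannot be learned from the losses, which illustrates why a nonzero risk effect of the misrepresented factor matters for identification. The posterior_misrep function is one possible policy-level score derived from the mixture; the paper's own predictive quantities and its Bayesian estimation procedure are not reproduced here.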