Xu, Shuzhe, Vajira Manathunga, and Don Hong. 2023. “Framework of BERT-Based NLP Models for Frequency and Severity in Insurance Claims.” Variance 16 (2).
  • Figure 1. BERT&NN model
  • Figure 2a. Scatter plot for neural network (NN) model prediction values versus actual
  • Figure 2b. Scatter plot for BERT&NN model prediction values versus actual
  • Figure 3a. Quantile-quantile plot for actual versus neural network (NN)–predicted severities
  • Figure 3b. Quantile-quantile plot for actual versus Bidirectional Encoder Representations from Transformers–predicted severities
  • Figure 3c. Quantile-quantile plot for actual versus gamma-predicted severities
  • Figure 3d. Quantile-quantile plot for actual versus lognormal-predicted severities
  • Figure 4a. Scatter plot for BERT&NN versus actual with outlier treatment from the original data set
  • Figure 4b. Scatter plot for neural network (NN) versus actual with outlier treatment from the original data set
  • Figure 5a. Quantile-quantile plot for BERT&NN with outlier treatment from the original data set
  • Figure 5b. Quantile-quantile plot for neural network (NN) versus actual with outlier treatment from the original data set
  • Figure 6a. Quantile-quantile plot for fitted gamma versus actual with outlier treatment from the original data set
  • Figure 6b. Quantile-quantile plot for fitted lognormal versus actual with outlier treatment from the original data set
  • Figure B.1. Observed versus predicted distribution for Poisson regression on the testing data set
  • Figure B.2. Observed versus predicted distribution for NB-2 regression on the testing data set
  • Figure D.1. Flowchart for an automated procedure of BERT-based frequency/severity prediction models

Abstract

Incorporating textual information from insurance data sets into predictive models is challenging. We propose a framework for claim frequency and loss severity modeling that uses BERT (Bidirectional Encoder Representations from Transformers), a new natural language processing (NLP) technique, to extract textual descriptive information from claim records. Predictions are obtained using artificial neural networks (NN) for regression. Additionally, the shape of the predictive distribution is estimated, and outlier treatment with the corresponding data analysis is discussed. The results suggest that, when suitable textual data are available, a BERT-based NN model has strong potential to outperform models that do not use textual information, in both accuracy and stability. The paper also outlines an automated procedure for BERT-based frequency-severity prediction of insurance claims.

Accepted: May 25, 2023 EDT