Gan, Guojun, and Christopher Shultz. 2026. “Using Large Language Models to Generate New Features from Text Data for Loss Prediction.” Variance 19 (June). https://doi.org/10.66573/001c.162423.
  • Figure 1. Histograms of the log-transformed loss.
  • Figure 2. Distribution of common keywords in claim descriptions.
  • Figure 3. UMAP dimensions 1 and 2 of claim description embeddings.
  • Figure 4. Schematic overview of using a pretrained LLM to generate new features.
  • Figure 5. A flowchart of the entire data analysis process.
  • Figure 6. Box plots of log-loss by risk classes for the two-category approach produced by ChatGPT-4o and Llama-3.2-3B on the training set.
  • Figure 7. Box plots of log-loss by risk classes for the three-category approach produced by ChatGPT-4o and Llama-3.2-3B on the training set.
  • Figure 8. Box plots of log-loss by risk classes for the five-category approach produced by ChatGPT-4o and Llama-3.2-3B on the training set.
  • Figure 9. Box plots of log-loss by risk classes produced by ChatGPT-4o and Llama-3.2-3B on the test set.
  • Figure 10. Box plots of log-loss by risk classes produced by ChatGPT-4o and Llama-3.2-3B on the test set.
  • Figure 11. Box plots of log-loss by risk classes produced by ChatGPT-4o and Llama-3.2-3B on the test set.
  • Figure 12. Line plots of the mean log-loss against the risk levels.
  • Figure 13. Scatter plots of the observed loss and the loss generated by Llama-3.2-3B on a log scale.

Abstract

Insurance companies collect large volumes of unstructured text data, such as claim descriptions, adjuster notes, and customer feedback. Although these data contain valuable context, their unstructured nature leads to underutilization in traditional actuarial models. This paper investigates the use of large language models (LLMs) to extract low-dimensional structured features from claim descriptions to improve loss prediction models. Using structured prompts, we apply GPT-4o and Llama-3.2-3B to classify incidents into ordinal risk categories. The resulting labels are semantically meaningful, ordinally consistent, and predictive of average loss. Incorporating them into generalized linear models improves both in-sample fit and out-of-sample accuracy, with five-category labels from GPT-4o yielding the best performance. These results demonstrate that LLMs can effectively augment traditional actuarial risk models by extracting meaningful information from unstructured text data.
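The pipeline described in the abstract (structured prompt → ordinal risk label → GLM feature) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the five category names and the prompt wording are hypothetical, no LLM is actually called (the labels are simulated), and the GLM is assumed to be Gaussian with an identity link on log loss, which reduces to ordinary least squares. The paper's actual prompts, model family, and data may differ.

```python
import numpy as np

# Hypothetical five-category ordinal scale and prompt wording (assumptions,
# not the authors' exact prompt).
RISK_LEVELS = ["very low", "low", "medium", "high", "very high"]


def build_prompt(description: str) -> str:
    """Structured prompt that restricts the LLM to exactly one ordinal label."""
    return (
        "Classify the loss risk of this insurance claim.\n"
        f"Answer with exactly one of: {', '.join(RISK_LEVELS)}.\n"
        f"Claim description: {description}"
    )


def parse_label(reply: str) -> int:
    """Map a free-text LLM reply to an ordinal code 0..K-1, or -1 if none matches."""
    text = reply.strip().lower()
    # Try longer labels first so "very low" is not read as "low".
    for level in sorted(RISK_LEVELS, key=len, reverse=True):
        if level in text:
            return RISK_LEVELS.index(level)
    return -1


def design_matrix(codes: np.ndarray, k: int) -> np.ndarray:
    """Intercept column plus k-1 dummy columns; level 0 is the baseline."""
    X = np.zeros((len(codes), k))
    X[:, 0] = 1.0
    for j in range(1, k):
        X[:, j] = codes == j
    return X


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Assumed Gaussian GLM with identity link on log-loss, i.e. least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Synthetic stand-in data: in practice `codes` would come from parse_label()
# applied to LLM replies, and `log_loss` from the claims data.
rng = np.random.default_rng(0)
n, k = 2000, len(RISK_LEVELS)
codes = rng.integers(0, k, size=n)
log_loss = 6.0 + 1.0 * codes + rng.normal(0.0, 1.0, size=n)

train = np.arange(n) < 1500
test = ~train

# Baseline: intercept only. Augmented: intercept + LLM risk-class dummies.
X_null = np.ones((n, 1))
X_risk = design_matrix(codes, k)
b_null = fit_ols(X_null[train], log_loss[train])
b_risk = fit_ols(X_risk[train], log_loss[train])

rmse_null = rmse(log_loss[test], X_null[test] @ b_null)
rmse_risk = rmse(log_loss[test], X_risk[test] @ b_risk)

# Ordinal consistency check: mean log-loss should rise with the risk level.
class_means = [log_loss[train & (codes == j)].mean() for j in range(k)]
monotone = bool(np.all(np.diff(class_means) > 0))

print(build_prompt("Rear-ended at a stop light; minor bumper damage."))
print(parse_label("Answer: Very High"), parse_label("unclear"))
print(f"test RMSE  null={rmse_null:.3f}  with risk classes={rmse_risk:.3f}")
print("monotone class means:", monotone)
```

Encoding the risk class as unordered dummies lets the GLM estimate a separate mean for each level without forcing a linear trend; the monotonicity check on the class means is a crude stand-in for the ordinal consistency the abstract reports. With real data, the held-out RMSE comparison plays the role of the out-of-sample accuracy comparison, though the paper's own evaluation metrics may differ.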

Accepted: April 14, 2026