Manathunga, Vajira A., and Duyen Hai Doan. 2026. “Predicting Workers’ Compensation Dispute Outcomes with Large Language Models.” Variance 19 (January).
  • Figure 1. Research methodology to compare traditional NLP techniques vs. LLMs.
  • Figure 2. Prompting strategies for measuring robustness.
  • Figure 3. Impact of the independent variable (Issues vs. Facts) and thresholding on traditional NLP model performance.
  • Figure 4. Overall mean performance comparison of NLP and LLM techniques.
  • Figure 5. Performance metric distributions across prompts by LLMs for Facts.
  • Figure 6. Performance metric distributions across prompts by LLMs for Issues.
  • Figure 7. Performance metric distributions across prompts by LLMs for Facts, compared with traditional NLP techniques.
  • Figure 8. Performance metric distributions across prompts by LLMs for Issues, compared with traditional NLP techniques.
  • Figure 9. Metric distributions: CoT vs. simple prompts for Facts.
  • Figure 10. Metric distributions: CoT vs. simple prompts for Issues.
  • Figure 11. Methodology for testing the impact of preprocessing on LLMs.
  • Figure 12. Model performance comparison for anonymization versus (anonymization + preprocessing) for Issues.
  • Figure 13. Model performance comparison for anonymization versus (anonymization + preprocessing) for Facts.
  • Figure B.1. Input search criteria for “workers compensation” using the “Public” search form and “Deputy Commissioner [CS]” slice.
  • Figure B.2. Displayed results for “workers compensation” search query filtered by the specified criteria.

Abstract

Workers’ compensation insurance is one of the oldest social insurance programs in the United States, predating both Social Security and unemployment insurance. When disputes arise between employees and employers over benefit entitlements, most states require resolution through administrative boards. In this study, we evaluate whether large language models (LLMs) can predict the outcomes of workers’ compensation cases more accurately than traditional, domain-specific natural language processing (NLP) techniques under the zero-shot learning paradigm. We compare performance under two input scenarios — using only the initial “Issues” filed and using the full “Findings of Fact” narrative of each case — and measure predictive accuracy against actual board decisions. Our results show that, with access to a sufficiently large context window, LLMs match or surpass the performance of specialized NLP pipelines despite having no task-specific training on workers’ compensation data. This finding underscores the practical utility of LLMs for predicting case outcomes, with implications for plaintiffs, employers, actuaries, and insurance carriers.
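The zero-shot setup described in the abstract can be sketched in a few lines: the model receives either the “Issues” text or the “Findings of Fact” narrative and is asked to predict the board’s decision with no task-specific training. The prompt wording, outcome labels, and response parser below are illustrative assumptions, not the authors’ exact pipeline.

```python
from typing import Optional

# Hypothetical binary outcome labels; the actual study's label set may differ.
LABELS = ("affirmed", "reversed")

def build_zero_shot_prompt(case_text: str, input_type: str = "Issues") -> str:
    """Compose a zero-shot prompt from either the Issues or the Facts text."""
    return (
        f"You are given the {input_type} section of a workers' compensation "
        f"dispute.\n\n{case_text}\n\n"
        "Predict the board's decision. Answer with exactly one word: "
        "'affirmed' or 'reversed'."
    )

def parse_decision(response: str) -> Optional[str]:
    """Map a free-text model response to one of the outcome labels."""
    lowered = response.lower()
    for label in LABELS:
        if label in lowered:
            return label
    return None  # unparseable response; tracked separately when scoring
```

In practice, the prompt string would be sent to an LLM API and the parsed label compared against the board’s actual decision to compute accuracy over the case corpus.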