Guo, Jiayi, Linfeng Zhang, and Zhiyu Quan. 2026. “Entity-Specific Cyber Risk Assessment Using Insurtech-Empowered Risk Factors.” Variance 19 (February).

Algorithms and Figures

  • Algorithm 1. Binary relevance with joint search for multilabel classification.
  • Algorithm 2. Classifier chain with joint search for multilabel classification.
  • Algorithm 3. Multilabel classification trees with joint search.
  • Figure 1. Multilabel classification model performance.
  • Figure 2. Multi-output regression model performance.
  • Figure 3. Log-transformed feature importance scores across various classification models.
  • Figure 4. Count of top important feature appearances across various classification models.
  • Figure 5. Feature importance scores across various regression models.
  • Figure 6. Count of top important feature appearances across various regression models.
  • Figure 7. Distributions of cyber incident occurrence and frequency across categories.

Abstract

The lack of high-quality public cyber incident data limits empirical research and predictive modeling for cyber risk assessment. This challenge persists because companies are reluctant to disclose incidents that could damage their reputation or investor confidence. From an actuarial perspective, two potential remedies are enhancing existing cyber incident datasets and applying advanced modeling techniques that make better use of the available data. A review of existing data-driven methods reveals that publicly available datasets largely lack entity-specific organizational features. To address this gap, we propose a novel insurtech framework that enriches cyber incident data with entity-specific organizational features. We develop several machine learning (ML) models: a multilabel classification model to capture the occurrence of cyber incident types (e.g., privacy violation, data breach, fraud and extortion, IT error, and others) and a multi-output regression model to estimate their annual frequencies. Classifier and regressor chains are also implemented to explore dependencies among cyber incident types, but no significant correlations are observed in our datasets. We also apply multiple interpretable ML techniques to identify insurtech-developed potential risk factors and cross-validate them across ML models. We find that, compared with conventional risk factors, insurtech-empowered features make occurrence and frequency estimates more robust. The framework generates transparent, entity-specific cyber risk profiles that support customized underwriting and proactive cyber risk mitigation, giving insurers and organizations data-driven insights for decision-making and compliance planning.
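The abstract contrasts two multilabel strategies: fitting one independent classifier per incident type (binary relevance) and fitting a classifier chain, in which each link also sees the labels earlier in the chain so that dependencies among incident types can be picked up. The following is a minimal, standard-library-only Python sketch of both strategies. It assumes a hand-rolled logistic regression as the base learner and a hypothetical two-feature, two-label toy dataset; it does not reproduce the paper's joint search, its insurtech features, or its data.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    """Tiny binary logistic regression fit by batch gradient descent (illustrative base learner)."""

    def __init__(self, lr: float = 0.5, epochs: int = 500):
        self.lr = lr
        self.epochs = epochs
        self.w: Vector = []
        self.b = 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "LogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # d(log-loss)/d(logit)
                for k in range(d):
                    grad_w[k] += err * xi[k]
                grad_b += err
            self.w = [wk - self.lr * gk / n for wk, gk in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x: Vector) -> float:
        return sigmoid(sum(wk * xk for wk, xk in zip(self.w, x)) + self.b)

    def predict(self, x: Vector) -> int:
        return int(self.predict_proba(x) >= 0.5)


def fit_binary_relevance(X, Y):
    # One independent classifier per label; dependencies between labels are ignored.
    n_labels = len(Y[0])
    return [LogisticRegression().fit(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models, x):
    return [m.predict(x) for m in models]


def fit_classifier_chain(X, Y):
    # Link j is trained on the features plus the true labels 0..j-1.
    models = []
    for j in range(len(Y[0])):
        X_aug = [list(xi) + [float(v) for v in yi[:j]] for xi, yi in zip(X, Y)]
        models.append(LogisticRegression().fit(X_aug, [yi[j] for yi in Y]))
    return models


def predict_classifier_chain(models, x):
    # At prediction time, each link receives the *predicted* labels of earlier links.
    preds: List[int] = []
    for m in models:
        preds.append(m.predict(list(x) + [float(p) for p in preds]))
    return preds


# Hypothetical toy data: two centered features; label 0 fires when x0 > 0, label 1 when x1 > 0.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

br = fit_binary_relevance(X, Y)
cc = fit_classifier_chain(X, Y)
print([predict_binary_relevance(br, x) for x in X])
print([predict_classifier_chain(cc, x) for x in X])
```

In this toy data the two labels are independent, so the chain's extra inputs add nothing, which loosely mirrors the abstract's finding that chaining revealed no significant correlations among incident types. The chain order is itself a modeling choice. In practice, scikit-learn's `MultiOutputClassifier`, `ClassifierChain`, `MultiOutputRegressor`, and `RegressorChain` (in `sklearn.multioutput`) provide ready-made versions of these wrappers around any base estimator.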

Accepted: December 9, 2025