Hua, Lei, and Maochao Xu. 2021. “Pricing Cyber Insurance for a Large-Scale Network.” Variance 14 (2).
• Table 1. Effects of network topology on risk assessments
• Figure 1. Infected nodes over time, cases A, B, and C
• Figure 2. Histograms of the natural logarithm of accumulated infected nodes × month, cases A, B, and C
• Figure 3. Infected nodes over time, cases D and E
• Figure 4. Histogram of the natural logarithm of accumulated infected nodes × month, cases D and E
• Figure 5. Scatter plots of standardized covariates
• Table 2. Estimates of the regression coefficients for Tinf; the reference level of the categorical variable is Ninf0 = 1
• Table 3. ANOVA analysis for Tinf
• Table 4. Estimates of the regression coefficients for Nrec
• Table 5. Type III analysis for Nrec
• Table 6. ANOVA table for Tinf: case study with all variables included
• Table 7. ANOVA table for Tinf: case study with the candidate model
• Table 8. Estimates for Tinf: case study with the candidate model
• Table 9. Type III analysis for Nrec: case study with all variables included
• Table 10. Type III analysis for Nrec: case study with candidate model
• Table 11. Estimates for Nrec: case study with candidate model
• Figure 6. Scatter plots for Tinf and Nrec in the case study
• Table 12. Predicted Tinf and Nrec and the expected total loss in the case study
• Figure 7. Illustration of elapsed time $\left\{ t_{j_{s}} \right\}$ and τ
• Figure 8. Evolution of infection and recovery

## Abstract

To address the lack of cyber-insurance loss data, we propose an innovative approach to pricing cyber insurance for a large-scale network using synthetic data. The synthetic data are generated by the proposed risk-spreading and risk-recovering algorithm. The algorithm allows infection and recovery events to occur sequentially, and it allows the random waiting times to infection of different nodes to be dependent. The scale-free network framework is adopted to account for the uncertain topology of a random large-scale network. Extensive simulation studies are conducted to understand the risk-spreading and risk-recovering mechanism and to uncover the most important underwriting risk factors. A case study is also presented to demonstrate that the proposed approach and algorithm can be adapted to provide a reference for cyber-insurance pricing.

This paper was funded through the 2017 Individual Grants Competition, sponsored by the Society of Actuaries and the Casualty Actuarial Society.

Accepted: February 15, 2020

# Appendix

## A.1. Algorithms for simulating cyber risk spreading and recovering

First of all, we define an active link as a link that satisfies the following two conditions: (1) it connects exactly two nodes; (2) exactly one of the two nodes is infected. For a given infected node $j$, its $k$th active link is indexed by $j_k$. Assume that there are $M$ infected nodes at time $t$, and denote the set of indexes of the infected nodes by $ℐ$. Each infected node generates a recovery process, with the random waiting time to recovery represented by $R_j$ with survival function $\overline{G}_j(r)$, $j \in ℐ$. Each active link generates an infection process, with the random waiting time to infection represented by $T_{j_k}$ with survival function $\overline{F}_{j_k}(t)$, $j = 1, \dots, M$, and, for each given $j$, $k = 1, \dots, k_j$, where $k_j$ is the number of active links associated with infected node $j$; let $𝒜_j$ denote the set of active links associated with the $j$th infected node. Let $N$ denote the total number of active links; then $N = \sum_{j = 1}^{M} k_j$.
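The bookkeeping behind these definitions can be sketched in a few lines of Python: given an adjacency matrix and a 0/1 infection-status vector, list the susceptible neighbors of each infected node (its active links), then read off $k_j$ and $N$. The function name `active_links` and the list-of-lists matrix representation are hypothetical choices made for illustration.

```python
from typing import Dict, List


def active_links(adj: List[List[int]], status: List[int]) -> Dict[int, List[int]]:
    """Map each infected node j to the susceptible endpoints of its active links.

    adj    -- symmetric 0/1 adjacency matrix (adj[a][b] == 1 if a link exists)
    status -- status[a] == 1 if node a is infected, 0 if susceptible

    A link is active when exactly one endpoint is infected, so every active
    link is attached to exactly one infected node j.
    """
    links: Dict[int, List[int]] = {}
    for j, infected in enumerate(status):
        if infected == 1:
            # Susceptible neighbours of j define the active links j_1, ..., j_{k_j}.
            links[j] = [s for s, edge in enumerate(adj[j]) if edge == 1 and status[s] == 0]
    return links


# Hypothetical 5-node example: path 0-1-2-3-4 plus the extra link 1-3.
adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
status = [0, 1, 0, 0, 1]  # nodes 1 and 4 are infected

links = active_links(adj, status)
k = {j: len(nbrs) for j, nbrs in links.items()}  # k_j for each infected node
N = sum(k.values())                              # total number of active links
print(links, k, N)  # prints {1: [0, 2, 3], 4: [3]} {1: 3, 4: 1} 4
```

In this example node 3 is the susceptible endpoint of two active links (from nodes 1 and 4), which is why $N$ counts active links rather than susceptible nodes.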

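It is reasonable to assume that the infection processes launched by the same infected node are dependent; that is, $T_{j_1}, \dots, T_{j_{k_j}}$ are dependent, and their dependence structure can be modeled by a copula $C_j$. Specifically, assume that the infected node $j$ launches attacks (through its active links) on its susceptible neighbors $\left\{ j_{1}, \dots, j_{k_{j}} \right\}$. Then $\left( T_{j_{1}}, T_{j_{2}}, \dots, T_{j_{k_{j}}} \right)$ is assumed to have the following joint survival function:

$$\Pr\left( T_{j_1} > t_{j_1}, \dots, T_{j_{k_j}} > t_{j_{k_j}} \right) = C_j\left( \overline{F}_{j_1}(t_{j_1}), \dots, \overline{F}_{j_{k_j}}(t_{j_{k_j}}) \right).$$

Then, with the infection processes launched by different infected nodes taken to be independent, the joint survival function for all the $T_{j_k}$s is

$$\Pr\left( T_{j_k} > t_{j_k},\ j \in ℐ,\ k \in 𝒜_j \right) = \prod_{j \in ℐ} C_j\left( \overline{F}_{j_1}(t_{j_1}), \dots, \overline{F}_{j_{k_j}}(t_{j_{k_j}}) \right).$$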
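As a concrete illustration (an assumption made here, since the text does not fix the copula family or the marginals at this point), take $C_j$ to be a Clayton copula with parameter $\theta > 0$ and let each waiting time $T_{j_s}$ be exponential with rate $\beta$. Substituting $u_s = \overline{F}_{j_s}(t_{j_s}) = e^{-\beta t_{j_s}}$ into the Clayton form gives the joint survival function of the $k_j$ attacks launched by node $j$:

$$C_j(u_1, \dots, u_{k_j}) = \Big( \sum_{s=1}^{k_j} u_s^{-\theta} - k_j + 1 \Big)^{-1/\theta} \quad\Longrightarrow\quad \Pr\left( T_{j_1} > t_{j_1}, \dots, T_{j_{k_j}} > t_{j_{k_j}} \right) = \Big( \sum_{s=1}^{k_j} e^{\theta \beta t_{j_s}} - k_j + 1 \Big)^{-1/\theta}.$$

Larger $\theta$ means stronger positive dependence among the attacks launched by the same node, and as $\theta \to 0$ the expression tends to the independence case $\prod_{s} e^{-\beta t_{j_s}}$.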

Figure 7. Illustration of elapsed time $\left\{ t_{j_{s}} \right\}$ and τ

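Given that the times $\left\{ t_{j_{k}} \right\}$, $j \in ℐ$, $k \in 𝒜_{j}$, have elapsed since the corresponding active links were activated, and that the times $\{ r_j \}$, $j \in ℐ$, have elapsed since the corresponding nodes were infected, the probability that the next infection corresponds to process $i_k$ and will occur at time $t_{i_k} + \tau$ can be represented as follows, where $D_{i_{k}}\left( u_{1}, \dots, u_{k_{i}} \right) := \frac{\partial C_{i}\left( u_{1}, \dots, u_{k_{i}} \right)}{\partial u_{k}}$ and $f_{i_k}$ is the density function of $T_{i_k}$:

$$\lambda_{i_k}(\tau) = \frac{D_{i_k}\left( \overline{F}_{i_1}(t_{i_1} + \tau), \dots, \overline{F}_{i_{k_i}}(t_{i_{k_i}} + \tau) \right) f_{i_k}(t_{i_k} + \tau)}{C_i\left( \overline{F}_{i_1}(t_{i_1} + \tau), \dots, \overline{F}_{i_{k_i}}(t_{i_{k_i}} + \tau) \right)}\, \Phi\left( \tau \mid \{ t_{j_s}, r_j \} \right), \tag{A.1}$$

where

$$\Phi\left( \tau \mid \{ t_{j_s}, r_j \} \right) = \prod_{j \in ℐ} \frac{C_j\left( \overline{F}_{j_1}(t_{j_1} + \tau), \dots, \overline{F}_{j_{k_j}}(t_{j_{k_j}} + \tau) \right)}{C_j\left( \overline{F}_{j_1}(t_{j_1}), \dots, \overline{F}_{j_{k_j}}(t_{j_{k_j}}) \right)} \cdot \frac{\overline{G}_j(r_j + \tau)}{\overline{G}_j(r_j)}, \tag{A.2}$$

which represents the probability that no event, either an infection or a recovery, occurs in the next time period of length $\tau$, given that the times $\{ t_{j_s} \}$ have elapsed for the active links and the times $\{ r_j \}$ have elapsed for the infected nodes.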
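A minimal Python sketch of $\Phi$ follows, reading equation (A.2) as the product, over infected nodes, of the conditional probability that none of the node's active links transmits and the node does not recover within $\tau$. It assumes (these are illustrative choices, not the paper's) a Clayton copula with parameter `theta` for every node, exponential infection marginals with rate `beta`, exponential recovery with rate `gamma`, and independence across infected nodes and between infection and recovery; the names `clayton` and `phi` are hypothetical.

```python
import math
from typing import Dict, List


def clayton(us: List[float], theta: float) -> float:
    """Clayton copula (sum u^-theta - d + 1)^(-1/theta); gives 1 for d = 0 and u_1 for d = 1."""
    return (sum(u ** -theta for u in us) - len(us) + 1.0) ** (-1.0 / theta)


def phi(tau: float, t_links: Dict[int, List[float]], r_nodes: Dict[int, float],
        beta: float, gamma: float, theta: float) -> float:
    """P(no infection and no recovery in the next tau | elapsed times).

    t_links[j] -- elapsed times of node j's active links (empty list if none);
                  every infected node must appear as a key
    r_nodes[j] -- elapsed time since node j was infected
    """
    def surv_f(t: float) -> float:  # assumed exponential infection marginal
        return math.exp(-beta * t)

    def surv_g(r: float) -> float:  # assumed exponential recovery distribution
        return math.exp(-gamma * r)

    value = 1.0
    for j, ts in t_links.items():
        # Conditional survival of node j's copula-linked infection times ...
        later = clayton([surv_f(t + tau) for t in ts], theta)
        now = clayton([surv_f(t) for t in ts], theta)
        # ... times the conditional survival of its recovery time.
        recovery_ratio = surv_g(r_nodes[j] + tau) / surv_g(r_nodes[j])
        value *= (later / now) * recovery_ratio
    return value


# Two infected nodes: node 0 with three active links, node 1 with none.
print(phi(0.5, {0: [0.2, 1.0, 3.0], 1: []}, {0: 1.5, 1: 0.4},
          beta=1.0, gamma=1.25, theta=2.0))
```

With one active link per node, the copula drops out and memorylessness gives $\Phi(\tau) = e^{-M(\beta + \gamma)\tau}$ regardless of the elapsed times; with several dependent links, the elapsed times matter.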

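Similarly, the probability that the next recovery corresponds to node $i$ and will occur at time $r_i + \tau$ can be represented as follows:

$$\Lambda_i(\tau) = \frac{g_i(r_i + \tau)}{\overline{G}_i(r_i + \tau)}\, \Phi\left( \tau \mid \{ t_{j_s}, r_j \} \right), \tag{A.3}$$

where $g_i$ is the density function of $G_i$. Given the occurrence time $\tau$, the probability that the next event is an infection through process $i_k$ is

$$\frac{\lambda_{i_k}(\tau)}{\sum_{j \in ℐ} \sum_{s \in 𝒜_j} \lambda_{j_s}(\tau) + \sum_{j \in ℐ} \Lambda_j(\tau)}, \tag{A.4}$$

and the probability that the next event is the recovery of node $i$ is

$$\frac{\Lambda_i(\tau)}{\sum_{j \in ℐ} \sum_{s \in 𝒜_j} \lambda_{j_s}(\tau) + \sum_{j \in ℐ} \Lambda_j(\tau)}. \tag{A.5}$$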
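Reading $\lambda_{i_k}$ and $\Lambda_i$ as $\Phi$ times a hazard-type term, the common factor $\Phi$ cancels from the ratios in equations (A.4) and (A.5). The sketch below assumes, for illustration only, a Clayton copula with parameter `theta`, exponential infection marginals with rate `beta`, and exponential recovery with rate `gamma`. The Clayton partial derivative then gives $D_{j_s}/C_j = u_s^{-\theta-1}/S_j$ with $S_j = \sum_s u_s^{-\theta} - k_j + 1$, and the recovery term reduces to `gamma`. The function name `event_probabilities` is hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def event_probabilities(tau: float, t_links: Dict[int, List[float]],
                        beta: float, gamma: float, theta: float) -> Dict[Tuple, float]:
    """Probability that the event occurring at time tau is each possible event.

    Keys are ("infect", j, s) for the s-th active link of infected node j and
    ("recover", j) for infected node j; every infected node must be a key of t_links.
    """
    weights: Dict[Tuple, float] = {}
    for j, ts in t_links.items():
        us = [math.exp(-beta * (t + tau)) for t in ts]  # u_s = Fbar_{j_s}(t_{j_s} + tau)
        big_s = sum(u ** -theta for u in us) - len(us) + 1.0  # Clayton: C_j = big_s^(-1/theta)
        for s, u in enumerate(us):
            # D_{j_s} * f_{j_s} / C_j = (u^(-theta-1) / big_s) * (beta * u)
            weights[("infect", j, s)] = beta * u ** -theta / big_s
        # g_j / Gbar_j = gamma for exponential recovery, so r_j drops out.
        weights[("recover", j)] = gamma
    total = sum(weights.values())  # the common factor Phi has already cancelled
    return {event: w / total for event, w in weights.items()}


probs = event_probabilities(0.3, {0: [0.1, 0.9, 2.0], 1: []}, beta=1.0, gamma=1.25, theta=2.0)
# Sample which event occurs, as in the event-selection step of Algorithm 1.
event = random.choices(list(probs), weights=list(probs.values()))[0]
print(event, probs)
```

Under the Clayton assumption, a link that has been active longer (larger $t_{j_s}$, smaller $u_s$) receives a larger weight than the node's other links.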

Note that $C_j$ here is the copula for the active links associated with infected node $j$. When there is only one active link, no copula is required, but the marginal survival function must be retained; that is, $C_{j}\left( {\overline{F}}_{j_{1}}\left( t_{j_{1}} + \tau \right) \right) \equiv {\overline{F}}_{j_{1}}\left( t_{j_{1}} + \tau \right)$. When an infected node, say $j$, has no active link, that is, $𝒜_{j} = \emptyset$, we let $C_j \equiv 1$. The following Algorithm 1 illustrates how cyber risks spread and recover continuously on a network.
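Both conventions hold automatically for any Archimedean copula $C(u_1, \dots, u_d) = \psi\big( \sum_{s=1}^{d} \psi^{-1}(u_s) \big)$, whose generator satisfies $\psi(0) = 1$; the Clayton copula, with $\psi(x) = (1 + \theta x)^{-1/\theta}$, is one example. Choosing such a family (an illustrative assumption, not a requirement stated in the text) lets one formula serve every infected node, whatever its number of active links:

$$C(u_1) = \psi\big( \psi^{-1}(u_1) \big) = u_1, \qquad C\big|_{d = 0} = \psi(0) = 1.$$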

## A.2. Algorithm 1: Simulation based on exact networks

INPUT: Network topology $A$, time span of the simulation study $T$, iteration limit, initial states of the nodes, initial elapsed times $t_0$, dependence structures $C(\cdot \mid \theta)$, active-link infection distributions, recovery distributions

1. for i = 1 to iteration limit do
2. while $t \le T$ and the number of infected nodes > 0 do
3. Generate $u$ from $𝒰(0, 1)$, and then solve $\Phi(\tau \mid \{ t_{j_s}, r_j \}) = u$ (equation (A.2)) for the waiting time $\tau$ until the next event.
4. For the active links and the infected nodes, calculate $\lambda_{i_k}$ and $\Lambda_i$ based on equations (A.1) and (A.3), respectively, and from them the probabilities in equations (A.4) and (A.5).
5. Based on these probabilities, randomly sample which event occurs: an infection through a particular active link or the recovery of a particular infected node.
6. if an infection occurs then
7. Change the status of the newly infected node from 0 to 1; change the status of the corresponding active link from 1 to 0; and assign the newly infected node an elapsed time of 0.
8. else
9. Change the status of the recovered node from 1 to 0.
10. end if
11. Update the matrix representing the active links; update the elapsed times of the active links and of the infected nodes.
12. Update the current time: $t \leftarrow t + \tau$.
13. Record $t$, the node statuses, the matrix for the active links, and the elapsed times for the active links and for the infected nodes.
14. end while
15. end for

OUTPUT: Node status, matrix for the active links, elapsed time for the active links and for the infected nodes right after each infection or recovery event
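The loop of Algorithm 1 can be sketched in Python for a special case assumed here for tractability: every active link has an exponential waiting time to infection with rate `beta`, every infected node has an exponential waiting time to recovery with rate `gamma`, and all waiting times are independent (the independence copula). Then $\Phi(\tau) = e^{-(N\beta + M\gamma)\tau}$, so $\Phi(\tau) = u$ has the closed-form solution $\tau = -\ln u / (N\beta + M\gamma)$, and the event-selection step reduces to choosing an event with probability proportional to its rate; this is the classical Gillespie stochastic simulation algorithm. It does not capture the copula dependence of the general algorithm, memorylessness makes the elapsed times unnecessary to track, and the function, network, and parameter names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def simulate(adj: List[List[int]], status: List[int], horizon: float,
             beta: float, gamma: float, seed: int = 0) -> List[Tuple[float, List[int]]]:
    """One pass of the while loop of Algorithm 1 under independent exponential
    waiting times. Returns (time, node statuses) recorded right after each event."""
    rng = random.Random(seed)
    status = list(status)  # 1 = infected, 0 = healthy/susceptible
    n = len(status)
    t, history = 0.0, []
    while t <= horizon and sum(status) > 0:
        # Active links: edges joining an infected node j to a susceptible node s.
        links = [(j, s) for j in range(n) if status[j] == 1
                 for s in range(n) if adj[j][s] == 1 and status[s] == 0]
        infected = [j for j in range(n) if status[j] == 1]
        total_rate = beta * len(links) + gamma * len(infected)
        # Phi(tau) = exp(-total_rate * tau) here, so Phi(tau) = u solves in closed form.
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        tau = -math.log(u) / total_rate
        # Given tau, each event's probability is proportional to its constant rate.
        events = [("infect", s) for _, s in links] + [("recover", j) for j in infected]
        weights = [beta] * len(links) + [gamma] * len(infected)
        kind, node = rng.choices(events, weights=weights)[0]
        # Update the node status; active links are recomputed on the next pass.
        status[node] = 1 if kind == "infect" else 0
        t += tau
        history.append((t, list(status)))  # record the state right after the event
    return history


# Hypothetical 10-node, 15-edge network: a ring plus five "diameter" chords.
edges = [(i, (i + 1) % 10) for i in range(10)] + [(i, i + 5) for i in range(5)]
adj = [[0] * 10 for _ in range(10)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

# Outer for loop of Algorithm 1: independent runs from one initially infected node.
for run in range(3):
    history = simulate(adj, [1] + [0] * 9, horizon=50.0, beta=1.0, gamma=1 / 0.8, seed=run)
    print(run, len(history), history[-1][0])
```

As in Algorithm 1, the stopping condition is checked before each event, so the last recorded event time can exceed the horizon.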

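The reason for Item 3 of Algorithm 1 is that $\Phi(\tau \mid \{ t_{j_s}, r_j \})$ represents the probability that no event occurs during the next time length $\tau$. Therefore, $1 - \Phi(\tau \mid \{ t_{j_s}, r_j \})$ is the probability that at least one event occurs during the next time length $\tau$, that is, the distribution function of the waiting time until the next event. We generate $u$ from the uniform distribution $𝒰(0, 1)$ and solve the following equation to derive the time length $\tau$:

$$\Phi\left( \tau \mid \{ t_{j_s}, r_j \} \right) = u.$$

Because $1 - u$ is also uniformly distributed on $(0, 1)$, the resulting $\tau$ has the distribution of the waiting time until the next event (inverse transform sampling).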
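When $\Phi$ has no closed-form inverse, as with a non-trivial copula, $\tau$ can be found numerically, because $\Phi(0 \mid \cdot) = 1$ and $\Phi$ is nonincreasing in $\tau$. Below is a minimal bisection sketch; the function name `solve_tau`, the bracket-doubling strategy, and the tolerance are illustrative choices, not taken from the paper. The check at the end uses an exponential $\Phi$, for which $\tau = -\ln u / c$ is known exactly.

```python
import math
from typing import Callable


def solve_tau(phi: Callable[[float], float], u: float, tol: float = 1e-10) -> float:
    """Solve phi(tau) = u for tau >= 0, where phi is a survival function with phi(0) = 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    # Double the bracket until phi(hi) <= u (a proper survival function eventually drops).
    for _ in range(200):
        if phi(hi) <= u:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise RuntimeError("phi(tau) never fell to u; no finite solution found")
    # Bisection with the invariant phi(lo) > u >= phi(hi).
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if phi(mid) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Compare with the exact inverse of phi(tau) = exp(-c * tau).
c, u = 2.25, 0.3
print(solve_tau(lambda x: math.exp(-c * x), u), -math.log(u) / c)
```

Bisection is used here because it only needs $\Phi$ to be monotone; a faster bracketing method could be substituted once a bracket is found.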

Note that, once the time of the next event is known, we still need to determine which process it corresponds to; this is done using the probabilities in equations (A.4) and (A.5).

In Figure 8, we illustrate the interaction of the infection process and the recovery process for a network with 10 nodes and 15 edges. We let the infection propagate and the infected nodes recover by themselves until all the nodes have recovered. At each event time (either an infection or a recovery), the nodes in red are infected and the others are healthy. The mean waiting time to recovery is assumed to be 0.8 and the mean waiting time to infection is assumed to be 1.0. Because infected nodes tend to recover faster than they transmit the infection, the network tends to recover completely in the end, although this is not guaranteed, owing to the randomness of both the infection and the recovery processes.
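A back-of-the-envelope calculation shows why recovery is favored but not guaranteed. Assuming, for illustration only, that the waiting times are exponential and independent (the text does not state the distributions behind Figure 8), an infected node with $k$ active links recovers before infecting any neighbor with probability $\gamma / (\gamma + k\beta)$, where $\gamma = 1/0.8$ and $\beta = 1/1.0$ are the rates implied by the stated means. The helper name below is hypothetical.

```python
def recover_first_probability(k: int, gamma: float, beta: float) -> float:
    """P(an infected node's recovery clock rings before any of its k infection clocks),
    assuming independent exponential clocks with rates gamma and beta."""
    return gamma / (gamma + k * beta)


gamma = 1 / 0.8  # recovery rate implied by a mean recovery time of 0.8
beta = 1 / 1.0   # infection rate implied by a mean waiting time to infection of 1.0

for k in range(5):
    print(f"k = {k}: {recover_first_probability(k, gamma, beta):.3f}")
```

This prints about 0.556, 0.385, and 0.294 for $k = 1, 2, 3$: recovery wins only while a node has a single active link, so a node with two or more active links is more likely to pass on the infection first.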

Figure 8. Evolution of infection and recovery