Korn, Uri. 2021. “A Simple Method for Modeling Changes over Time.” Variance 14 (1): 1–13.
Figures and tables
  • Figure 1. Time as a categorical variable versus RSSM
  • Figure 2. Bayesian model versus RSSM
  • Figure 3. Additive model fit on the example data
  • Figure 4. Issues with the additive model fit
  • Table 1. Simulation results of various time series methods
  • Table 2. Default dummy encodings for a year categorical variable
  • Table 3. Initial dummy encodings for a random walk
  • Figure 5. Elastic net and ridge regression comparison
  • Figure 6. Comparison of a large change
  • Figure 7. A segment moving away from the mean
  • Figure 8. A segment moving toward the mean
  • Table 4. Example dummy encodings for a random walk on the slope
  • Table 5. Example dummy encodings with 25% mean reversion
  • Figure 9. Random walk with momentum
  • Figure 10. Trended, ultimate loss ratios by segment
  • Figure 11. Fitted trended, ultimate loss ratios by segment

Abstract

Properly modeling changes over time is essential for forecasting and important for any model or process with data that span multiple time periods. Despite this, most approaches used are ad hoc or lack a statistical framework for making accurate forecasts.

A method is presented for adding time series components within a penalized regression framework, so that the resulting models can handle everything a penalized generalized linear model can handle (distributional flexibility and credibility) as well as changes over time. In this way, a subset of state space model functionality can be incorporated into a more familiar framework. The benefits of state space models in terms of their accuracy and intuitiveness are explained.

The method presented here lends itself well to presentation, which can help with understanding and delivering results. This makes it useful not only for pricing and other models but for improving and streamlining other types of actuarial processes such as reserving and profitability studies.
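As a minimal sketch of the approach described above (not a complete implementation), the snippet below gives a year factor random-walk contrasts, using the contr.randwalk helper defined in Appendix A, and fits it with an elastic net via the glmnet package. The toy data, seed, and tuning values (alpha, number of folds) are arbitrary and for illustration only.

   # illustrative sketch only; contr.randwalk() is defined in Appendix A
   library( glmnet )   # loading glmnet also attaches Matrix (for sparse.model.matrix)
   set.seed( 1 )
   n.toy <- 10
   d.toy <- data.frame( yr = factor( 1:n.toy ),
                        x  = cumsum( rnorm( n.toy, 0, 0.05 ) ) + rnorm( n.toy, 0, 0.05 ) )
   contrasts( d.toy$yr ) <- contr.randwalk( n.toy, sparse=TRUE )  # year treated as a random walk
   X <- sparse.model.matrix( ~ yr, d.toy )[, -1]                  # drop the intercept column
   cv <- cv.glmnet( X, d.toy$x, family='gaussian', alpha=0.75, standardize=FALSE, nfolds=3 )
   fitted.level <- predict( cv, newx=X, s='lambda.min' )          # smoothed level for each year

Because the year-to-year changes enter as penalized coefficients, each fitted level is shrunk toward the adjacent years, which is the random-walk behavior used throughout the appendices.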

Accepted: March 12, 2019 EDT

Appendices

Appendix A: Simulation code

   library( glmnet )
   library( mgcv )
   library( compiler )
   library( rstan )
   
   # logit functions for later
   logit <- function(x) log( x / ( 1 - x ) )
   ilogit <- function(x) exp(x) / ( 1 + exp(x) )
   
   # function for creating dummy variables for implementing random walks
   contr.randwalk <- function( n, contrasts = TRUE, sparse = TRUE, momentum=0, rel.cred=1, stdize=TRUE ) {
     if ( length(n) <= 1L ) {
       if ( is.numeric(n) && length(n) == 1L && n > 1L )
         levels <- seq_len(n)
       else
         stop("not enough degrees of freedom to define contrasts")
     } else {
       levels <- n
     }
     levels <- as.character(levels)
     if ( sparse ) {
       cont <- Matrix( c(0), nrow=length(levels), ncol=length(levels) - 1, sparse=TRUE )
     } else {
       cont <- matrix( c(0), nrow=length(levels), ncol=length(levels) - 1 )
     }
     for ( i in 2:n ) {
       cont[, i - 1] <- ifelse( 1:n < i, 0,
                                ifelse( rep( momentum, n ) == 1,
                                        ( 1:n - i + 1 ),
                                        ( 1 - momentum ^ ( 1:n - i + 1 ) ) / ( 1 - momentum ) ) )
       cont[, i - 1] <- cont[, i - 1] - mean( cont[, i - 1] )
     }
     if ( contrasts ) {
       colnames(cont) <- levels[-1]
     }
     # standardize
     if ( stdize ) {
       for ( i in 1:ncol(cont) ) {
         cont[,i] <- cont[,i] / sum( diff( cont[,i] ) ^ 2 ) ^ 0.5
       }
     }

     cont <- cont * rel.cred
     cont
   }
   
   # fit a Kalman filter (local level model) by maximum likelihood
   kf.fit <- function( x.obs, prem=rep( 1, length(x.obs) ), incl=1:length(x.obs) ) {
     kf <- cmpfun( function( params, return.values=F ) {
       n <- length(x.obs)
       x.pred0 <- rep( 0, n )
       x.pred <- rep( 0, n )
       k <- rep( 0, n )
       p0 <- rep( 0, n )
       p <- rep( 0, n )
       f <- rep( 0, n )
       err <- rep( 0, n )

       Q.est <- exp( params[1] ) * ilogit( params[2] )
       R.est <- exp( params[1] ) * prem[1] * ( 1 - ilogit( params[2] ) )

       for ( i in 1:n ) {
         x.pred0[i] <- ifelse( i > 1, x.pred[i - 1], params[3] )
         err[i] <- x.obs[i] - x.pred0[i]
         p0[i] <- ifelse( i > 1, p[i - 1] + Q.est, 0 )
         f[i] <- p0[i] + R.est / prem[i]
         k[i] <- ifelse( ( ! i %in% incl ) | f[i] == 0, 0, p0[i] / f[i] )
         p[i] <- p0[i] * ( 1 - k[i] )
         x.pred[i] <- x.pred0[i] + k[i] * err[i]
       }
       loglik <- sum( sapply( 1:n, function(i) log( max( 1e-100, dnorm( err[i], 0, f[i] ^ 0.5 ) ) ) ) )
       if ( return.values ) {
         # backward pass to produce the smoothed estimates
         x.smooth <- rep( 0, n )
         x.smooth[n] <- x.pred[n]
         for ( i in (n - 1):1 ) x.smooth[i] <- x.pred[i] + ( p[i] / p0[i + 1] ) * ( x.smooth[i + 1] - x.pred[i] )
         list( loglik=loglik, Q=Q.est, R=R.est, x=x.smooth, x.pred=x.pred, f=f, k=k, p=p )
       } else {
         loglik
       }
     } )
     o <- optim( c( log( var(x.obs) / 2 ), -10, sum( prem * x.obs ) / sum( prem ) ), kf,
                 control=list( fnscale=-1 ), method='BFGS' )
     kf( o$par, return.values=T )
   }
   
   # calculate root mean square error
   rmse <- function( x, y, i=1:length(x) ) mean( ( x[i] - y[i] ) ^ 2 ) ^ 0.5
   
   # number of years
   n <- 10
   # innovation variance
   Q <- 0.002
   # volatility
   R <- 0.003
   # serial correlation
   years.corr <- 0.1
   
   # definitions for no outliers
   # (to simulate without outliers, run this block and skip the next one)
   q.large.pct.change <- 0
   r.large.pct.change <- 0
   q.df.reg <- 250
   q.df.large <- 250
   r.df.reg <- 250
   r.df.large <- 250
   pct.change <- 0.5
   Q.large <- 0
   R.large <- 0

   # definitions for with outliers
   # (these override the values above when both blocks are run)
   q.df.reg <- 6
   q.df.large <- 3
   r.df.reg <- 12
   r.df.large <- 6
   q.large.pct.change <- 0.05
   r.large.pct.change <- 0.1
   pct.change <- 0.5
   Q.large <- Q * 5
   R.large <- R * 5
   
   num.iter <- 1000
   
   do.graph1 <- FALSE
   print.cvm1 <- FALSE
   do.bayes <- TRUE
   # run the simulation
   set.seed( 1112 )
   results <- NULL
   for ( iter in 1:num.iter ) {
     if ( iter %% 10 == 0 ) {
       print( iter )
     }

     # simulate data
     q <- rt( n, q.df.reg ) * Q ^ 0.5
     # add serial correlation
     for ( i in 2:n ) {
       q[i] <- years.corr * q[i - 1] + sqrt( 1 - years.corr ^ 2 ) * q[i]
     }
     q <- ifelse( runif( n ) < pct.change, q, rep( 0, n ) )
     q.large <- ifelse( runif( n ) < q.large.pct.change, rt( n, q.df.large ) * Q.large ^ 0.5, rep( 0, n ) )
     q <- q + q.large
     q.cuml <- cumsum( q )
     a <- 20 * exp(q.cuml)
     r.large <- ifelse( runif( n ) < r.large.pct.change, rt( n, r.df.large ) * R.large ^ 0.5, rep( 0, n ) )
     o <- exp( log( a ) + rt( n, r.df.reg ) * R ^ 0.5 + r.large )

     # create the time series variables
     d <- data.frame( y=1:n, a=a, o=o )
     # standardize trend variable
     d$y.std <- d$y / sqrt( n - 1 )
     d$y.std <- d$y.std - mean( d$y.std )
     d$y.fac <- factor( d$y )
     # contr.pen builds the contrasts used for the plain categorical (factor) fit;
     # it is not shown in this listing
     contrasts( d$y.fac ) <- contr.pen( nrow(d), sparse=TRUE )
     d$y.rw <- d$y.fac
     # set the contrasts so that this variable is treated as a random walk instead of a plain factor
     contrasts( d$y.rw ) <- contr.randwalk( nrow(d), sparse=TRUE )

     if ( do.graph1 ) {
       plot( d$y, d$o, type='b', xlab='Year', ylab='X', lwd=2 )
       lines( d$y, d$a, type='b', lty=3 )
       abline( h=mean( d$o ), col='gray' )
     }
   
     # function to fit models
     fit.model <- function( form, num.folds=3, num.times=20, alpha=0.75, fit.incl=1:n,
                            do.graph=do.graph1, print.cvm=print.cvm1, col='blue', lty=1, lwd=1 ) {
       # create the modeling matrix that describes the independent variables of the model
       x <- sparse.model.matrix( form, d )
       # remove the intercept
       if ( colnames( x )[1] == '(Intercept)' ) {
         x <- x[,-1]
       }
       fit <- glmnet( x[fit.incl,], log( d$o[fit.incl] ), family='gaussian',
                      standardize=FALSE, alpha=alpha ) #, lambda.min.ratio=1e-10 )
       fit.cv <- lapply( 1:num.times, function(i)
         cv.glmnet( x[fit.incl,], log( d$o[fit.incl] ), family='gaussian', standardize=FALSE,
                    alpha=alpha, nfolds=num.folds, lambda=fit$lambda ) )
       lambda <- mean( sapply( fit.cv, function(x) x$lambda.min ) )
       cvm <- mean( sapply( fit.cv, function(x) min( x$cvm ) ) )
       p <- exp( predict( fit, newx=x, type='response', s=lambda ) )
       if ( do.graph ) lines( d$y, p, col=col, lwd=lwd, lty=lty )
       if ( print.cvm ) print( cvm )
       p
     }
   
     d$p.mean <- mean( d$o )
     d$p.fac <- fit.model( ~ 0 + y.fac, col='blue', print.cvm=print.cvm1 )
     d$p.rw <- fit.model( ~ y.rw, col='red', lwd=2, print.cvm=print.cvm1 )

     # kalman filter
     d$p.kf <- exp( kf.fit( log( d$o ), incl=1:n )$x )
     if ( do.graph1 ) lines( d$y, d$p.kf, col='green' )

     # spline
     fit.sp <- gam( log( o ) ~ s( y ), data=d )
     d$p.sp <- exp( predict( fit.sp, newdata=d ) )
     if ( do.graph1 ) lines( d$y, d$p.sp, col='gold', lwd=2 )

     # bayesian model
     if ( do.bayes ) {
       # model the log of the observed values directly (instead of taking logs inside the
       # Stan model) so that the fit is comparable to the other models
       data.stan <- list(
         x=log( d$o )
         , N=n
         , y_rw_scale=5
       )
       fit.bayes <- stan( 'rw_test.stan', data=data.stan, chains=3, iter=1000 )
       stan.obj <- extract( fit.bayes, permuted=TRUE )
       #traceplot( fit.bayes )
       #stan_dens( fit.bayes, separate_chains=TRUE )
       d$p.bayes <- exp( sapply( 1:n, function(i) mean( stan.obj[['y']][,i] ) ) )
       if ( do.graph1 ) lines( d$y, d$p.bayes, col='pink', lwd=2 )
     }

     # calculate prediction errors of each method
     rmse.pred <- c()
     for ( f in names(d)[ grep( 'p.', names(d), fixed=TRUE ) ] ) {
       rmse.pred <- c( rmse.pred, rmse( d[[f]], d$a, 1:n ) )
       names( rmse.pred )[length(rmse.pred)] <- f
     }
     results <- rbind( results, rmse.pred )
   }
   
   # average error by method across all iterations
   data.frame( method=colnames( results ),
               rmse=sapply( 1:ncol(results), function(i) mean( results[,i] ) ) )

Save in separate file named “rw_test.stan”:

   data {
     int N;
     vector<lower=0>[N] x;
     real<lower=0> y_rw_scale;
   }

   parameters {
     vector<lower=-2, upper=2>[N - 1] y_rw;
     real<lower=0, upper=2> y_sd;
     real<lower=0> y_rw_sd;
     real y1;
   }

   transformed parameters {
     vector[N] y;

     y[1] = y1;
     for ( i in 2:N ) {
       y[i] = y[i - 1] + y_rw[i - 1];
     }
   }

   model {
     y1 ~ uniform( 1, 5 );
     y_rw_sd ~ cauchy( 0, y_rw_scale );
     y_rw ~ normal( 0, y_rw_sd );
     x ~ normal( y, y_sd );
   }

Appendix B: Loss ratio example

   library( HDtweedie ) # elastic net for the Tweedie family
   library( Matrix ) # for building sparse matrices
   library( ggplot2 )
   
   # function for creating dummy variables for implementing random walks
   contr.randwalk <- function( n, contrasts = TRUE, sparse = TRUE, momentum=0, rel.cred=1, stdize=TRUE ) {
     if ( length(n) <= 1L ) {
       if ( is.numeric(n) && length(n) == 1L && n > 1L )
         levels <- seq_len(n)
       else
         stop("not enough degrees of freedom to define contrasts")
     } else {
       levels <- n
     }
     levels <- as.character(levels)
     if ( sparse ) {
       cont <- Matrix( c(0), nrow=length(levels), ncol=length(levels) - 1, sparse=TRUE )
     } else {
       cont <- matrix( c(0), nrow=length(levels), ncol=length(levels) - 1 )
     }
     for ( i in 2:n ) {
       cont[, i - 1] <- ifelse( 1:n < i, 0,
                                ifelse( rep( momentum, n ) == 1,
                                        ( 1:n - i + 1 ),
                                        ( 1 - momentum ^ ( 1:n - i + 1 ) ) / ( 1 - momentum ) ) )
       cont[, i - 1] <- cont[, i - 1] - mean( cont[, i - 1] )
     }
     if ( contrasts ) {
       colnames(cont) <- levels[-1]
     }
     # standardize
     if ( stdize ) {
       for ( i in 1:ncol(cont) ) {
         cont[,i] <- cont[,i] / sum( diff( cont[,i] ) ^ 2 ) ^ 0.5
       }
     }

     cont <- cont * rel.cred
     cont
   }
   
   # --- simulation code ---
   num.segs <- 3
   num.subsegs.each <- 2
   num.subsegs <- num.segs * num.subsegs.each
   # for finding the appropriate segment for each subsegment
   seg.map <- c( sapply( 1:num.segs, function(i) rep( i, num.subsegs.each ) ) )
   num.yrs <- 10
   years.corr <- 0.2
   ldf <- 3 ^ ( 0.65 ^ ( ( num.yrs - 1 ):0 ) )

   overall.avg <- log( 700 )
   # segment relativities
   seg.rel <- rnorm( num.segs, 0, 0.1 )
   # subsegment relativities
   subseg.rel <- rnorm( num.subsegs, 0, 0.1 )
   # random walk coefficients
   rw.chg.init <- rnorm( num.yrs, 0, 0.2 )
   rw.seg.chg.init <- sapply( 1:num.segs, function(i) rnorm( num.yrs, 0, 0.15 ) )
   rw.subseg.chg.init <- sapply( 1:num.subsegs, function(i) rnorm( num.yrs, 0, 0.1 ) )
   # add correlation (momentum) to the changes
   rw.chg <- rw.chg.init
   rw.seg.chg <- rw.seg.chg.init
   rw.subseg.chg <- rw.subseg.chg.init
   for ( i in 2:num.yrs ) {
     rw.chg[i] <- years.corr * rw.chg.init[i - 1] + sqrt( 1 - years.corr ^ 2 ) * rw.chg.init[i]
     for ( j in 1:num.segs )
       rw.seg.chg[i, j] <- years.corr * rw.seg.chg.init[i - 1, j] +
         sqrt( 1 - years.corr ^ 2 ) * rw.seg.chg.init[i, j]
     for ( j in 1:num.subsegs )
       rw.subseg.chg[i, j] <- years.corr * rw.subseg.chg.init[i - 1, j] +
         sqrt( 1 - years.corr ^ 2 ) * rw.subseg.chg.init[i, j]
   }

   # using the simulated values, produce the data, with error (note this is a crude simulation)
   for ( subseg in 1:num.subsegs ) {
     seg <- seg.map[subseg]
     x <- exp( overall.avg + seg.rel[seg] + subseg.rel[subseg]
               + cumsum( rw.chg ) + cumsum( rw.seg.chg[,seg] ) + cumsum( rw.subseg.chg[,subseg] )
               + rnorm( num.yrs, 0, 0.2 * ldf ) ) / ldf
     d.add <- data.frame( seg=seg, subseg=subseg, yr=1:num.yrs, loss=x, ldf=ldf, ep=1000 )
     if ( subseg == 1 ) {
       d <- d.add
     } else {
       d <- rbind( d, d.add )
     }
   }
   d$seg <- factor( d$seg )
   d$subseg <- factor( d$subseg )
   
   # Cape Cod-like method to develop losses and weight years
   d$ult.lr <- d$loss * d$ldf / d$ep
   # this will give the greener years less weight
   d$used.ep <- d$ep / d$ldf
   
   # --- modeling code ---
   # create random walk variable
   # set different values of the momentum (and the relative credibility, if desired) here
   rw.momentum <- 0
   rw.rel.cred <- 1
   # rw.momentum <- 0.1
   d$yr.rw <- factor( d$yr )
   # set the contrasts so that this variable is treated as a random walk instead of a plain factor
   contrasts( d$yr.rw ) <- contr.randwalk( num.yrs, sparse=TRUE, momentum=rw.momentum,
                                           rel.cred=rw.rel.cred )

   # create the modeling matrix that describes the independent variables of the model
   # make sure to create columns for each level, since this is needed for penalized regression
   # models (see: https://stackoverflow.com/questions/4560459/all-levels-of-a-factor-in-a-model-matrix-in-r)
   x <- sparse.model.matrix( ~ 0 + seg + subseg + yr.rw + seg:yr.rw + subseg:yr.rw, d,
                             contrasts.arg = lapply( d[,c('seg', 'subseg')], contrasts, contrasts = FALSE ) )

   # fit the model
   # (HDtweedie doesn't support sparse matrices, unlike glmnet, so the matrix needs to be
   # converted back to a dense matrix)
   fit <- HDtweedie( as.matrix( x ), d$ult.lr, weights=d$used.ep, p=1.9, alpha=0.75,
                     standardize=FALSE, lambda.factor=1e-3 )
   # cross validate to get the optimal penalty parameter
   set.seed( 1112 )
   fit.xval <- cv.HDtweedie( as.matrix( x ), d$ult.lr, weights=d$used.ep, p=1.9, alpha=0.75,
                             standardize=FALSE, lambda.factor=1e-3, nfolds=5 )
   # cross-validated deviance (error) of the model - use this to test the momentum parameter
   # (see the sketch after this listing)
   min( fit.xval$cvm )
   lambda <- fit.xval$lambda.min
   # check that the chosen penalty isn't at either end of the lambda path
   which( lambda == fit.xval$lambda )
   length( fit.xval$lambda )

   # make predictions
   d$ult.lr.pred <- predict( fit, as.matrix( x ), type='response', s=lambda )[,1]
   
   # --- graph results ---
   # one at a time
   par( mfrow=c( 3, 2 ) )
   for ( subseg in 1:num.subsegs ) {
     plot( 1:num.yrs, d$ult.lr[d$seg == seg.map[subseg] & d$subseg == subseg], type='b',
           xlab='Year', ylab='LR',
           main=paste( 'Segment: ', seg.map[subseg], ', Subsegment: ', subseg, sep='' ) )
     lines( 1:num.yrs, d$ult.lr.pred[d$seg == seg.map[subseg] & d$subseg == subseg], col='blue' )
   }
   par( mfrow=c( 1, 1 ) )

   d.plot <- rbind( data.frame( Year=d$yr, seg=d$seg, subseg=d$subseg,
                                Segment=paste( d$seg, d$subseg ), LR=d$ult.lr, Line='Actual' ),
                    data.frame( Year=d$yr, seg=d$seg, subseg=d$subseg,
                                Segment=paste( d$seg, d$subseg ), LR=d$ult.lr.pred, Line='Predicted' ) )

   # each segment together
   seg <- 1
   ggplot( d.plot[d.plot$seg == seg,], aes( Year, LR ) ) +
     geom_line( aes( color=Segment, linetype=Line ) )

   # all together
   ggplot( d.plot, aes( Year, LR ) ) +
     geom_line( aes( color=Segment, linetype=Line ) )
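
The cross-validated deviance printed above is what the comments suggest using to select the momentum (and, if desired, relative credibility) parameter. A rough sketch of that search, which simply reruns the fit above for a few arbitrary candidate momentum values, might look like the following; the candidate values are illustrative only.

   # illustrative sketch: compare candidate momentum values by cross-validated deviance
   for ( m in c( 0, 0.1, 0.25 ) ) {   # candidate values are arbitrary
     contrasts( d$yr.rw ) <- contr.randwalk( num.yrs, sparse=TRUE, momentum=m,
                                             rel.cred=rw.rel.cred )
     x.m <- sparse.model.matrix( ~ 0 + seg + subseg + yr.rw + seg:yr.rw + subseg:yr.rw, d,
                                 contrasts.arg = lapply( d[,c('seg', 'subseg')], contrasts,
                                                         contrasts = FALSE ) )
     set.seed( 1112 )   # reuse the same seed so the folds are comparable across candidates
     cv.m <- cv.HDtweedie( as.matrix( x.m ), d$ult.lr, weights=d$used.ep, p=1.9, alpha=0.75,
                           standardize=FALSE, lambda.factor=1e-3, nfolds=5 )
     print( c( momentum=m, cv.deviance=min( cv.m$cvm ) ) )
   }

The candidate with the lowest cross-validated deviance would then be used to refit the model as above.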

  1. These data were the result of a single simulation using the first method discussed in Section 3.2. All of these models were fit using an elastic net with threefold cross-validation repeated 20 times.

  2. The results of the Holt-Winters method, also known as exponential smoothing, should be roughly similar to but less accurate than those of the Kalman filter and will not be expounded upon here.

  3. The same is true when using the “bagging” method described in Korn (2016).

  4. The model was fit to the first 10 data points and predicted on all 13.

  5. For the first reason, it is suggested that more weight be given to the lasso penalty.

  6. Note that the issues mentioned above regarding additive models do not apply here since periods are repeated multiple times.

  7. Note that twice this amount is not used here. Unlike numerical variables, which are divided by twice their standard deviation, dividing by this divisor alone already puts the columns on the same scale as a plain random walk. Dividing numerical variables by twice their standard deviation, by contrast, is what is needed to put them on the same scale as categorical variables. (A short numeric check of this scaling follows these notes.)
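
As an illustrative check of the scaling described in note 7 (the momentum value here is arbitrary), each standardized contrast column produced by contr.randwalk ends up with sum( diff( column )^2 ) equal to 1, which is the value a plain random-walk column has before any standardization:

   cw <- contr.randwalk( 10, sparse=FALSE, momentum=0.25, stdize=TRUE )
   apply( cw, 2, function(col) sum( diff( col ) ^ 2 ) )   # each column returns (approximately) 1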