Korn, Uri. 2021. “A Simple Method for Modeling Changes over Time.” Variance 14 (1): 1–13.
• Table 1. Simulation results of various time series methods
• Table 2. Default dummy encodings for a year categorical variable
• Table 3. Initial dummy encodings for a random walk
• Table 4. Example dummy encodings for a random walk on the slope
• Table 5. Example dummy encodings with 25% mean reversion

## Abstract

Properly modeling changes over time is essential for forecasting and important for any model or process with data that span multiple time periods. Despite this, most approaches used are ad hoc or lack a statistical framework for making accurate forecasts.

A method is presented for adding time series components within a penalized regression framework, so that these models can handle everything a penalized generalized linear model can handle (distributional flexibility and credibility) as well as changes over time. In this way, a subset of state space model functionality can be incorporated into a more familiar framework. The benefits of state space models in terms of their accuracy and intuitiveness are explained.

The method presented here lends itself well to presentation, which can help with understanding and delivering results. This makes it useful not only for pricing and other models but for improving and streamlining other types of actuarial processes such as reserving and profitability studies.

Accepted: March 12, 2019 EDT

# Appendices

## Appendix A: Simulation code

library( glmnet )
library( mgcv )
library( compiler )
library( rstan )
library( Matrix ) # for building sparse matrices

# logit functions for later
logit <- function(x) log( x / ( 1 - x ) )
ilogit <- function(x) exp(x) / ( 1 + exp(x) )

# function for creating dummy variables for implementing random walks
contr.randwalk <- function( n, contrasts = TRUE, sparse = TRUE,
                            momentum=0, rel.cred=1, stdize=TRUE )
{
  if (length(n) <= 1L) {
    if (is.numeric(n) && length(n) == 1L && n > 1L)
      levels <- seq_len(n)
    else stop("not enough degrees of freedom to define contrasts")
  } else { levels <- n }
  levels <- as.character(levels)
  if ( sparse ) {
    cont <- Matrix( c(0), nrow=length(levels),
                    ncol=length(levels) - 1, sparse=TRUE )
  } else {
    cont <- matrix( c(0), nrow=length(levels),
                    ncol=length(levels) - 1 )
  }
  for ( i in 2:n ) {
    cont[, i - 1] <- ifelse( 1:n < i, 0, ifelse(
      rep( momentum, n ) == 1, ( 1:n - i + 1 ),
      ( 1 - momentum ^ ( 1:n - i + 1 ) ) / ( 1 - momentum ) ) )
    cont[, i - 1] <- cont[, i - 1] - mean( cont[, i - 1] )
  }
  if (contrasts) {
    colnames(cont) <- levels[-1]
  }
  # standardize
  if ( stdize ) {
    for ( i in 1:ncol(cont) ) {
      cont[,i] <- cont[,i] / sum( diff( cont[,i] ) ^ 2 ) ^ 0.5
    }
  }

  cont <- cont * rel.cred
  cont
}
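
# illustrative check (not part of the original simulation): inspect the encoding
# that contr.randwalk() produces for a small, 5-level variable. with momentum = 0,
# each column is zero before its year and constant from that year onward, centered
# to mean zero and scaled so its successive differences have a unit sum of squares
round( as.matrix( contr.randwalk( 5, sparse=FALSE ) ), 3 )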

# fit kalman filter
kf.fit <- function( x.obs, prem=rep( 1, length(x.obs) ),
                    incl=1:length(x.obs) ) {
  # params[1]: log of the total variance, params[2]: (logit of the) share of the
  # variance assigned to the process, params[3]: the starting level
  kf <- cmpfun( function( params, return.values=F ) {
    n <- length(x.obs)
    x.pred0 <- rep( 0, n )
    x.pred <- rep( 0, n )
    k <- rep( 0, n )
    p0 <- rep( 0, n )
    p <- rep( 0, n )
    f <- rep( 0, n )
    err <- rep( 0, n )

    Q.est <- exp( params[1] ) * ilogit( params[2] )
    R.est <- exp( params[1] ) * prem * ( 1 - ilogit( params[2] ) )

    for ( i in 1:n ) {
      x.pred0[i] <- ifelse( i > 1, x.pred[i - 1], params[3] )
      err[i] <- x.obs[i] - x.pred0[i]
      p0[i] <- ifelse( i > 1, p[i - 1] + Q.est, 0 )
      f[i] <- p0[i] + R.est[i] / prem[i]
      k[i] <- ifelse( ( ! i %in% incl ) | f[i] == 0, 0, p0[i] / f[i] )
      p[i] <- p0[i] * ( 1 - k[i] )
      x.pred[i] <- x.pred0[i] + k[i] * err[i]
    }
    loglik <- sum( sapply( 1:n, function(i) log(
      max( 1e-100, dnorm( err[i], 0, f[i] ^ 0.5 ) ) ) ) )
    if ( return.values ) {
      # backward (smoothing) pass
      x.smooth <- rep( 0, n )
      x.smooth[n] <- x.pred[n]
      for ( i in (n - 1):1 ) x.smooth[i] <- x.pred[i] +
        ( p[i] / p0[i + 1] ) * ( x.smooth[i + 1] - x.pred[i] )
      list( loglik=loglik, Q=Q.est, R=R.est,
            x=x.smooth, x.pred=x.pred, f=f, k=k, p=p )
    } else {
      loglik
    }
  } )
  o <- optim( c( log( var(x.obs) / 2 ), -10, sum( prem * x.obs ) / sum( prem ) ),
              kf, control=list( fnscale=-1 ), method='BFGS' )
  kf( o$par, return.values=T )
}

# calculate root mean square error
rmse <- function( x, y, i=1:length(x) ) mean( ( x[i] - y[i] ) ^ 2 ) ^ 0.5

# number of years
n <- 10
# innovation variance
Q <- 0.002
# volatility
R <- 0.003
# serial correlation
years.corr <- 0.1

# definitions for no outliers
# (overridden by the "with outliers" definitions below if both blocks are run)
q.large.pct.change <- 0
r.large.pct.change <- 0
q.df.reg <- 250
q.df.large <- 250
r.df.reg <- 250
r.df.large <- 250
q.large.pct.change <- 0
r.large.pct.change <- 0
pct.change <- 0.5
Q.large <- 0
R.large <- 0

# definitions for with outliers
q.df.reg <- 6
q.df.large <- 3
r.df.reg <- 12
r.df.large <- 6
q.large.pct.change <- 0.05
r.large.pct.change <- 0.1
pct.change <- 0.5
Q.large <- Q * 5
R.large <- R * 5

num.iter <- 1000
do.graph1 <- FALSE
print.cvm1 <- FALSE
do.bayes <- TRUE

# run the simulation
set.seed( 1112 )
results <- NULL
for ( iter in 1:num.iter ) {
if ( iter %% 10 == 0 ) { print( iter ) }

# simulate data
q <- rt( n, q.df.reg ) * Q ^ 0.5
# add serial correlation
for ( i in 2:n ) { q[i] <- years.corr * q[i - 1] + sqrt( 1 - years.corr ^ 2 ) * q[i] }
q <- ifelse( runif( n ) < pct.change, q, rep( 0, n ) )
q.large <- ifelse( runif( n ) < q.large.pct.change,
  rt( n, q.df.large ) * Q.large ^ 0.5, rep( 0, n ) )
q <- q + q.large
q.cuml <- cumsum( q )
a <- 20 * exp(q.cuml)
r.large <- ifelse( runif( n ) < r.large.pct.change,
  rt( n, r.df.large ) * R.large ^ 0.5, rep( 0, n ) )
o <- exp( log( a ) + rt( n, r.df.reg ) * R ^ 0.5 + r.large )

# create the time series variables
d <- data.frame( y=1:n, a=a, o=o )
# standardize trend variable
d$y.std <- d$y / sqrt( n - 1 )
d$y.std <- d$y.std - mean( d$y.std )
d$y.fac <- factor( d$y )
# contr.pen encodes the year as a penalized categorical variable (not defined in this appendix)
contrasts( d$y.fac ) <- contr.pen( nrow(d), sparse=TRUE )
d$y.rw <- d$y.fac
# set the contrasts to make this variable treated as a random walk, instead of a factor
contrasts( d$y.rw ) <- contr.randwalk( nrow(d), sparse=TRUE )

if ( do.graph1 ) {
plot( d$y, d$o, type='b', xlab='Year', ylab='X', lwd=2 )
lines( d$y, d$a, type='b', lty=3 )
abline( h=mean( d$o ), col='gray' )
}

# function to fit models
fit.model <- function( form, num.folds=3, num.times=20, alpha=0.75, fit.incl=1:n,
                       do.graph=do.graph1, print.cvm=print.cvm1,
                       col='blue', lty=1, lwd=1 ) {
  # create the modeling matrix that describes the independent variables of the model
  x <- sparse.model.matrix( form, d )
  # remove the intercept
  if ( colnames( x )[1] == '(Intercept)' ) { x <- x[,-1] }
  fit <- glmnet( x[fit.incl,], log( d$o[fit.incl] ), family='gaussian',
                 standardize=FALSE, alpha=alpha ) #, lambda.min.ratio=1e-10 )
  fit.cv <- lapply( 1:num.times, function(i)
    cv.glmnet( x[fit.incl,], log( d$o[fit.incl] ), family='gaussian',
               standardize=FALSE, alpha=alpha, nfolds=num.folds, lambda=fit$lambda ) )
  lambda <- mean( sapply( fit.cv, function(x) x$lambda.min ) )
  cvm <- mean( sapply( fit.cv, function(x) min( x$cvm ) ) )
  p <- exp( predict( fit, newx=x, type='response', s=lambda ) )
  if ( do.graph ) lines( d$y, p, col=col, lwd=lwd, lty=lty )
  if ( print.cvm ) print( cvm )
  p
}

d$p.mean <- mean( d$o )
d$p.fac <- fit.model( ~ 0 + y.fac, col='blue', print.cvm=print.cvm1 )
d$p.rw <- fit.model( ~ y.rw, col='red', lwd=2, print.cvm=print.cvm1 )

# kalman filter
d$p.kf <- exp( kf.fit( log( d$o ), incl=1:n )$x )
if ( do.graph1 ) lines( d$y, d$p.kf, col='green' )

# spline
fit.sp <- gam( log( o ) ~ s( y ), data=d )
d$p.sp <- exp( predict( fit.sp, newdata=d ) )
if ( do.graph1 ) lines( d$y, d$p.sp, col='gold', lwd=2 )

# bayesian model
if ( do.bayes ) {
# model on log of observed instead of handling in model, so that comparable to other models
data.stan <- list( x=log( d$o )
, N=n
, y_rw_scale=5
)
fit.bayes <- stan( 'rw_test.stan',
data=data.stan, chains=3, iter=1000 )
stan.obj <- extract( fit.bayes, permuted=TRUE )
#traceplot( fit.bayes )
#stan_dens( fit.bayes, separate_chains=TRUE )
d$p.bayes <- exp( sapply( 1:n, function(i) mean( stan.obj[['y']][,i] ) ) )
if ( do.graph1 ) lines( d$y, d$p.bayes, col='pink', lwd=2 )
}

# calculate prediction errors of each method
rmse.pred <- c()
for ( f in names(d)[ grep( 'p.', names(d), fixed=TRUE ) ] ) {
rmse.pred <- c( rmse.pred, rmse( d[[f]], d$a, 1:n ) )
names( rmse.pred )[length(rmse.pred)] <- f
}
results <- rbind( results, rmse.pred )
}

data.frame( method=colnames( results ),
rmse=sapply( 1:ncol(results), function(i)
mean( results[,i] ) ) )


Save in separate file named “rw_test.stan”:

data {
  int N;
  vector<lower=0>[N] x;
  real<lower=0> y_rw_scale;
}

parameters {
  vector<lower=-2, upper=2>[N - 1] y_rw;
  real<lower=0, upper=2> y_sd;
  real<lower=0> y_rw_sd;
  real y1;
}

transformed parameters {
  vector[N] y;

  y[1] = y1;
  for ( i in 2:N ) {
    y[i] = y[i - 1] + y_rw[i - 1];
  }
}

model {
  y1 ~ uniform( 1, 5 );
  y_rw_sd ~ cauchy( 0, y_rw_scale );
  y_rw ~ normal( 0, y_rw_sd );
  x ~ normal( y, y_sd );
}
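
Before using the posterior means from this model, it can be worth checking that the sampler has converged. The following optional check is not part of the original script; it uses only standard rstan functions on the fit.bayes object created in the simulation code above:

# optional convergence check on the Bayesian fit (Rhat values should be close to 1)
print( fit.bayes, pars=c( 'y_rw_sd', 'y_sd', 'y1' ) )
#traceplot( fit.bayes, pars='y_rw_sd' )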


## Appendix B: Loss ratio example

library( HDtweedie ) # elastic net for Tweedie family
library( Matrix ) # for building sparse matrices
library( ggplot2 )

# function for creating dummy variables for implementing random walks
contr.randwalk <- function( n, contrasts = TRUE, sparse = TRUE,
                            momentum=0, rel.cred=1, stdize=TRUE )
{
  if (length(n) <= 1L) {
    if (is.numeric(n) && length(n) == 1L && n > 1L)
      levels <- seq_len(n)
    else stop("not enough degrees of freedom to define contrasts")
  } else { levels <- n }
  levels <- as.character(levels)
  if ( sparse ) {
    cont <- Matrix( c(0), nrow=length(levels),
                    ncol=length(levels) - 1, sparse=TRUE )
  } else {
    cont <- matrix( c(0), nrow=length(levels),
                    ncol=length(levels) - 1 )
  }
  for ( i in 2:n ) {
    cont[, i - 1] <- ifelse( 1:n < i, 0, ifelse(
      rep( momentum, n ) == 1, ( 1:n - i + 1 ),
      ( 1 - momentum ^ ( 1:n - i + 1 ) ) / ( 1 - momentum ) ) )
    cont[, i - 1] <- cont[, i - 1] - mean( cont[, i - 1] )
  }
  if (contrasts) {
    colnames(cont) <- levels[-1]
  }
  # standardize
  if ( stdize ) {
    for ( i in 1:ncol(cont) ) {
      cont[,i] <- cont[,i] / sum( diff( cont[,i] ) ^ 2 ) ^ 0.5
    }
  }

  cont <- cont * rel.cred
  cont
}

# --- simulation code ---
num.segs <- 3
num.subsegs.each <- 2
num.subsegs <- num.segs * num.subsegs.each
# for finding the appropriate segment for each subsegment
seg.map <- c( sapply( 1:num.segs, function(i) rep( i, num.subsegs.each ) ) )
num.yrs <- 10
years.corr <- 0.2
ldf <- 3 ^ ( 0.65 ^ ( ( num.yrs - 1 ):0 ) )

overall.avg <- log( 700 )
# segment relativities
seg.rel <- rnorm( num.segs, 0, 0.1 )
# subsegment relativities
subseg.rel <- rnorm( num.subsegs, 0, 0.1 )
# random walk coefs
rw.chg.init <- rnorm( num.yrs, 0, 0.2 )
rw.seg.chg.init <- sapply( 1:num.segs,
function(i) rnorm( num.yrs, 0, 0.15 ) )
rw.subseg.chg.init <- sapply( 1:num.subsegs,
function(i) rnorm( num.yrs, 0, 0.1 ) )
# add correlation (momentum) to the changes
rw.chg <- rw.chg.init
rw.seg.chg <- rw.seg.chg.init
rw.subseg.chg <- rw.subseg.chg.init
for ( i in 2:num.yrs ) {
rw.chg[i] <- years.corr * rw.chg.init[i - 1] +
  sqrt( 1 - years.corr ^ 2 ) * rw.chg.init[i]
for ( j in 1:num.segs ) rw.seg.chg[i, j] <-
  years.corr * rw.seg.chg.init[i - 1, j] +
  sqrt( 1 - years.corr ^ 2 ) * rw.seg.chg.init[i, j]
for ( j in 1:num.subsegs ) rw.subseg.chg[i, j] <-
  years.corr * rw.subseg.chg.init[i - 1, j] +
  sqrt( 1 - years.corr ^ 2 ) * rw.subseg.chg.init[i, j]
}

# using simulated values, produce the data, with error (note this is a crude simulation)
for ( subseg in 1:num.subsegs ) {
seg <- seg.map[subseg]
x <- exp( overall.avg + seg.rel[seg] + subseg.rel[subseg]
  + cumsum( rw.chg ) + cumsum( rw.seg.chg[,seg] )
  + cumsum( rw.subseg.chg[,subseg] )
  + rnorm( num.yrs, 0, 0.2 * ldf ) ) / ldf
d.add <- data.frame( seg=seg, subseg=subseg, yr=1:num.yrs, loss=x, ldf=ldf, ep=1000 )
if ( subseg == 1 ) {
d <- d.add
} else {
d <- rbind( d, d.add )
}
}
d$seg <- factor( d$seg )
d$subseg <- factor( d$subseg )

# cape cod-like method to develop losses and weight years
d$ult.lr <- d$loss * d$ldf / d$ep
# this will give the greener years less weight
d$used.ep <- d$ep / d$ldf

# --- modeling code ---

# create random walk variable
# set different values of the momentum (and relative credibility, if desired) here
rw.momentum <- 0
rw.rel.cred <- 1
# rw.momentum <- 0.1
d$yr.rw <- factor( d$yr )
# set the contrasts to make this variable treated as a random walk, instead of a factor
contrasts( d$yr.rw ) <- contr.randwalk( num.yrs, sparse=TRUE,
  momentum=rw.momentum, rel.cred=rw.rel.cred )
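
# illustrative aside (not part of the original example): the effect of the momentum
# argument can be seen by comparing the dense encodings directly. with momentum > 0,
# a change in one year keeps (partially) carrying forward into later years rather
# than acting as a pure level shift; the 0.5 below is just a hypothetical value
round( as.matrix( contr.randwalk( num.yrs, sparse=FALSE, momentum=0 ) ), 3 )
round( as.matrix( contr.randwalk( num.yrs, sparse=FALSE, momentum=0.5 ) ), 3 )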

# create the modeling matrix that describes the independent variables of the model
# make sure to create columns for each level, since this is needed for penalized
# regression models (see: https://stackoverflow.com/questions/4560459/all-levels-of-a-factor-in-a-model-matrix-in-r)
x <- sparse.model.matrix( ~ 0 + seg + subseg + yr.rw + seg:yr.rw + subseg:yr.rw, d,
  contrasts.arg = lapply( d[,c('seg', 'subseg')], contrasts, contrasts = FALSE ) )

# fit the model
# (HDtweedie doesn't support sparse matrices, unlike glmnet, so need to convert back to a dense matrix)
fit <- HDtweedie( as.matrix( x ), d$ult.lr, weights=d$used.ep, p=1.9, alpha=0.75,
  standardize=FALSE, lambda.factor=1e-3 )
# cross validate to get the optimal penalty parameter
set.seed( 1112 )
fit.xval <- cv.HDtweedie( as.matrix( x ),
d$ult.lr, weights=d$used.ep, p=1.9, alpha=0.75,
standardize=FALSE, lambda.factor=1e-3, nfolds=5 )
# cross validated deviance (error) of the model - use this to test the momentum parameter
min( fit.xval$cvm )
lambda <- fit.xval$lambda.min
# check that the chosen penalty isn't at the ends
which( lambda == fit.xval$lambda )
length( fit.xval$lambda )
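
# illustrative sketch (not in the original code) of the momentum search that the
# comment above refers to: rerun the cross-validation over a hypothetical grid of
# momentum values and keep the one with the lowest cross-validated deviance
momentum.grid <- c( 0, 0.1, 0.25, 0.5 )
cv.err <- sapply( momentum.grid, function(m) {
  d$yr.rw <- factor( d$yr )
  contrasts( d$yr.rw ) <- contr.randwalk( num.yrs, sparse=TRUE,
    momentum=m, rel.cred=rw.rel.cred )
  x.m <- sparse.model.matrix( ~ 0 + seg + subseg + yr.rw + seg:yr.rw + subseg:yr.rw, d,
    contrasts.arg = lapply( d[,c('seg', 'subseg')], contrasts, contrasts = FALSE ) )
  set.seed( 1112 )   # keep the folds comparable across candidates
  cv.m <- cv.HDtweedie( as.matrix( x.m ), d$ult.lr, weights=d$used.ep, p=1.9, alpha=0.75,
    standardize=FALSE, lambda.factor=1e-3, nfolds=5 )
  min( cv.m$cvm )
} )
momentum.grid[ which.min( cv.err ) ]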

# make predictions
d$ult.lr.pred <- predict( fit, as.matrix( x ), type='response', s=lambda )[,1]

# --- graph results ---

# one at a time
par( mfrow=c( 3, 2 ) )
for ( subseg in 1:num.subsegs ) {
plot( 1:num.yrs, d$ult.lr[d$seg == seg.map[subseg] & d$subseg == subseg], type='b',
  xlab='Year', ylab='LR',
  main=paste( 'Segment: ', seg.map[subseg], ', Subsegment: ', subseg, sep='' ) )
lines( 1:num.yrs, d$ult.lr.pred[d$seg == seg.map[subseg] & d$subseg == subseg], col='blue' )
}
par( mfrow=c( 1, 1 ) )

d.plot <- rbind( data.frame( Year=d$yr, seg=d$seg, subseg=d$subseg,
    Segment=paste( d$seg, d$subseg ), LR=d$ult.lr, Line='Actual' ),
  data.frame( Year=d$yr, seg=d$seg, subseg=d$subseg,
    Segment=paste( d$seg, d$subseg ), LR=d$ult.lr.pred, Line='Predicted' ) )

# each segment together
seg <- 1
ggplot( d.plot[d.plot$seg == seg,], aes( Year, LR ) ) +
  geom_line( aes( color=Segment, linetype=Line ) )

# all together
ggplot( d.plot, aes( Year, LR ) ) + geom_line(
aes( color=Segment, linetype=Line ) )


1. These data were the result of a single simulation using the first method discussed in Section 3.2. All of these models were fit using an elastic net with threefold cross-validation repeated 20 times.

2. The results of the Holt-Winters method, also known as exponential smoothing, should be roughly similar to but less accurate than those of the Kalman filter and will not be expounded upon here.

3. The same is true when using the “bagging” method described in Korn (2016).

4. The model was fit to the first 10 data points and predicted on all 13.

5. For the first reason, it is suggested that more weight be given to the lasso penalty.

6. Note that the issues mentioned above regarding additive models do not apply here since periods are repeated multiple times.

7. Note that, unlike numerical variables (which are divided by twice their standard deviation), twice this amount is not used, since dividing by this divisor already puts the variables on the same scale as a plain random walk. Numerical variables, in contrast, need to be divided by twice their standard deviation to put them on the same scale as categorical variables.