denim

an R package for deterministic compartmental models with flexible dwell time distributions


Authors: Thinh Ong, Anh Phan, Lam Ha, Marc Choisy
Presenter: Anh Phan

Background

For mathematical simplicity, compartmental models are typically formulated as a system of ordinary differential equations (ODE) , where the rate of transitioning from one compartment to another is either:

  • dependent on the simulation time (force of infection, seasonal forcing, etc.)

  • or constant

    • Implies exponentially distributed dwell time

Background

Consider the classical SIR model

\[ \begin{cases} \frac{dS}{dt} = - \frac{\beta I}{N}S \\ \frac{dI}{dt} = \frac{\beta I}{N}S - \gamma I\\ \frac{dR}{dt} = \gamma I \\ \end{cases} \]

When \(\gamma = 1/7\), it implies an exponentially distributed infectious period where rate = 1/7

Background

When \(\gamma = 1/7\), it implies an exponentially distributed infectious period where rate = 1/7

Background

While a more realistic distribution looks like the following

Background

The distribution shape of the infectious period can greatly affect the model’s trajectory (Wearing, Rohani, and Keeling 2005; Keeling and Grenfell 1998; Kenah and Miller 2011).

Wrong assumptions for distribution of incubation/infectious period may lead to inaccurate estimations for important measures (e.g., disease persistency, basic reproductive ratio, etc.)

Several methods to relax the exponential distribution assumption were proposed.

Background

The simplest one being Linear Chain Trick (LCT) to model Erlang distributed infectious period. The general idea being:

  • Infectious period (\(I\) compartment) is divided into sub-compartments, the number of which is the shape parameter of the Erlang distribution.

  • The rate of transitioning from one sub-compartment to another is the rate parameter of the Erlang distribution.

For example: ODE formulation for (\(T_I \sim E_{3}(0.2)\))

\[ \begin{cases} dS = -\beta \frac{I_1 + I_2 + I_3}{N} S \\ dI_1 = \beta \frac{I_1 + I_2 + I_3}{N} S - 0.2I_1 \\ dI_2 = 0.2I_1 - 0.2I_2 \\ dI_3 = 0.2I_2 - 0.2I_3 \\ dR = 0.2I_3 \\ \end{cases} \]

Background

However:

  • Formulation can get complex as shape parameter grows

  • There are still many instances when this does not provide a great fit to the data

Background

For arbitrarily distributed infectious period, formulation for compartmental model involves a system of delay integro-differential equation (DIDE).

Example:

\[ \begin{cases} \frac{dS(t)}{dt} = - N_{S \rightarrow I}(t) \\ \frac{dI(t)}{dt} = N_{S \rightarrow I}(t) - \int_0^{t} f(\tau) N_{S \rightarrow I}(t - \tau) d \tau \\ \frac{dR(t)}{dt} = \int_0^{t} f(\tau) N_{S \rightarrow I}(t - \tau) d \tau \\ \end{cases} \]

Where:

  • \(N_{S \rightarrow I}(t)\) is the number of individuals going from \(S\) to \(I\) at time \(t\), typically defined as \(N_{S \rightarrow I}(t) = \beta \frac{I(t)}{N}S(t)\)

  • \(f(\tau)\) is the probability distribution of infectious period

Background

However, there is no solver for delay integro-differential equations

To manually implement a solver would lead to:

  • Time consuming process

  • Complex code

Our work denim:

  • Propose a numerical approach to simulate compartmental model with arbitrarily distributed dwell-time

  • Provide an R package that implements the proposed approach

Proposed algorithm

Proposed algorithm

The central idea of our algorithm is to divide a compartment into a chain of sub-compartments, from which the out-going population is computed.

To demonstrate this idea, consider an SIR model where the time spent in \(I\) compartment follows a discrete distribution \(P_I = \{p_1, p_2, ..., p_n\}\)

We would divide \(I\) into \(n\) sub-compartments denoted \(I_k\), \(k \in \{1, 2, .., n\}\) where \(I_k\) represents individuals that have stayed in \(I\) for \(k\) time steps.

Proposed algorithm

During each time step, a proportion \(q_k\) of sub-compartment \(I_k\) will transition to \(R\) while the remaining \(1 - q_k\) will transition to sub-compartment \(I_{k+1}\)

Proposed algorithm

Our goal is to derive a formulation for \(q_k\) such that the proportion of individuals in \(I\) leaving from each sub-compartment \(I_k\) follows the distribution \(P_I\).

Proposed algorithm

Our goal is to derive a formulation for \(q_k\) such that the proportion of individuals in \(I\) leaving from each sub-compartment \(I_k\) follows the distribution \(P_I\).

The formulation for \(q_k\) that satisfies said requirement is as followed

\[ q_k = \frac{p_k}{1 - \sum_{j=1}^{k-1} p_j} \]

Proposed algorithm

We can also derive \(q_k\) from a survival analysis perspective.

Consider a survival analysis problem where the event of interest is “transitioning from \(I\) to \(R\)

  • \(p_k = \int_{(k-1) \Delta t}^{ k \Delta t } f(\tau) d \tau\) is the discrete estimation of the probability distribution

  • \(I_k\) represents the population who have “survived” till time point \(k \Delta t\)

  • \(q_k\) - the instantaneous probability the event happen, conditioning on survival till time step \(k\) is thus the hazard rate, usually given by \(h(\tau) = \frac{f(\tau)}{S(\tau)} = \frac{f(\tau)}{1 -\int_0^{\tau} f(\tau)}\)

    • The estimation for \(q_k\) given \(p_k\) is thus \(q_k = \frac{\int_{(k-1) \Delta t}^{k \Delta t} f(\tau)}{1 - \int_{0}^{(k-1) \Delta t} f(\tau)} \approx \frac{p_k}{1 - \sum_{j=1}^{k-1} p_j}\)

Proposed algorithm

The algorithm can also be intepretted as the Euler method to solve the following system of delay integro-differential equation

This is a reformulation of the equation from the Background section, written in terms of \(h(\tau)\) instead of \(f(\tau)\)

\[ \begin{cases} \frac{dS(t)}{dt} = - N_{S \rightarrow I}(t) \\ \frac{dI(t)}{dt} = N_{S \rightarrow I}(t) - \int_0^{\mathcal{T}} h(\tau) I(t, \tau) d \tau \\ \frac{dR(t)}{dt} = \int_0^{\mathcal{T}} h(\tau) I(t, \tau) d \tau \\ \end{cases} \]

Where:

  • \(N_{S \rightarrow I}(t)\) is the number of individuals going from \(S\) to \(I\) at time \(t\), \(N_{S \rightarrow I}(t) = \beta \frac{I(t)}{N}S(t)\).

  • \(I(t, \tau)\) is sub-population of \(I\) at time \(t\) that have stayed for \(\tau\) period of time (i.e., \(I(t, \tau) = N_{S \rightarrow I}(t-\tau)S(\tau)\) where \(S(\tau)\) is survival function).

denim package demo

denim package

In denim, a model structure is defined using denim’s domain specific language (DSL) where each line of code must be a transition, in the following syntax.

from -> to = [transition]

Where [transition] can be either:

  • a mathematical equation for differential equation

  • or, built-in distribution to specify the distribution of dwell time

denim package

Example: the classic SIR model

library(denim)

# ---- Model specification
simple_SIR <- denim_dsl({
  S -> I = beta * S * I/N
  I -> R = d_exponential(gamma)
})

denim package

Example: the classic SIR model

library(denim)

# ---- Model specification
simple_SIR <- denim_dsl({
  S -> I = beta * S * I/N
  I -> R = d_exponential(gamma)
})

# ---- Define parameters and initial values
init_vals <- c(S = 999, I = 1, R = 0)
parameters <- c(beta = 0.8, gamma = 1/7, N = 1000)

# ---- Run simulation
out <- sim(
  simple_SIR,
  parameters = parameters,
  initialValues = init_vals,
  simulationDuration = 30,
  timeStep = 0.1 
)

denim package

Example: the classic SIR model

head(out, n=10)
   Time        S        I          R
1   0.0 999.0000 1.000000 0.00000000
2   0.1 998.9201 1.065736 0.01418416
3   0.2 998.8349 1.135786 0.02930072
4   0.3 998.7442 1.210433 0.04541089
5   0.4 998.6474 1.289977 0.06257986
6   0.5 998.5444 1.374738 0.08087710
7   0.6 998.4346 1.465058 0.10037660
8   0.7 998.3175 1.561298 0.12115722
9   0.8 998.1929 1.663846 0.14330292
10  0.9 998.0600 1.773113 0.16690318

denim package

Example: the classic SIR model

plot(out)

denim package

Currently supported parametric distributions

Function Parameters
d_exponential() rate
d_gamma() rate, shape
d_lognormal() mu, sigma
d_weibull() scale, shape

denim also accept a vector of numeric values as a distribution via nonparametric() function, where each value correspond to 1 time step.

denim package

Comparison between model definition in denim and deSolve, where infectious period is Erlang distributed

library(deSolve)
# --- Model definition in deSolve
transition_func <- function(t, state, param){
  with(as.list( c(state, param) ), {
      dS = -beta*S*(I1 + I2 + I3)/N
      dI1 = beta*S*(I1 + I2 + I3)/N - rate*I1
      dI2 = rate*I1 - rate*I2
      dI3 = rate*I2 - rate*I3
      dR = rate*I3
      list(c(dS, dI1, dI2, dI3, dR))
  })
}
library(denim)
# --- Model definition in denim
denim_mod <- denim_dsl({
  S -> I = beta*S*(I/N)
  I -> R = d_gamma(rate = rate, shape = shape)
})

denim package

library(deSolve)
# --- Model definition in deSolve
transition_func <- function(t, state, param){
  with(as.list( c(state, param) ), {
      dS = -beta*S*(I1 + I2 + I3)/N
      dI1 = beta*S*(I1 + I2 + I3)/N - rate*I1
      dI2 = rate*I1 - rate*I2
      dI3 = rate*I2 - rate*I3
      dR = rate*I3
      list(c(dS, dI1, dI2, dI3, dR))
  })
}

# ---- Model configuration 
parameters <- c(beta = 0.3, rate = 1/3, N = 1000) 
initialValues <- c(S = 999, I1 = 1, I2=0, I3=0, R=0)

# ---- Run simulation
times <- seq(0, 75) # simulation duration
ode_mod <- ode(
    y = initialValues,
    times = times,
    parms = parameters,
    func = transition_func
  )

# --- Format output
ode_mod <- as.data.frame(ode_mod)
ode_mod$I <- rowSums(ode_mod[, c("I1", "I2", "I3")])
library(denim)
# --- Model definition in denim
denim_mod <- denim_dsl({
  S -> I = beta*S*(I/N)
  I -> R = d_gamma(rate = rate, shape = shape)
})

# ---- Model configuration 
init_vals <- c(S = 999, I = 1, R = 0)
parameters <- c(beta = 0.3, N = 1000, rate = 1/3, shape = 3)

# ---- Run simulation
denim_out <- sim(
  denim_mod, 
  initialValues = init_vals,
  parameters = parameters,
  simulationDuration = 75,
  timeStep = 0.05
)

denim package

# ---- Show output
plot_sir(ode_mod, pkg = "deSolve", time_col = "time")

# ---- Show output
plot_sir(denim_out)

denim package

denim also support various ways to handle multiple out-going transitions depending on what information modelers have.

For example, consider a simple SIRD model, the information we can have includes:

  1. Distribution of the time individuals stay in compartment \(I\), denoted \(f_I(\tau)\)
  2. Distributions of the time individuals move to compartment \(R\) and \(D\); denoted \(f_{I \rightarrow R}(\tau)\) and \(f_{I \rightarrow D}(\tau)\) respectively.
  3. The proportion of individuals in \(I\) that end up in \(R\) and \(D\), denoted \(w_R\) and \(w_D\) respectively

denim package

When we only have \(f_{I \rightarrow R}(\tau)\) and \(f_{I \rightarrow D}(\tau)\): \(I \rightarrow R\) and \(I \rightarrow D\) can be modeled as independent competing risks

Under the assumption of independence, the overall hazard rate is the sum of hazard rates of the individual risks

denim package

In terms of sub-compartmental structure, the total proportion of people leaving each sub-compartment \(I_k\) is the sum of probabilities of leaving due to each risk.

denim package

Example: SIRD model with \(I \rightarrow R\) and \(I \rightarrow D\) as competing risks

sird_cr <- denim_dsl({
  S -> I = beta*(I/N)*S
  I -> R = d_weibull(scale = 2, shape = 4)
  I -> D = d_exponential(0.05)
}) 

denim package

init_vals <- c(S = 999, I = 1, R = 0, D = 0)
parameters <- c(beta = 1.2, N = 1000)

out <- sim(
  sird_cr, 
  init_vals, 
  parameters,
  simulationDuration = 30,
  timeStep = 0.1
)
plot(out, ylim = c(0, 1000))

denim package

When we have either:

  • \(f_{I \rightarrow R}(\tau)\), \(f_{I \rightarrow D}(\tau)\) and \(w_R\), \(w_D\)

  • or \(f_I(\tau)\) and \(w_R\), \(w_D\)

\(I \rightarrow R\) and \(I \rightarrow D\) can be modeled using multinomial framework in denim

denim package

\(I \rightarrow R\) and \(I \rightarrow D\) can be modeled using multinomial framework in denim

denim package

  • Both scenarios are implemented by having multiple sub-compartment chains — one chain per outcome.

  • Scenario 2 is equivalent to setting \(f_{I \rightarrow R}(\tau) = f_{I \rightarrow D}(\tau) = f_I(\tau)\)

denim package

Example: SIRD model with \(I \rightarrow R\) and \(I \rightarrow D\) under multinomial assumption

sird_mult_1 <- denim_dsl({
  S -> I = beta*(I/N)*S
  0.9*I -> R = d_weibull(scale = 2, shape = 4)
  0.1*I -> D = d_exponential(0.05)
}) 

denim package

Example SIRD model with \(I \rightarrow R\) and \(I \rightarrow D\) under multinomial assumption

sird_mult_2 <- denim_dsl({
  S -> I = beta*(I/N)*S
  0.9*I -> R = d_weibull(scale = 2, shape = 4)
  0.1*I -> D = d_weibull(scale = 2, shape = 4)
}) 

denim package

init_vals <- c(S = 999, I = 1, R = 0, D = 0)
parameters <- c(beta = 0.9, N = 1000)

mult1_out <- sim(
  sird_mult_1, 
  init_vals,
  parameters,
  simulationDuration = 40,
  timeStep = 0.1
)

plot(mult1_out, ylim = c(0, 1000))

init_vals <- c(S = 999, I = 1, R = 0, D = 0)
parameters <- c(beta = 0.9, N = 1000)

mult2_out <- sim(
  sird_mult_2, 
  init_vals,
  parameters,
  simulationDuration = 40,
  timeStep = 0.1
)
plot(mult2_out, ylim = c(0, 1000))

denim package

Example: a more complex model structure

denim package

library(denim)
initialValues <- c(S = 999, I = 1, R = 0, V = 0, IV = 0, D = 0) 
 
modelStructure <- denim_dsl({ 
    S -> I = beta * S * (I + IV) / N 
    S -> V = d_exponential(0.01) 
    0.1 * I -> D = d_lognormal(2, 0.5) 
    0.9 * I -> R = d_gamma(1/3, 2) 
    V -> IV = 0.2 * beta * V * (I + IV) / N 
    IV -> R = nonparametric(iv_r_dist) 
    IV -> D = d_weibull(scale = 2, shape = 1.5) 
}) 

parameters <- list( 
  beta = 0.9,  
  N = 1000,
  iv_r_dist = c(0, 0.15, 0.15,  0.05, 0.2, 0.2, 0.25)
)


simulation <- sim(transitions = modelStructure,  
           initialValues = initialValues,  
           parameters = parameters,  
           simulationDuration = 30,  
           timeStep = 0.1) 

denim package

plot(simulation)

denim package

Draw backs

  • Due to the sub-compartment structure, denim will have longer run time compared to ODE solvers.

  • Since the simulation algorithm is in discrete time, larger time step would lead to less accurate result.

denim package

Future plans:

  • Improve run time

  • Add support for structured compartment (for stratification by age, gender, location, etc.)

  • Add support for stochasticity

  • Extend the range of supported parametric distributions: Pareto, inverse Gaussian, Gompertz, log-logistic

denim package

The denim package is available on CRAN

For more information, visit denim website using the QR code

Or via the link: https://drthinhong.com/denim/

A preprint on our package is also available on bioRxiv

Thank you for listening

References

Keeling, M. J., and B. T. Grenfell. 1998. “Effect of Variability in Infection Period on the Persistence and Spatial Spread of Infectious Diseases.” Mathematical Biosciences 147 (2): 207226. https://doi.org/10.1016/s0025-5564(97)00101-6.
Kenah, Eben, and Joel C. Miller. 2011. “Epidemic percolation networks, epidemic outcomes, and interventions.” Interdisciplinary Perspectives on Infectious Diseases 2011: 543520. https://doi.org/10.1155/2011/543520.
Wearing, Helen J., Pejman Rohani, and Matt J. Keeling. 2005. “Correction: Appropriate Models for the Management of Infectious Diseases.” PLOS Medicine 2 (8): e320. https://doi.org/10.1371/journal.pmed.0020320.