Case Study

The code underlying AREAdata was first developed for use in our epidemiological analysis of COVID-19 seasonality: “Temperature and Population Density Influence SARS-CoV-2 Transmission in the Absence of Nonpharmaceutical Interventions”, 2021, PNAS, doi: 10.1073/pnas.2019284118.

Here we show a use-case example, combining the data from AREAdata with epidemiological tools, to perform a simple analysis of COVID-19 seasonality in R.

Load R packages

This is going to require a few R packages, including epidemia for streamlined epidemiological modelling and COVID19 for quickly downloading a compilation of COVID data.

library(dplyr)
library(tidyr)
library(COVID19)
library(epidemia)
library(rstanarm)
library(matrixStats)

Get some covid data

We’re going to download COVID infection data for different states of the USA, using the COVID19 package:

Guidotti, E., Ardia, D., (2020), “COVID-19 Data Hub”, Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.

USAcovid <- covid19("US", level = 2)

Merge with temperature from AREAdata

Now we’ll add temperature data from AREAdata, this is the daily temperature data at GID1 level - see the downloads page. We also need the metadata matching GID units with place names, also on the downloads page.

temperature_data <- as.data.frame(readRDS("temp-dailymean-GID1-cleaned.RDS"))
# re-arrange the temperature data from wide to long format
temperature_data$GID1 <- row.names(temperature_data)
temperature_long <- pivot_longer(temperature_data,
                             cols = c(1:(ncol(temperature_data)-1)),
                             names_to = "date",
                             values_to = "temperature")

# merge this with the metadata to get place names
metadata <- read.csv("name-matching.csv")

# cut this down to just state level (remove the county-level metadata)
state_meta <- distinct(metadata[,c("GID_0", "NAME_0", "GID_1", "NAME_1")])

# merge with the temperature data so we have state names to match those of the COVID19 package
state_temps <- merge(temperature_long, state_meta, by.x = "GID1", by.y = "GID_1", all.x = TRUE)

# then cut it down to just the USA
USA_temp <- state_temps[state_temps$GID_0 == "USA",]
USA_temp <- USA_temp[!is.na(USA_temp$GID1),]

# remove unecessary objects
rm(list = c("state_meta", "state_temps", "temperature_data", "temperature_long"))

# now merge this into the covid data
USA_covid_temp <- merge(USAcovid, USA_temp, by.x = c("administrative_area_level_2", "date"), by.y = c("NAME_1", "date"), all.x = TRUE)

Epidemiological analysis

Now we have COVID data merged with daily temperatures, we can do an epidemiological analysis to see if temperature affects SARS-CoV-2 transmission. We’re going to use epidemia to fit a Bayesian heirarchical model to the observed COVID deaths, using a weekly random walk to estimate Rt (the time-varying reproduction rate) through time. Using the data we’ve already combined, we’re going to add temperature and an estimate of non pharmaceutical intervention stringency as parameters to our model, first scaling those variables so we can see which is more important.

This is similar to what was performed in Smith et al. 2021, but here for simplicity we’ll just run it for a single US state - New York.

data <- USA_covid_temp[USA_covid_temp$administrative_area_level_2 == "New York",]

# deaths are cumulative, kets get them to daily
data$cumulative_deaths <- data$deaths
data$deaths <- diff(c(0, data$cumulative_deaths))

# scale the parameters before running the model
data$s.temp <- scale(data$temperature)
data$s.npis <- scale(data$stringency_index)

# add a weekly variable for the random walk
tmp <- NULL
gid1s <- unique(data$administrative_area_level_2)
for (NAME in gid1s){
  d <- data[which(data$administrative_area_level_2 == NAME),]
  d$week = c(rep(1:(length(d$date)%/%7), each = 7), 
             rep(length(d$date)%/%7, length(d$date)%%7))
  tmp <- rbind(tmp, d)
}
data = tmp[tmp$date < as.Date("2020-06-01"),]

# Sets observation
deaths <- epiobs(formula = deaths ~ 1,
                 prior_intercept = rstanarm::normal(0,0.2), # this encodes the ifr.  assumed to be between 0 and 2, with mean of 1
                 link = scaled_logit(0.02),
                 i2o = EuropeCovid$inf2death)
#usa model use a negative binomial for deaths

# Sets infections
inf <- epiinf(gen = EuropeCovid$si,
              seed_days = 6,
              prior_tau = rstanarm::exponential(0.02), #usa model is 0.03
              pop_adjust = TRUE,
              susceptibles = population)

options(mc.cores = parallel::detectCores())

# set rt formulation (random walk plus temperature and NPIs)
rt <- epirt(formula = R(GID1, date) ~ 1 + s.npis + s.temp +
              rw(time=week, gr = GID1, prior_scale=0.1), # Region specific rw
            prior = rstanarm::normal(scale = 0.5), # same as USA paper
            prior_covariance = rstanarm::decov(scale = 0.5), # same as USA paper
            link = scaled_logit(7))

# set arguments for epidemia
args <- list(rt=rt, inf=inf, obs=deaths, data=data,
             algorithm = "sampling",
             iter = 3000, control = list(max_treedepth = 15))

# Solve the model
fm <- do.call(epim, args)

Evaluate the model outputs

We can use the plotting functions in epidemia to plot Rt through time and also plot the estimated deaths from the model against the observed deaths.

# plot Rt
plot_rt(fm)

# plot the observed against predicted deaths
plot_obs(fm, type = "deaths", plotly=FALSE)

How do the parameters in our model affect Rt? We can extract the coefficients and plot them:

# extract coefficients
beta <- as.matrix(fm, par_models = "R", par_types = "fixed")
colnames(beta) <- c("Intercept", "NPIs", "Temperature", "RW")

print(c("Temperature:", quantile(beta[,"Temperature"], probs = c(0.05, 0.25, 0.5, 0.75, 0.95))))

##                                        5%                  25% 
##       "Temperature:"  "-1.14934601679666" "-0.742972883832621" 
##                  50%                  75%                  95% 
## "-0.490469591513119" "-0.237287384079784"  "0.104233989924541"

print(c("NPI Stringency:", quantile(beta[,"NPIs"], probs = c(0.05, 0.25, 0.5, 0.75, 0.95))))

##                                      5%                 25%                 50% 
##   "NPI Stringency:" "-2.48950958447283" "-2.27067988604891" "-2.12026999633237" 
##                 75%                 95% 
## "-1.96326450452889" "-1.67915744014905"

# plot the posterior distributions for the coefficients:
bayesplot::mcmc_intervals(beta)

We find a much stronger effect of NPI stringency than temperature on Rt, though both coefficients are negative (Rt reduces with higher temperature, Rt reduces with more stringent NPIs). This is qualitatively the expected result from Smith et al. 2021.