Jonathan Waring, a Computer Science major at the University of Georgia, worked with Ana Bento in the lab of Pejman Rohani to examine how the choice of data used in a disease model affects the results.
Abstract: During an emerging infectious disease outbreak, epidemiological parameters such as transmission potential and mean infectious period must be estimated quickly to support a timely and effective response. The standard way to obtain quick estimates of these quantities is to fit transmission models to incidence data. Cumulative incidence (the total number of infections to date) is often used rather than raw incidence (the number of new cases in a defined reporting period), but there is evidence that this choice of data can affect our perception of the variability in the parameters, and hence the uncertainty in our predictions. To examine this problem further, we fit deterministic and stochastic models to both raw and cumulative simulated epidemic data in order to assess the biases and errors associated with the choice of data. Simulations fitted to the data with deterministic and stochastic methods yield comparable variances, with models fitted to cumulative data underpredicting the true incidence. However, in stochastic parameter estimation and posterior sampling using particle Markov chain Monte Carlo (pMCMC), cumulative data produce much wider confidence intervals, and thus quantify uncertainty better than raw data do. When we consider the entire time series of an epidemic, cumulative and raw data will both be useful in parameter estimation, depending on the level of uncertainty we are willing to accept.
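To make the distinction between the two kinds of data concrete, here is a minimal sketch (not the authors' model or code) that simulates an epidemic and derives both series from it. It assumes a simple discrete-time stochastic (chain-binomial) SIR model, swapped in purely for illustration; the function names `simulate_sir`, `to_cumulative` and `to_raw`, and all parameter values, are hypothetical. Raw incidence is the number of new infections per one-day reporting period. Cumulative incidence is its running sum, so each cumulative point carries all of the noise from earlier periods, and successive cumulative points are therefore not independent of one another.

```python
import math
import random
from itertools import accumulate


def simulate_sir(n=5_000, i0=10, beta=0.5, gamma=0.25, days=100, seed=1):
    """Hypothetical stochastic discrete-time (chain-binomial) SIR model.

    Returns raw incidence: the number of new infections in each one-day
    reporting period. Parameter values are illustrative assumptions only.
    """
    rng = random.Random(seed)
    s, i = n - i0, i0
    raw = []
    p_rec = 1.0 - math.exp(-gamma)  # daily recovery probability per infective
    for _ in range(days):
        # daily infection probability per susceptible, given current prevalence
        p_inf = 1.0 - math.exp(-beta * i / n)
        # Binomial draws written as Bernoulli sums to stay in the standard library
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        new_rec = sum(rng.random() < p_rec for _ in range(i))
        s -= new_inf
        i += new_inf - new_rec
        raw.append(new_inf)
    return raw


def to_cumulative(raw):
    """Cumulative incidence: total infections to date (running sum of raw incidence)."""
    return list(accumulate(raw))


def to_raw(cumulative):
    """Recover raw incidence from cumulative incidence by first differencing."""
    return [c - prev for prev, c in zip([0] + cumulative[:-1], cumulative)]


raw = simulate_sir()
cumulative = to_cumulative(raw)
assert to_raw(cumulative) == raw  # the two representations hold the same information
print("raw:       ", raw[:10])
print("cumulative:", cumulative[:10])
```

Although the two series are interconvertible, a fitting procedure that treats each cumulative point as an independent observation ignores the shared noise between them, which is one reason the choice of data can change the apparent uncertainty in the estimated parameters.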