Detecting influenza A virus antigenicity with density-based algorithms

Mentor(s): Dr. Pej Rohani group
Abstract: Influenza viruses, such as H3N2 and H1N1, are a major cause of illness and death worldwide, accounting for over 3 million severe cases and approximately 500,000 deaths each year. These viruses also have a significant economic impact. Predictive modelling has been useful in detecting new strains of the virus and in developing vaccines against them.
In this project, we will compare the predictive performance of density-based outlier detection algorithms, specifically the Empirical-Cumulative-distribution-based Outlier Detection (ECOD) algorithm, with isolation forest and one-class support vector machines (SVMs) in detecting unusual physicochemical properties in influenza A viruses. By identifying viruses with significantly different physicochemical properties, we can identify those that may be antigenically distinct, which has important implications for vaccine development and public health efforts.
We will use both simulated data and benchmark data from Smith et al. for the experiments, and fit the models using Python libraries such as scikit-learn, pandas, NumPy, and PyOD. We will then use R packages, including ggplot2, ggdensity, dplyr, and tidyverse, for data visualization and analysis.
The ECOD algorithm has three desirable properties for this task: it can be fitted with limited hyperparameter tuning, it is computationally efficient, and its results are easy to interpret. We will compare the predictive and inferential performance of all three algorithms. The results for isolation forest and one-class SVMs will be provided from previous experiments. We will end the project with a final report.
Is the project computational, empirical, or both? Computational.
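
To make the ECOD idea concrete, here is a minimal sketch of an ECOD-style score written with the Python standard library only. It is a simplified illustration, not the PyOD implementation and not the project's code: for each feature it estimates left- and right-tail probabilities from the empirical cumulative distribution, sums the negative log probabilities across features, and keeps the largest of the left-tail, right-tail, and skewness-selected totals. The function names and the toy data are assumptions made for this example.

```python
import math


def tail_probabilities(column):
    """Empirical left- and right-tail probabilities for each value in a feature."""
    n = len(column)
    left = [sum(1 for v in column if v <= x) / n for x in column]
    right = [sum(1 for v in column if v >= x) / n for x in column]
    return left, right


def skewness(column):
    """Sample skewness; used to choose which tail matters for a feature."""
    n = len(column)
    mean = sum(column) / n
    m2 = sum((v - mean) ** 2 for v in column) / n
    m3 = sum((v - mean) ** 3 for v in column) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def ecod_style_scores(rows):
    """Return one outlier score per row; larger means more unusual."""
    n, d = len(rows), len(rows[0])
    o_left, o_right, o_auto = [0.0] * n, [0.0] * n, [0.0] * n
    for j in range(d):
        column = [row[j] for row in rows]
        left, right = tail_probabilities(column)
        use_left = skewness(column) < 0  # left tail for negatively skewed features
        for i in range(n):
            neg_log_left = -math.log(left[i])
            neg_log_right = -math.log(right[i])
            o_left[i] += neg_log_left
            o_right[i] += neg_log_right
            o_auto[i] += neg_log_left if use_left else neg_log_right
    return [max(a, b, c) for a, b, c in zip(o_left, o_right, o_auto)]


# Toy data (hypothetical): 20 ordinary points and one extreme point at the end.
toy_rows = [[i * 0.1, (i % 5) * 0.1] for i in range(20)] + [[10.0, -10.0]]
toy_scores = ecod_style_scores(toy_rows)
print("most unusual row:", toy_scores.index(max(toy_scores)))
```

Because each feature contributes its own term to the total, the per-feature terms show which properties make a point unusual, and the sketch has no hyperparameters to tune.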

Modeling Fitness of Immune Evading Pertussis Mutants

Gowri Vadmal, a student at Stanford University, worked in the lab of Dr. Pej Rohani.

Abstract: Pertussis was considered one of the great diseases of childhood, with most people experiencing a bout of the infection by the age of 15. The initial rollout of vaccines in the 1950s led to a marked decline in pertussis incidence and to optimism over its potential elimination. Over the past 20-30 years, however, a clear increasing trend in pertussis cases has emerged. Pertussis (whooping cough) is caused mainly by the bacterium Bordetella pertussis. Many countries use acellular pertussis vaccines containing the antigen pertactin (PRN), which plays an important role in pathogenesis. In recent years, we have observed an increasing number of B. pertussis isolates that are PRN-deficient and able to infect people even in highly vaccinated countries. We used SIRV models to examine the fitness cost to the bacterium of losing PRN and the advantage of being able to infect already vaccinated people. Strains can invade the population only if their leakiness (ability to infect vaccinated people) and transmission are above the threshold for invasion, which depends on vaccination coverage and the cost of immune evasion for the strain. At low vaccination coverage, strains with high leakiness dominate the system when the fitness cost of evading immunity is low, but as that cost increases, there is bistability between the wild-type and mutant strains. At higher vaccination coverage, however, the wild type fades out completely and the strains with the highest leakiness dominate the system. Thus, the conditions for B. pertussis mutant invasion can change depending on a population's vaccine coverage, the cost of losing the pertactin gene, and the advantage of being able to evade vaccine immunity.
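
The abstract does not give the model equations, so the following is only a hypothetical sketch of how an invasion threshold of this kind can arise. It assumes a simple SIRV model in which a fraction `coverage` of newborns is vaccinated, the wild type cannot infect vaccinated people, the mutant infects them at a rate reduced by `leakiness`, the mutant's transmission is reduced by a fractional `cost`, immunity does not wane, and infection with either strain protects against both. Under those assumptions the mutant invades when its reproduction number in the resident population exceeds 1. All names and parameter values are illustrative, and this simplified version is not meant to reproduce the bistability reported above.

```python
def resident_state(r0, coverage):
    """Susceptible and vaccinated fractions the mutant encounters on arrival."""
    if r0 * (1.0 - coverage) > 1.0:
        susceptible = 1.0 / r0  # wild type is endemic and holds S at 1/R0
    else:
        susceptible = 1.0 - coverage  # wild type cannot persist (disease-free)
    return susceptible, coverage


def mutant_invasion_number(r0, coverage, leakiness, cost):
    """Reproduction number of a rare mutant; values above 1 mean invasion."""
    susceptible, vaccinated = resident_state(r0, coverage)
    return r0 * (1.0 - cost) * (susceptible + leakiness * vaccinated)


def leakiness_threshold(r0, coverage, cost):
    """Smallest leakiness at which the mutant can invade (requires coverage > 0)."""
    susceptible, vaccinated = resident_state(r0, coverage)
    return (1.0 / (r0 * (1.0 - cost)) - susceptible) / vaccinated


# Illustrative values only: R0 = 15, a 10% transmission cost, two coverage levels.
for coverage in (0.5, 0.95):
    print(coverage, leakiness_threshold(r0=15.0, coverage=coverage, cost=0.1))
```

In this sketch the threshold depends on both vaccination coverage and the cost of immune evasion, which is the qualitative dependence the abstract describes.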


Modelling the incidence and transmission dynamics of the Hepatitis A virus

Jesus Cantu, a Sociology major from Princeton University, worked with Drs. Tobias Brett and Pejman Rohani to model Hepatitis A infection.

Abstract: Hepatitis A is an acute infectious disease caused by the hepatitis A virus (HAV). In the US, an incremental approach to vaccination was initiated after the vaccine became available in 1995. As a result, overall HAV incidence declined continuously from 6.0 cases per 100,000 individuals in 1999 to 0.4 cases per 100,000 individuals in 2011. Over the same period, the proportion of HAV cases in the US who were hospitalized increased from 7.3% in 1999 to 24.5% in 2011. Asymptomatic and non-jaundiced HAV-infected persons, especially children, have previously been identified as an important source of HAV transmission. However, the number of asymptomatic HAV infections through time and their role in sustaining transmission have not been clearly identified. To answer these questions, we constructed a mechanistic SIR model with high- and low-risk classes, implemented as a system of ordinary differential equations that were numerically integrated in R. Particular attention was placed on the effect of different vaccination strategies on disease burden and transmission. Preliminary results show that infections from low-risk individuals contribute negligibly to the number of symptomatic cases.
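
As a rough illustration of the kind of model described, the sketch below integrates a two-group SIR system with a fixed-step fourth-order Runge-Kutta scheme. The project used R; this sketch uses Python with the standard library only. The equations, the contact matrix, the recovery rate, and the vaccination rates are all hypothetical choices made for the example, not values from the study.

```python
def derivatives(state, beta, gamma, vacc):
    """Two-group SIR right-hand side. State: [S_h, I_h, R_h, S_l, I_l, R_l]."""
    susceptible = [state[0], state[3]]
    infected = [state[1], state[4]]
    out = []
    for g in range(2):  # g = 0 is the high-risk class, g = 1 the low-risk class
        force = sum(beta[g][k] * infected[k] for k in range(2))
        new_infections = force * susceptible[g]
        vaccinated = vacc[g] * susceptible[g]  # vaccination moves S directly to R
        out += [
            -new_infections - vaccinated,
            new_infections - gamma * infected[g],
            gamma * infected[g] + vaccinated,
        ]
    return out


def rk4_step(state, dt, beta, gamma, vacc):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(slope, scale):
        return [x + scale * dx for x, dx in zip(state, slope)]

    k1 = derivatives(state, beta, gamma, vacc)
    k2 = derivatives(shifted(k1, dt / 2), beta, gamma, vacc)
    k3 = derivatives(shifted(k2, dt / 2), beta, gamma, vacc)
    k4 = derivatives(shifted(k3, dt), beta, gamma, vacc)
    return [
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def simulate(state, beta, gamma, vacc, dt=0.1, steps=1000):
    """Return the full trajectory, including the starting state."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, dt, beta, gamma, vacc)
        trajectory.append(state)
    return trajectory


def peak_prevalence(trajectory):
    """Largest total infected fraction reached along a trajectory."""
    return max(s[1] + s[4] for s in trajectory)


# Hypothetical parameters (time in weeks): 20% of the population is high risk.
BETA = [[3.0, 0.5], [0.5, 1.0]]
GAMMA = 0.33
START = [0.199, 0.001, 0.0, 0.8, 0.0, 0.0]

no_vaccination = simulate(START, BETA, GAMMA, vacc=[0.0, 0.0])
with_vaccination = simulate(START, BETA, GAMMA, vacc=[0.05, 0.05])
print(peak_prevalence(no_vaccination), peak_prevalence(with_vaccination))
```

Changing the `vacc` argument per class is one simple way to compare vaccination strategies, for example targeting only the high-risk class.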


Epidemiological data, parameter estimation and pitfalls

Jonathan Waring, a Computer Science major at the University of Georgia, worked with Ana Bento in the lab of Pejman Rohani to examine how the choice of data used in a disease model affects the results.

Abstract: During an emerging infectious disease outbreak, epidemiological parameters, such as transmission potential and mean infectious period, are estimated to support a timely and effective response. The standard procedure for obtaining quick estimates of these quantities is to fit transmission models to incidence data. Cumulative incidence (total number of infections to date) is often used rather than raw incidence (number of new cases in a defined reporting period), but there is evidence to suggest that this choice of data can affect our perceptions of the variability in the parameters and hence the uncertainty in our predictions. To explore this problem, we fit deterministic and stochastic models to both raw and cumulative simulated epidemic data in order to assess the biases and errors associated with data choice. Simulations fitted to the data using deterministic and stochastic methods result in comparable variances, with cumulative models underpredicting the true incidence. However, in stochastic parameter estimation and posterior sampling using particle Markov chain Monte Carlo (pMCMC), cumulative data produce much wider confidence intervals, and thus better quantify uncertainty than models using raw data. When we consider the entire time series of an epidemic, cumulative and raw data will both be useful in parameter estimation, depending on the level of uncertainty we are willing to accept.
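
The distinction between the two data types can be illustrated with a small simulation. The sketch below is not the study's model: it assumes a simple chain-binomial SIR process with hypothetical parameter values, generates raw weekly incidence, and converts between raw and cumulative counts. Because each cumulative value is a running sum, it carries all the noise of earlier weeks, so successive cumulative observations are not independent of one another.

```python
import itertools
import math
import random


def simulate_raw_incidence(population, initial_infected, beta, gamma, weeks, seed):
    """Chain-binomial SIR simulation returning new cases per week."""
    rng = random.Random(seed)
    susceptible = population - initial_infected
    infected = initial_infected
    p_recover = 1.0 - math.exp(-gamma)
    raw = []
    for _ in range(weeks):
        p_infect = 1.0 - math.exp(-beta * infected / population)
        # Binomial draws written as sums of Bernoulli trials (stdlib only).
        new_cases = sum(rng.random() < p_infect for _ in range(susceptible))
        recoveries = sum(rng.random() < p_recover for _ in range(infected))
        susceptible -= new_cases
        infected += new_cases - recoveries
        raw.append(new_cases)
    return raw


def to_cumulative(raw):
    """Running total of cases to date."""
    return list(itertools.accumulate(raw))


def to_raw(cumulative):
    """Recover per-period counts by differencing the running total."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


# Hypothetical outbreak: 1,000 people, 5 initial cases, weekly time steps.
raw_cases = simulate_raw_incidence(1000, 5, beta=1.5, gamma=0.5, weeks=30, seed=1)
cumulative_cases = to_cumulative(raw_cases)
print(raw_cases[:5], cumulative_cases[:5])
```

Differencing the cumulative series recovers the raw series exactly, so the two contain the same information; what changes is the error structure a fitting procedure assumes for them.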

