8) Detecting influenza A virus antigenicity with density-based algorithms

Mentor(s): Dr. Pej Rohani's group
Abstract: Influenza viruses, such as the influenza A subtypes H3N2 and H1N1, are a major cause of illness and death worldwide, accounting for over 3 million severe cases and approximately 500,000 deaths each year. These viruses also impose a substantial economic burden. Predictive modelling has proved useful for detecting new strains and for developing vaccines against them.
In this project, we will compare the predictive performance of a density-based outlier detection algorithm, Empirical-Cumulative-distribution-based Outlier Detection (ECOD), with that of isolation forests and one-class support vector machines (SVMs) in detecting influenza A viruses with unusual physicochemical properties. Viruses whose physicochemical properties differ substantially from the rest may be antigenically distinct, which has important implications for vaccine development and public health efforts.
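The abstract does not name a performance metric. One common choice for comparing outlier detectors on labelled data, assumed here, is the area under the ROC curve (AUC): the probability that a randomly chosen antigenically distinct virus receives a higher outlier score than a randomly chosen typical one. The function name `roc_auc`, the 0/1 labelling, and the scores below are illustrative assumptions, not part of the project.

```python
def roc_auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability P(outlier score > inlier score).

    labels: 1 for a virus assumed to be antigenically distinct, 0 otherwise.
    scores: outlier scores where larger means more anomalous.
    Ties between an outlier and an inlier count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores for five viruses, two of which are labelled distinct.
labels = [0, 0, 0, 1, 1]
scores = [0.2, 0.1, 0.4, 0.9, 0.3]
print(roc_auc(labels, scores))  # 5 of 6 outlier/inlier pairs ranked correctly -> 0.8333333333333334
```

The pairwise loop is quadratic, which is acceptable for a sketch; for larger data, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity.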
We will use both simulated data and benchmark data from Smith et al. for the experiments, fitting the models with Python libraries such as scikit-learn, pandas, NumPy, and PyOD. We will then use R packages, including ggplot2, ggdensity, dplyr, and the wider tidyverse, for data visualization and analysis.
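The simulation design is not described here. As one hypothetical illustration of labelled simulated data, the sketch below draws "typical" viruses from independent standard normals over a few made-up physicochemical features and plants a handful of "antigenically distinct" viruses by shifting one randomly chosen feature. The function name `simulate_features` and every parameter value (sample sizes, four features, a 4-SD shift) are assumptions chosen only to show the structure.

```python
import random


def simulate_features(n_typical=200, n_distinct=10, n_features=4, shift=4.0, seed=0):
    """Hypothetical simulation of physicochemical feature vectors.

    Typical viruses: each feature ~ N(0, 1).
    Planted 'distinct' viruses: same, but one random feature is moved by +/- shift SDs.
    Returns (X, y) with y = 1 for planted outliers.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    X, y = [], []
    for _ in range(n_typical):
        X.append([rng.gauss(0.0, 1.0) for _ in range(n_features)])
        y.append(0)
    for _ in range(n_distinct):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        j = rng.randrange(n_features)
        row[j] += shift * rng.choice((-1.0, 1.0))
        X.append(row)
        y.append(1)
    # Shuffle so the planted outliers are not all at the end.
    order = list(range(len(X)))
    rng.shuffle(order)
    return [X[i] for i in order], [y[i] for i in order]


X, y = simulate_features()
print(len(X), sum(y))  # 210 10
```

To fit the detectors named above on such data (for example after `numpy.asarray(X)`), PyOD's `ECOD` (in `pyod.models.ecod`) stores training-set outlier scores in `decision_scores_` after `fit`, with larger values meaning more anomalous. scikit-learn's `IsolationForest` (`sklearn.ensemble`) and `OneClassSVM` (`sklearn.svm`) instead return larger `score_samples` values for more normal points, so those scores should be negated before being compared with ECOD's against the labels `y`.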
The ECOD algorithm has three properties that suit this task: it needs little hyperparameter tuning, it is computationally efficient, and it produces interpretable outlier scores. We will compare the predictive and inferential performance of all three algorithms; results for isolation forests and one-class SVMs will be taken from previous experiments. The project will conclude with a final report.
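The three properties claimed for ECOD are easiest to see in code. The sketch below is a minimal pure-Python version written from the published description of ECOD, not from PyOD's implementation, so its scores may differ from `pyod.models.ecod.ECOD`. For each feature it estimates left- and right-tail probabilities from the empirical CDF, sums their negative logs across features, and keeps the largest of the left-tail, right-tail, and skewness-selected totals. Nothing needs tuning, each feature costs one sort, and because the score is a sum over features, the per-feature terms indicate which physicochemical property made a virus stand out. The names `skewness` and `ecod_scores`, the list-of-lists input, and the toy data are illustrative assumptions; the sketch scores only the rows it is given.

```python
import math
from bisect import bisect_left, bisect_right


def skewness(col):
    """Sample skewness m3 / m2**1.5; 0.0 for a constant column."""
    n = len(col)
    mean = sum(col) / n
    m2 = sum((v - mean) ** 2 for v in col) / n
    m3 = sum((v - mean) ** 3 for v in col) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def ecod_scores(X):
    """Hypothetical ECOD sketch for the rows of X (a list of equal-length feature lists).

    Returns (scores, contributions): scores[i] is row i's outlier score
    (larger = more unusual) and contributions[i][j] is feature j's term in it.
    """
    n, d = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    sorted_cols = [sorted(c) for c in cols]  # one sort per feature: O(n log n)
    skews = [skewness(c) for c in cols]
    scores, contributions = [], []
    for row in X:
        left, right, auto = [], [], []
        for j, v in enumerate(row):
            # Empirical tail probabilities; both are >= 1/n because v is in column j.
            p_left = bisect_right(sorted_cols[j], v) / n        # share of values <= v
            p_right = (n - bisect_left(sorted_cols[j], v)) / n  # share of values >= v
            left.append(-math.log(p_left))
            right.append(-math.log(p_right))
            # Skewness-selected tail: left for negatively skewed features, right otherwise.
            auto.append(left[-1] if skews[j] < 0 else right[-1])
        # Keep whichever of the three aggregates gives the largest total.
        best = max((left, right, auto), key=sum)
        scores.append(sum(best))
        contributions.append(best)
    return scores, contributions


# Toy data (made up): the last virus is the largest value in both features.
X = [[0.0, 0.3], [0.1, 0.1], [0.2, 0.0], [0.3, 0.2], [5.0, 4.0]]
scores, contributions = ecod_scores(X)
top = max(range(len(X)), key=scores.__getitem__)
print(top, contributions[top])  # which row scored highest, and each feature's share
```

Because each feature enters only through its empirical tail probability, the score depends on ranks within each feature rather than on distances, which is one reason no bandwidth or neighbourhood size has to be chosen.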
Is the project computational, empirical, or both? Computational.