Themes and Projects
Our research is centered on two focus themes: Public Health Microbiology and Precision Medicine.
Normal human physiological variation and at least half of all diseases have significant genomic components, consisting of millions of genetic variants that are individually rare but collectively significant. Moreover, environmental factors interacting with these genomic components can modulate normal and disease phenotypes. Only through the digitization of biology will scientists obtain the statistical power and insight needed to discover significant correlations and causal relationships between the genotype, the environment, and the phenotype of healthy individuals and patients. Data Science provides the tools to store, distribute, and analyze the resulting genomic datasets in combination with clinical, environmental, physiological and other related data. Data Science is thus laying the foundation for personalized medicine and for improved individual and population health, allowing scientists, physicians, and public health agencies to investigate the following issues:
- Genetic causes of hereditary and acquired human illness;
- Personalized treatment of hereditary and acquired human illness;
- Impact of genetics on public health and disease prevention;
- Role of genetic diversity within human populations;
- Role of genetic diversity in ecosystem health;
- Social and policy implications.
Findings from this research will help ameliorate the pathologies of aging, transform the treatment of cancer, radically alter clinical trials through multi-omic patient segmentation, revolutionize pharmacy, and limit drug toxicity. While most Data Science initiatives in the life sciences focus on applying existing methods to domain problems, there is an increasing need to develop innovative computational methods, driven by the emergence of technologies that generate new types of data at population scale and by the rapidly increasing size of datasets.
The following are examples of ongoing research projects involving ODSI team members, illustrating the sorts of benefits the new Institute intends to help orchestrate.
The PathOGiST project, jointly funded by Genome Canada and CIHR, aims to exploit whole-genome sequencing data from microbial pathogens to identify important sources of variability and ultimately link related cases together, assisting public health officials in tracing and controlling outbreaks. Its initial focus was on Mycobacterium tuberculosis, Salmonella Enteritidis and Giardia intestinalis. The proposed project will extend PathOGiST to other clinically important bacterial pathogens, including Clostridium difficile, Escherichia coli and Staphylococcus aureus. The bulk of the effort will focus on identifying, tracking and predicting drug resistance-associated variants in these pathogens, which together account for the vast majority of nosocomial (hospital-acquired) infections in Canada. The established partnerships with the BC Centre for Disease Control (BCCDC) will be critical in providing access to the relevant data: both sequencing data, in the form of FASTQ files, and epidemiological metadata.
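To make the idea of linking related cases from sequencing data concrete, here is a minimal sketch of one common, simple strategy. It is not the PathOGiST method. It assumes isolates have already been aligned at core-genome SNP sites, treats any pair whose SNP distance is at or below a chosen threshold as linked, and reports connected components (single-linkage clusters) as putative outbreak clusters. The function names, the toy sequences and the threshold value are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def link_cases(isolates: dict[str, str], threshold: int) -> list[set[str]]:
    """Hypothetical single-linkage clustering: two isolates are linked if their
    SNP distance is <= threshold; clusters are the connected components."""
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair of isolates and merge the components of close pairs
    for a, b in combinations(isolates, 2):
        if hamming(isolates[a], isolates[b]) <= threshold:
            parent[find(a)] = find(b)

    # Collect isolates by the root of their component
    clusters: dict[str, set[str]] = {}
    for name in isolates:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical toy alignment of SNP sites for four isolates
    toy = {
        "case_A": "ACGTACGTAC",
        "case_B": "ACGTACGTAT",  # 1 SNP from case_A
        "case_C": "TTGTACGAAT",  # 3 or more SNPs from every other isolate
        "case_D": "ACGAACGTAT",  # 1 SNP from case_B
    }
    for cluster in link_cases(toy, threshold=1):
        print(sorted(cluster))
```

With a threshold of 1 SNP, case_A and case_D end up in the same cluster through case_B even though they differ at 2 sites; this chaining is a known property of single linkage, and real pipelines typically choose thresholds and clustering methods per pathogen.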
Ending the HIV-1 pandemic will require an effective vaccine to protect those at risk of infection and a strategy to achieve a functional or sterilizing cure for those already infected. To date, however, no HIV-1 vaccine candidate that has reached advanced clinical testing has elicited protective cellular immune responses, and HIV elimination strategies have similarly met with limited success in the clinic. Improved T-cell immunogen designs capable of overcoming HIV-1's extensive capacity for mutation, genetic diversification and adaptation, at both the within- and between-host scales, are thus urgently needed. Vaccine and eradication concepts guided by analysis of large-scale human and viral datasets represent a promising novel approach. Towards this goal, the Brumme and Brockman laboratories in SFU’s Faculty of Health Sciences lead teams of computational, molecular and clinical biologists who bring expertise in HIV-1 adaptation, viral fitness, and vaccine discovery, and who run unique HIV-1 cohorts globally. The fundamental premise of the approach is that large-scale, integrated analyses of host genetic, HIV-1 genetic and HIV-1 functional data can inform the design of novel HIV-1 T-cell vaccine immunogens, both for prophylactic HIV-1 vaccines and for "personalized" immunotherapeutic approaches to HIV elimination.
iReceptor: A Platform for Storing and Sharing Next Generation Sequencing (NGS) Data from Adaptive Immune Receptor Repertoires
The human immune system has evolved to adapt quickly to fight off infection and disease, producing an immense diversity of antibody/B-cell and T-cell immune receptors to recognize and remove bacterial and viral pathogens. With next generation sequencing (NGS) technology, it is now standard to sequence millions of these receptors per sample and per time point. This presents a classic Big Data problem: there are currently no means for researchers to share these huge NGS datasets, which severely limits the application of this new knowledge of the human immune system. To make optimal use of these datasets for biomedical research and patient care, immune receptor datasets of sufficient diversity and statistical power must be integrated across studies and institutions. The iReceptor Platform, one of seven Challenge 1 Cyberinfrastructure Projects funded by the Canada Foundation for Innovation (CFI), will establish an integrated system that allows researchers to access large common repositories of immune receptor data.
Infectious Disease Genomic Epidemiology Informatics: Improved Disease Tracking and Control for Agri-Foods and Public Health
Modern epidemiological investigations of infectious disease outbreaks, which are important for tracking diseases that affect human, animal and plant health, are transitioning to routinely incorporate Whole Genome Sequencing (WGS) data for microbial pathogens. WGS provides a wealth of previously unavailable information, enabling the highest-resolution “fingerprinting” and analysis of isolates needed for detailed tracking of disease spread. However, the application of WGS to genomic epidemiology continues to be hindered by the complexities of data management, integration and analysis, with significant socio-political and technological barriers to overcome. The IRIDA (Integrated Rapid Infectious Disease Analysis) project is developing a genomic epidemiology platform that stores and manages WGS data together with the associated epidemiological and laboratory data, executes analysis pipelines, and supports the visualization and evaluation of results.
Cancer is one of the leading causes of death, and a disease in which genetic and Darwinian factors play an especially large role. Remarkable advances in genomics, computer science, and therapeutics now make it possible to consider precision oncology in the clinic. To enable precision oncology, this research project develops diagnostic, prognostic, and predictive biomarkers for patient stratification and improved patient management. A major goal of the next phase of the project is to better understand the mechanisms of metastasis and of the development of therapeutic resistance, in order to identify new drug targets. Deep neural networks will be explored as a method to predict the direction of tumor evolution, given a set of initial conditions (e.g. mutations) and selection pressures (e.g. therapy).
Big data is creating new opportunities for knowledge production and data-driven decision-making in science and biomedicine. Domain experts often lack the programming or database skills to create the necessary data infrastructures, and translating large datasets into information that is useful for decision-making is also difficult. These realities increasingly create a need for collaborative networks of stakeholders who traditionally have not worked together; domain experts and practitioners, for example, often need the expertise of data scientists. Whether through innovators’ discussions about development, translators moving a technology from innovators to users, users making decisions, or stakeholders voicing perspectives on benefits and risks, communication shapes big data technologies. In this study, we explore the relationship between data innovation, technology development, and social change in the context of clinical genomics. This research will help us understand the role of communicative mechanisms among the stakeholders developing big data technologies. Useful information, not just large volumes of data, is critical for positively influencing practitioners’ decision-making and helping them reach their goals. Likewise, social issues must be studied in order to identify, anticipate, and avoid unintended consequences. In this study we aim to: observe knowledge-making processes and the interactions between different actors and stakeholders; describe the variety of aims and goals held by the range of stakeholders involved in big data innovation; analyze the impact of big data development and adoption on social relations and decision-making; and develop evidence-based information for big data innovators that facilitates interaction with domain experts, improves the ethical aspects of big data, and enhances the productivity and impact of big data technologies.
Cancer Drug Discovery and Development
Computational prediction of the interactions between potential drugs and targets, i.e. proteins hypothesized to be disease-related, is a long-standing challenge in drug discovery. Such in silico approaches can speed up experimental wet-lab work by systematically prioritizing the most potent compounds and by helping to predict their potential side effects. The Ester lab and the Cherkasov lab have collaborated for several years on the computational prediction of drug-target interactions. More recently, the Ester lab has worked with the Carleton lab to explore methods for analyzing adverse drug reactions, specifically for predicting their genetic causes. Most existing drug-target interaction prediction methods, including the one developed in our previous work, cannot handle cold-start targets and drugs, for which no interaction information is available. This project will explore an approach based on deep neural networks to address this serious limitation. The project will also extend our existing method for discovering individual causal genes, based on the paradigm of quasi-experimental design, to discover combinations of causal genes, and will investigate an alternative approach based on Bayesian network structure learning.