Research Projects:

Causal Pattern Discovery

Most machine learning methods discover correlations between the independent variables and a dependent variable and exploit them to predict the dependent variable. However, in scientific applications, particularly in the biomedical domain, one often wants to discover causal relationships in order to understand and manipulate a system. We investigate a range of approaches for causal pattern discovery. The idea of quasi-experimental design is to find subsets of a database of observations that correspond to a randomized case/control experiment, e.g. pairs of entities where one has been treated, the other has not, and both agree in the values of potential confounders. A statistical test is then applied to determine whether the outcomes of the two elements of the pairs differ significantly. Open research questions are how to define the set of confounders and how to match pairs of entities. A Bayesian Network, a directed probabilistic graphical model, is well suited to expressing causal relationships between variables connected by edges of the network. Existing methods for learning the structure of a Bayesian Network do not scale to real-life networks with thousands of variables, and we investigate how to exploit available domain knowledge to restrict the search space and scale the methods.
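
The quasi-experimental idea described above can be illustrated with a minimal sketch. It assumes exact matching on a hand-picked list of confounders and a sign-flip permutation test on the paired outcome differences; both are illustrative choices, not necessarily the methods used in this project. The record fields ("treated", "outcome") and the function names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def match_pairs(records, confounders):
    """Pair treated with untreated records that agree on every confounder."""
    strata = defaultdict(lambda: {True: [], False: []})
    for record in records:
        key = tuple(record[c] for c in confounders)
        strata[key][bool(record["treated"])].append(record)
    pairs = []
    for groups in strata.values():
        # zip stops at the shorter list, so records without a partner are dropped
        pairs.extend(zip(groups[True], groups[False]))
    return pairs


def sign_flip_test(pairs, n_perm=10000, seed=0):
    """Two-sided paired permutation test on the mean outcome difference.

    Requires at least one pair. Returns (mean difference, estimated p-value).
    """
    diffs = [t["outcome"] - u["outcome"] for t, u in pairs]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis, the sign of each paired difference is arbitrary
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return mean(diffs), (hits + 1) / (n_perm + 1)
```

Calling match_pairs(records, ["age_group", "sex"]) followed by sign_flip_test(pairs) yields an estimated effect and a p-value. Exact matching discards every record without a partner, which is one reason the choice of confounders and of the matching strategy remains an open question.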

Transfer Learning and Multi-task Learning in Deep Neural Networks

Deep Neural Networks have succeeded in many challenging machine learning applications, but they require large, labeled training datasets. Biomedical applications are characterized by small sets of high-dimensional records, e.g. patients with their gene expression profiles. We investigate two main approaches to tailor Deep Neural Networks to such applications. Transfer learning aims at transferring knowledge, i.e. parameters of Deep Neural Networks, from one domain to another. In particular, we explore transferring from a large unlabeled dataset to a small labeled dataset and from one entity type such as cell lines to another type such as patients. In the context of Deep Neural Networks, the goal of multi-task learning is to train a network that uses a given set of independent variables to predict multiple dependent variables. A major research question is which layers of the network can be shared by the different outputs, and in particular how to learn the optimal architecture of the Deep Neural Network from the data.
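
As a rough illustration of the two ideas above, the following pure-Python sketch builds a tiny network with one shared hidden layer and one linear output head per task (so-called hard parameter sharing), and shows parameter transfer as a plain copy of the shared layer. The class, function, and task names are hypothetical, training is omitted, and the fixed choice of a single shared layer is an assumption; which layers to share is precisely the open question.

```python
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, relu=False):
    """Apply a dense layer to the input vector x, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


class MultiTaskNet:
    """One shared hidden layer feeding one linear head per task."""

    def __init__(self, n_features, n_hidden, task_names, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_features, n_hidden, rng)
        self.heads = {t: init_layer(n_hidden, 1, rng) for t in task_names}

    def forward(self, x):
        hidden = dense(x, self.shared, relu=True)  # representation used by all tasks
        return {t: dense(hidden, head)[0] for t, head in self.heads.items()}


def transfer_shared_layer(source, target):
    """Copy the shared-layer parameters of source into target (deep copy)."""
    weights, biases = source.shared
    target.shared = ([row[:] for row in weights], biases[:])
```

In a transfer setting, the source network would first be trained on the large dataset, its shared layer copied into the target network, and only then would the target be fine-tuned on the small labeled dataset.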

In-silico methods for precision oncology

The vision of precision medicine is to diagnose and treat patients more precisely and effectively, taking into account their personal genomic profile, lifestyle, environment and other factors. Since cancer is a disease of the genome, precision oncology focuses on the genomic profile of a patient, i.e. gene expression, somatic mutations, copy number aberrations, etc. In collaboration with the Vancouver Prostate Center and with the BC Children’s Hospital, we develop a suite of next-generation computational methods and tools that support the scientific process of precision oncology in an integrated and comprehensive way, from the discovery of driver events and biomarkers, through patient stratification, to the discovery of drug targets and the matching to corresponding drugs. Our methodology combines combinatorial algorithms for the pre-processing of raw sequence data with machine learning algorithms, in particular Deep Neural Networks, that learn predictive models from the pre-processed data.

Machine learning for pharmacogenomics

Given the crucial role that chemotherapies and drugs play in cancer therapy, a major task is the personalized prediction of the response of an individual patient to a specific drug, given the patient’s genomic profile. The ultimate goal is to recommend to a patient with a given genomic profile a list of drugs, including the standard of care, ranked in decreasing order of their predicted effectiveness. Machine learning methods have the potential to assist with the prediction of drug response, covering both the intended effects and adverse side effects. We are developing machine learning methods for these tasks in collaboration with the Vancouver Prostate Center and with the BC Children’s Hospital, using both Probabilistic Graphical Models and Deep Neural Networks. The availability of drug response data for patients in clinical datasets tends to be very limited, but the collection of drug response information from pre-clinical disease models, in particular cancer cell lines and patient-derived tumor xenografts (PDX), is much more feasible. Therefore, we investigate how to transfer drug response models from pre-clinical data to patient data. Another research question is how to predict how a certain mutation in a gene is going to affect a drug that targets the protein coded by that gene.
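
The recommendation step described above, a list of drugs ranked by predicted effectiveness, can be sketched as follows. This assumes some trained model is available as a callable predict_response(profile, drug) that returns a score where higher means more effective. That callable, the function name rank_drugs, and the output fields are hypothetical placeholders and do not refer to an existing tool.

```python
def rank_drugs(predict_response, profile, drugs, standard_of_care=()):
    """Rank candidate drugs for one patient by predicted effectiveness.

    predict_response: hypothetical callable (profile, drug) -> score,
                      where a higher score means a better predicted response.
    standard_of_care: drugs to flag in the output so they stay visible.
    """
    scored = [(drug, predict_response(profile, drug)) for drug in drugs]
    # Sort in decreasing order of predicted effectiveness
    scored.sort(key=lambda item: item[1], reverse=True)
    return [
        {"drug": drug, "score": score, "standard_of_care": drug in standard_of_care}
        for drug, score in scored
    ]
```

Flagging the standard of care in the output, rather than forcing it to the top, keeps the ranking purely score-based while still letting a clinician see where the standard of care falls.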