Project with Jinko Graham

Seeding the simRVsequences R package

Identifying DNA variants that increase susceptibility to disease is of great scientific and clinical interest. One strategy for identifying rare causal DNA variants is to study families with more than their fair share of the disease. However, accumulating a large enough sample of well-characterized families for a family-based study is time-consuming and expensive.

To assist with study planning and inference, we are developing simRVsequences, an R package to simulate DNA sequence data in families enriched for disease-affected relatives, in collaboration with the BC Cancer Agency's Lymphoid Cancer Families Study. Families can be simulated dynamically, allowing for birth, disease onset, and death at the individual level. At the family level, users can model complex sampling criteria that a family must meet to enter the study. The package is unique in allowing users to simulate both the underlying genetic cause and the exome sequences in disease-enriched families.

This project will involve gathering, processing, and assembling seed data for simRVsequences using publicly available resources for human genetic variation, such as the 1000 Genomes Project. The student will be part of a team working on package development. Experience with R and with handling large data sets is an asset.