Project With Lloyd Elliott

Improving the statistical genetics of polygenic risk score calculation

Polygenic risk scores (PRSs) are statistical models of disease risk built from the summary statistics of univariate association tests between diseases and genetic markers. PRSs are increasingly used to understand the genetic basis of disease, patient-specific risk, and the interactions between diseases. Despite their growing popularity, leading PRS tools are limited both in the extent of their statistical analyses (for example, they lack L1-regularized regression for effect sizes and support for discrete covariates) and in the extent to which they can be applied to existing datasets without relying on long pipelines (a limitation that could be addressed, for example, by supporting advanced genetic file formats such as bgen v1.3, multi-allelic sites, or linkage disequilibrium panels). In this project, the student will become familiar with the leading PRS software PRSice and extend its codebase with both theoretical and technical improvements. The project may involve computing PRSs on simulated datasets or on datasets for COVID-19 host genetics.
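
To make the idea of a score built from summary statistics concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not PRSice's implementation: the function names, the p-value threshold, and the toy numbers are all hypothetical. It assumes one individual's allele dosages (0, 1, or 2 copies of the effect allele per marker) and per-marker effect sizes and p-values from univariate association tests. The second function shows soft-thresholding, the shrinkage operator associated with an L1 penalty, as one simple way that L1 regularization of effect sizes could look.

```python
def polygenic_score(dosages, betas, p_values, p_threshold=0.05):
    """Weighted sum of allele dosages over markers passing a p-value cutoff.

    All names and the default cutoff are hypothetical, for illustration only.
    """
    if not (len(dosages) == len(betas) == len(p_values)):
        raise ValueError("dosages, betas and p_values must have equal length")
    return sum(
        d * b
        for d, b, p in zip(dosages, betas, p_values)
        if p <= p_threshold
    )


def soft_threshold(beta, lam):
    """Shrink an effect size toward zero by lam (the L1 proximal operator)."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


if __name__ == "__main__":
    # Toy summary statistics for three markers (made-up values).
    betas = [0.2, -0.1, 0.05]
    p_values = [1e-8, 0.03, 0.4]
    # One individual's effect-allele counts at the same three markers.
    dosages = [2, 1, 0]

    print(polygenic_score(dosages, betas, p_values))

    # Shrink the effect sizes first, then score with the shrunken values.
    shrunk = [soft_threshold(b, 0.05) for b in betas]
    print(polygenic_score(dosages, shrunk, p_values))
```

In this toy case the third marker is excluded by the p-value cutoff, and shrinkage sets its effect size to zero in any case. A real analysis would also need to account for linkage disequilibrium between markers, which this sketch ignores.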