This marks the 16th year of the SFU/UBC Joint Statistics Seminar. The goal of the seminar is for graduate students from SFU and UBC to socialize and present their new research. The event concludes with a talk by a faculty member from one of the universities.

Things will be a little bit different this year, as the seminar will be held online. We still hope that this seminar will give you a great opportunity to chat with friends from both universities and to learn more about the exciting research that is being done!

Many transmission models have been proposed and adapted to reflect changes in policy for mitigating the spread of COVID-19. Often these models are applied without any formal comparison with previously existing models. Here we use an annealed sequential Monte Carlo (ASMC) algorithm to estimate parameters of these transmission models. We also use Bayesian model selection to provide a framework through which the relative performance of transmission models can be compared in a statistically rigorous manner. The ASMC algorithm provides an unbiased estimate of the marginal likelihood at no additional computational cost. This offers a significant computational advantage over MCMC methods, which require expensive post hoc computation to estimate the marginal likelihood. We find that ASMC can produce results that are comparable to MCMC in a fraction of the time.
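
The evidence-estimation idea can be illustrated on a toy one-dimensional target rather than a transmission model. The sketch below is a generic annealed SMC with a fixed temperature schedule (the algorithm in the talk adapts the schedule); all function names, the standard normal prior, and the tuning constants are illustrative. The marginal likelihood estimate is accumulated from the incremental weights as the sampler passes through the tempered targets, which is the "no additional cost" property mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lik(x):
    # unnormalised toy "likelihood" exp(-x^2 / 2); with a N(0, 1) prior the
    # true evidence is 1 / sqrt(2) ~= 0.7071, so the estimate can be checked
    return -0.5 * x**2

def annealed_smc(n_particles=2000, n_temps=20, mh_steps=5):
    betas = np.linspace(0.0, 1.0, n_temps + 1)   # fixed annealing schedule
    x = rng.standard_normal(n_particles)         # particles from the prior
    log_Z = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # incremental weights for the tempered target pi_b ∝ prior * lik^b
        log_w = (b - b_prev) * log_lik(x)
        # running log-evidence: product over steps of the mean weight
        log_Z += np.log(np.mean(np.exp(log_w)))
        # multinomial resampling
        w = np.exp(log_w - log_w.max())
        idx = rng.choice(n_particles, n_particles, p=w / w.sum())
        x = x[idx]
        # a few Metropolis-Hastings moves targeting pi_b to rejuvenate particles
        for _ in range(mh_steps):
            prop = x + 0.5 * rng.standard_normal(n_particles)
            log_acc = (-0.5 * prop**2 + b * log_lik(prop)) \
                      - (-0.5 * x**2 + b * log_lik(x))
            accept = np.log(rng.uniform(size=n_particles)) < log_acc
            x = np.where(accept, prop, x)
    return np.exp(log_Z)
```

On this toy problem the returned estimate is close to the analytic evidence of 1/√2, which is the kind of check that is unavailable for a realistic transmission model.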

Recent trends in the amortized inference literature have largely focused on conditional generative models that achieve high-quality reconstructions in the case where the conditioned set of observations is large. We argue that in addition to this property, the models should also aim for meaningful, human-like uncertainty in the case where the conditioned set of observations is small. In particular, we show that for the neural process model, a simple change to the pooling operator results in posterior predictive samples that are more diverse and realistic when conditioned on small sets of observations. Our image in-painting experiments with the MNIST and CelebA datasets provide empirical evidence that a neural process model with max pooling and a hierarchical variational distribution produces posterior predictive samples that are more desirable in small conditioning set scenarios. We also investigate how neural processes facilitate Bayesian contraction of the posterior uncertainty when size-related inductive biases are absent in the architecture.
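
The "simple change to the pooling operator" amounts to swapping the aggregation step of the neural process encoder. A minimal sketch, with purely illustrative shapes and names (the real encoder representations come from a neural network, not random draws):

```python
import numpy as np

def aggregate(r, op="mean"):
    # r: (n_context, d) array of per-point encoder representations.
    # Both operators are permutation-invariant in the context set, as the
    # neural process requires; max pooling keeps the strongest activation
    # per dimension instead of averaging it away for small context sets.
    return r.max(axis=0) if op == "max" else r.mean(axis=0)
```

Because both reductions are symmetric in the rows of `r`, the swap preserves the set-function structure of the model while changing how a small conditioning set influences the aggregated representation.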

In many statistical models, we need to integrate functions that may be high-dimensional. Such integrals may be impossible to compute exactly, or too expensive to compute numerically. Instead, we can use the Laplace approximation for the integral. This approximation is exact if the function is proportional to the density of a normal distribution; therefore, its effectiveness may depend intimately on the true shape of the function. To assess the quality of the approximation, we use probabilistic numerics: recasting the approximation problem in the framework of probability theory. In this probabilistic approach, uncertainty and variability don’t come from a frequentist notion of randomness, but rather from the fact that the function may only be partially known. We use this framework to develop a diagnostic tool for the Laplace approximation and its underlying shape assumptions, modelling the function and its integral as a Gaussian process and devising a “test” by conditioning on a finite number of function values. We will discuss approaches for designing and optimizing such a tool and demonstrate it with simulations and real data, highlighting in particular the challenges one may face in high dimensions.
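
The shape-dependence of the Laplace approximation is easy to see in one dimension. The sketch below (a generic textbook version, not the diagnostic developed in the talk) approximates ∫ g(x) dx by matching a Gaussian to the mode and curvature of log g; `laplace_approx` is an illustrative name.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def laplace_approx(log_g):
    # Laplace approximation of the integral of g over the real line:
    # find the mode of log g, estimate its curvature by finite differences,
    # and integrate the matched Gaussian in closed form.
    res = minimize_scalar(lambda x: -log_g(x))
    x0, h = res.x, 1e-4
    curv = (log_g(x0 + h) - 2 * log_g(x0) + log_g(x0 - h)) / h**2
    return np.exp(log_g(x0)) * np.sqrt(2 * np.pi / -curv)

# Exact when g is proportional to a normal density:
# log g = -x^2/2 gives sqrt(2*pi) ~= 2.5066, the true integral.
# For a heavy-tailed g(x) = 1/(1 + x^2), the approximation returns
# sqrt(pi) ~= 1.7725 while the true integral is pi ~= 3.1416.
```

The heavy-tailed example shows why the quality of the approximation depends intimately on the true shape of the integrand, which is exactly what the proposed diagnostic is meant to probe.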

Cancers from various tissue types are known to have a latent structure reflecting differentially expressed gene pathways. This substructure can be shared between cancers from different tissues or differentiate cancers from the same tissue. For example, as a result of variations in the Her2-enriched gene pathway, specific genes are differentially expressed, allowing us to subtype breast cancer in a more informative manner. However, it is difficult to cluster cancer patients based on this latent structure because the tissue effect often dominates the latent structure effect. In this presentation, we propose a Bayesian nonparametric model that accounts for the tissue effect and clusters based on the latent structure, using a Dirichlet Process prior to infer the number of clusters from the data. Our approach allows the model to learn the tissue parameters in a supervised learning setting, while simultaneously learning the latent structure, based on the resulting residuals, in an unsupervised setting. Markov chain Monte Carlo sampling techniques such as Metropolized Gibbs sampling and split-merge moves are used to learn model parameters. We demonstrate our model by showing results on a publicly available dataset from the International Cancer Genome Consortium. Specifically, we analyze whether patients across different cancer types are clustered together.
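
The key property of the Dirichlet Process prior — that the number of clusters is learned rather than fixed — can be seen in its Chinese restaurant process representation. The sketch below (illustrative names; the talk's model places this prior over latent-structure parameters, not raw labels) draws a random partition in which each item joins an existing cluster in proportion to its size or opens a new one in proportion to the concentration parameter α.

```python
import numpy as np

def crp_partition(n, alpha, rng):
    # Draw a partition of n items from the Chinese restaurant process.
    # counts[k] is the size of cluster k; a new cluster opens with
    # probability proportional to alpha, so the number of clusters
    # is random and grows slowly (roughly alpha * log n on average).
    counts, z = [], []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        z.append(k)
    return z, len(counts)
```

In a posterior sampler, moves such as Metropolized Gibbs updates and split-merge proposals reallocate items between clusters under this prior combined with the data likelihood.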

Psychometric test data are useful for predicting a variety of important life outcomes and personality characteristics. The Cognitive Reflection Test (CRT) is a short, well-validated rationality test, designed to assess subjects’ ability to override intuitively appealing but incorrect responses to a series of math- and logic-based questions. The CRT is predictive of many other cognitive abilities and tendencies, such as verbal intelligence, numeracy, and religiosity. Cognitive psychologists and psychometricians are concerned with whether subjects improve their scores on the test with repeated exposure, as this may threaten the test’s predictive validity.

This project uses the first publicly available longitudinal dataset derived from subjects who took the CRT multiple times over a predefined period. The dataset includes a multitude of predictors, including the number of previous exposures to the test (our variable of primary interest). Also included are two response variables measured with each test exposure: CRT score and time taken to complete the CRT. These responses serve as proxies for the underlying latent variables "rationality" and "reflectiveness", respectively. We propose methods to describe the relationship between the responses and selected predictors. Specifically, we employ a bivariate longitudinal model to account for the presumed dependence between our two responses. Our model also allows for subpopulations ("clusters") of individuals whose responses exhibit similar patterns. We estimate the parameters of our one- and two-cluster models via adaptive Gaussian quadrature. We also develop an Expectation-Maximization algorithm for estimating models with greater numbers of clusters.
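
The clustering step can be illustrated with a deliberately simplified Expectation-Maximization routine. The sketch below fits a two-component univariate Gaussian mixture — a stand-in for the far richer bivariate longitudinal model in the talk — and all names and initializations are illustrative. The E-step computes each subject's responsibility for each cluster; the M-step re-estimates cluster weights, means, and variances from those responsibilities.

```python
import numpy as np

def em_two_cluster(y, n_iter=100):
    # Toy EM for a two-component Gaussian mixture on scalar responses y.
    # Initialise the means at the extremes of the data to separate clusters.
    mu = np.array([y.min(), y.max()], dtype=float)
    sigma = np.array([y.std(), y.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities (normalising constants cancel row-wise)
        dens = np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) / sigma
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, and standard deviations
        nk = r.sum(axis=0)
        pi = nk / len(y)
        mu = (r * y[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma
```

In the actual model the per-cluster likelihood is a bivariate longitudinal one with random effects, so the M-step has no closed form and numerical integration (e.g. adaptive Gaussian quadrature) enters the picture.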

Disambiguating the effects of treatments and confounding factors in observational studies has presented a continuous challenge for decades. Propensity scores, which are defined by the probability of treatment assignment given the levels of the confounders, have been widely used to tackle this problem. There are myriad methods that use propensity scores, but in general these methods include an exposure model and an outcome model. The former models the probability of being assigned to a treatment group as a function of the confounders, while the latter models the outcome of the response as a function of both the treatment and confounders. Although there are numerous Bayesian propensity score methodologies, it turns out that the exposure model only influences the posterior causal effect estimate of the treatment when there is a prior dependence between the exposure and outcome models. This raises the question of whether it is worthwhile to model the exposure and its dependence with the outcome. Our work demonstrates the utility of answering this question, using Monte Carlo methods to calculate the average mean squared errors of competing model choices.
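
The Monte Carlo comparison of average mean squared errors can be sketched in a frequentist toy version (the talk's setting is Bayesian; every name, the single confounder, and the use of the known rather than estimated propensity score are simplifications for illustration). Each replication simulates confounded data, then contrasts a naive difference in means with an inverse-probability-weighted estimator that uses the propensity score.

```python
import numpy as np

def simulate_mse(n_reps=500, n=500, tau=1.0, seed=0):
    # Average squared error of two causal-effect estimators over repeated
    # simulated datasets with a single confounder x and true effect tau.
    rng = np.random.default_rng(seed)
    err_naive, err_ipw = [], []
    for _ in range(n_reps):
        x = rng.standard_normal(n)                    # confounder
        p = 1 / (1 + np.exp(-x))                      # true propensity score
        t = rng.uniform(size=n) < p                   # treatment assignment
        y = tau * t + x + rng.standard_normal(n)      # outcome model
        # naive contrast ignores confounding and is biased here
        naive = y[t].mean() - y[~t].mean()
        # inverse-probability weighting with the (known) propensity score
        ipw = np.mean(t * y / p) - np.mean((~t) * y / (1 - p))
        err_naive.append((naive - tau) ** 2)
        err_ipw.append((ipw - tau) ** 2)
    return np.mean(err_naive), np.mean(err_ipw)
```

Because the confounder shifts both treatment probability and outcome, the naive contrast carries a persistent bias that dominates its mean squared error, while the weighted estimator does not; the talk's comparison plays out the analogous trade-off between Bayesian model choices.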

Modern astronomy involves complex data generating mechanisms, complex data collection mechanisms, and complex underlying physics questions, resulting in an abundance of complex statistical challenges. In particular, astronomers increasingly rely on computer simulators to model complex physics, but statistical methodology for combining these simulators with astrophysical data is often inadequate. In this talk I will describe my current work in astrostatistics, which involves developing statistical methods that incorporate physics-based computer simulators, are suited to the particular scientific and data-analytic challenges at hand, and provide uncertainty quantification. This work tackles interesting scientific challenges in application areas such as exoplanet detection, solar physics, stellar evolution, and Mars planetary science, among others.

We will be providing food gift cards to graduate students (and up to 50 undergraduate students). Feel free to stay and mingle with others!