Dr. Trevor Campbell

Title: Bayesian Inference for Big Data

Date: Friday, October 27th, 2023
Time: 1:30PM (PDT)
Location: ASB 10900

Abstract: Since shortly after the popularization of stochastic gradient optimization methods in machine learning (which now scale model training to billions of examples and beyond), researchers have been trying to use the same basic data subsampling techniques to speed up computational Bayesian inference algorithms. In this talk, I'll cover the broad classes of methods that have been developed, highlights of progress in the field, and the current state of the art. Along the way, I'll introduce some recent work from my group on scalable Bayesian inference via coresets, i.e., sparse dataset summaries. I'll show that coresets offer an exponential compression of the data (and so an exponential speed-up of methods like Markov chain Monte Carlo) in a wide variety of models, and that they can be constructed in an automated manner, with theoretical convergence guarantees and without requiring special knowledge of model structure beyond conditional independence of the data. While other methods implicitly rely on asymptotic Gaussianity of the posterior, coresets are particularly well suited to posteriors that don't exhibit this usual asymptotic behaviour, such as those with discrete variables, weak identifiability or non-identifiability, or low-dimensional manifold structure. I'll conclude with empirical results and a discussion of next steps for Bayesian inference in the big data regime.