SFU/UBC Joint Graduate Student Seminar

Saturday, September 29th 2012

9:00-9:30 Coffee and pastries at Blenz (508 West Hastings Street) across the street from the seminar location

9:30-9:45 Head to room 7000 (Earl & Jennie Lohn Policy Room) at SFU Harbour Centre (515 West Hastings Street)

9:45-10:10 Joslin Goh, SFU

Title: Prediction and Calibration Using Outputs From Multiple Simulators

10:10-10:35 Seong-Hwan Jun, UBC

Title: Entangled Monte Carlo

10:35-11:00 Zheng Sun, SFU

Title: EDF Tests for Ordered Categorical Data

11:00-12:00 Dr. Ruben Zamar, UBC

Title: Robustness and Other Things

12:00-14:00 Lunch at Rogue Kitchen and Wetbar (601 West Cordova Street)

14:00-15:00 Dr. Joan Hu, SFU

Title: Statistical Analysis for Forest Fire Control

15:00-15:25 Jabed Hossain Tomal, UBC

Title: Ensembling Descriptor Sets using Phalanxes of Variables to Rank Activity of Compounds in QSAR Studies

15:25-15:50 Shirin Golchi, SFU

Title: Monotone Interpolation: Sampling from a Constrained Gaussian Posterior

15:50-16:15 Vincenzo Coia, UBC

Title: A New Sieve Model for Extreme Values

The SFU-UBC Joint Graduate Student Workshop in Statistics is going into its 7th year. Two such seminars take place every year: the one in the Fall is organized by students from SFU, and the one in the Spring by students from UBC. The event gives graduate students in Statistics and Actuarial Science an opportunity to attend accessible talks, present their work, and practice their presentation skills, all in a relaxed environment. It also provides an opportunity to interact with graduate students from both participating departments.

Continuing with the usual format of past years, this event will consist of talks given by 6 students (3 from UBC and 3 from SFU) and 2 professors (one from each university). The seminar also has important social components, namely the morning coffee and the lunch, where students can interact with each other and establish fruitful networks.

The seminar takes place downtown near Waterfront Station, so it is easily accessible by transit. Waterfront Station is served by all the SkyTrain lines: the Canada Line, Expo Line and Millennium Line. From SFU, the 135 bus will also take you to the seminar location.

This seminar could not take place without the generous help of our sponsors: the Pacific Institute for the Mathematical Sciences (PIMS) and the Graduate Student Society at Simon Fraser University (GSS).

Abstracts

Joslin Goh

Title: Prediction and Calibration Using Outputs From Multiple Simulators

Abstract: Deterministic simulators are widely used to describe physical processes in lieu of physical observations. In some cases, more than one computer simulator can be used to explore the physical system. Through the combination of field observations and simulated outputs, predictive models are developed for the real physical system. The resulting model can be used to perform sensitivity analysis for the system, solve inverse problems and make predictions. The proposed approach is Bayesian and will be illustrated through applications in predictive science at the Centre for Radiative Shock Hydrodynamics at the University of Michigan.

Seong-Hwan Jun

Title: Entangled Monte Carlo

Abstract: We propose a novel method for scalable parallelization of sequential Monte Carlo (SMC) algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate, using examples from Bayesian phylogenetics, that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient than transmission of particles.

Zheng Sun

Title: EDF Tests for Ordered Categorical Data

Abstract: In this talk, we consider a general class of EDF (Empirical Distribution Function) tests for ordered categorical data (ordered contingency tables), that is, data in which the cells have a natural ordering, such as letter grades on exams. Asymptotic distributions are found under the null hypothesis

H_0: each row follows the same distribution.

Asymptotic distributions under some contiguous alternatives are also found and asymptotic power of these tests can be calculated. A theorem is proved connecting the cases when parameters are known with those when parameters must be estimated.

Components of these test statistics are examined and the first 4 components can be interpreted as tests that are aimed at specific alternatives: location, scale, skewness and kurtosis.

We compare powers of the EDF tests with many competing tests, including tests derived from the Neyman-Pearson lemma. EDF tests compare favourably.

An example data set is analyzed.

Dr. Ruben Zamar

Title: Robustness and Other Things

Abstract: Data quality is typically affected by the presence of outliers and other forms of data contamination. It may also be affected by missing data, data duplication, etc. From a broad perspective I am interested in the study of the detrimental effect of poor data quality on statistical inference, and in developing appropriate alternative methods to address these problems. The purpose of this talk is to give students a broad picture of my research interests and some current research projects. "Other things" in the title refers to other related topics I am interested in, such as cluster analysis, model selection, bootstrap and data mining.

Dr. Joan Hu

Title: Statistical Analysis for Forest Fire Control

Abstract: This talk discusses statistical issues arising from forest fire control. We start with brief background information to motivate the statistical problems. Models and inference procedures are then proposed. A set of Canadian forest fire data is used throughout the talk for illustration.

This is an ongoing joint project with W. John Braun.

Jabed Hossain Tomal

Title: Ensembling Descriptor Sets using Phalanxes of Variables to Rank Activity of Compounds in QSAR Studies

Abstract: In QSAR studies, molecular descriptors are used to model biological activity of compounds. The statistical model aims to rank rare actives early in a list of compounds. The classifier “random forest” has been found highly accurate in QSAR studies. To enhance its performance in terms of predictive ranking, we propose an ensemble method by grouping variables together. The variables in a group (which we call a phalanx) are good to put together, whereas the variables in different groups (phalanxes) are good to ensemble. Finally, our method aggregates the phalanxes. There exist several molecular descriptor sets in QSAR studies, and a particular set might do well in ranking activity of compounds for some assays, and fail to do well for other assays. We have considered four assays and five descriptor sets for each. We apply the ensemble of phalanxes to each descriptor set and further ensemble across the five descriptor sets we generated. The performance of our ensemble is compared with random forest. Specifically, random forest was applied to each of the five descriptor sets and to the pool of descriptor sets. We found our method superior to any of the random forests using two rigorous evaluation procedures.

Shirin Golchi

Title: Monotone Interpolation: Sampling from a Constrained Gaussian Posterior

Abstract: Gaussian process (GP) models are popular tools for non-parametric modelling and function estimation. They are commonly used in the area of computer experiments where a finite number of function evaluations are available from a simulator and the underlying function is to be estimated using a statistical model while interpolating the given points. However, in the case that extra information such as monotonicity of the underlying function is available, it is not straightforward to incorporate the constraints in a GP model. I will talk about the constrained posterior distribution together with a recipe to sample from it.

Vincenzo Coia

Title: A New Sieve Model for Extreme Values

Abstract: Although rare, extreme events leave a lasting impact on our lives and the world in general. It is therefore important to determine the potential magnitude and frequency of such events, especially when these extremes are dangerous. We focus on the case when these extreme values are heavy tailed. Extreme Value Theory provides a theoretical basis for extrapolating and making inference into these heavy tails; however, there is room for improvement in the extrapolation methods. One modification to the heavy tail is to add an upper truncation; we propose a modification which "progressively truncates" the tail with permeable filters like a sieve. The techniques are then applied to the largest Atlantic hurricanes and the largest black sea bass in Buzzard's Bay. We find that, in most cases, the sieve model provides the best fit, followed by the truncated model.