Medicine Lake, 2022

Zhong Jia (William) Xue
MASc Student, Heterogeneous Computing Systems Researcher

Applied Science Building
8888 University Dr.
Simon Fraser University
Burnaby, BC V5A 1S6

Email: william_xue [at] sfu.ca


I am an MASc student in Computer Engineering at Simon Fraser University, advised by Prof. Zhenman Fang. My research interests include accelerating point cloud applications on FPGAs. I have also collaborated on research in deep learning acceleration on FPGAs.

During my undergraduate degree at Simon Fraser University, I completed several co-op terms, including firmware development at Avigilon, firmware verification at Microchip, and ASIC verification at Intel. I am particularly interested in embedded systems, robotics, and deep learning.

Outside of academics, I enjoy weightlifting, hiking, badminton, skiing, and playing piano. I also own a 3D printer and tinker with designs. You can check out the designs from my undergraduate capstone project here.


Publications

C1

SDA: Low-Bit Stable Diffusion Acceleration on Edge FPGAs FPL '24

Geng Yang, Yanyue Xie, Zhong Jia Xue, Sung-En Chang, Yanyu Li, Peiyan Dong, Jie Lei, Weiying Xie, Yanzhi Wang, Xue Lin, Zhenman Fang
The 34th International Conference on Field-Programmable Logic and Applications (FPL '24)

This paper introduces SDA, the first effort to adapt the expensive stable diffusion (SD) model for edge FPGA deployment. First, we apply quantization-aware training to quantize its weights to 4-bit and activations to 8-bit (W4A8) with a negligible accuracy loss. Based on that, we propose a high-performance hybrid systolic array (hybridSA) architecture that natively executes convolution and attention operators across varying quantization bit-widths (e.g., W4A8 and all 8-bit QK^T V in attention). To improve computational efficiency, hybridSA integrates diverse DSP packing techniques into hybrid weight-stationary and output-stationary dataflows that are optimized for convolution and attention. It also supports flexible dataflow transitions to address the distinct demands of its output sequence by subsequent nonlinear operators. Moreover, we observe that nonlinear operators become the new performance bottleneck after the acceleration of convolution and attention, and offload them onto the FPGA as well. Experimental results demonstrate that our low-bit (W4A8) SDA accelerator on the embedded AMD-Xilinx ZCU102 FPGA achieves a speedup of 97.3× compared to the original SD-v1.5 model on the ARM Cortex-A53 CPU.
@inproceedings{yang2024sda,
  title={Sda: Low-bit stable diffusion acceleration on edge fpgas},
  author={Yang, Geng and Xie, Yanyue and Xue, Zhong Jia and Chang, Sung-En and Li, Yanyu and Dong, Peiyan and Lei, Jie and Xie, Weiying and Wang, Yanzhi and Lin, Xue and others},
  booktitle={2024 34th International Conference on Field-Programmable Logic and Applications (FPL)},
  pages={264--273},
  year={2024},
  organization={IEEE}
}
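As a rough illustration of the W4A8 idea described in the abstract, the sketch below applies simple symmetric per-tensor quantization (weights to 4-bit, activations to 8-bit) and performs the matmul in integer arithmetic before rescaling, loosely mirroring what a systolic array would do. This is a minimal post-hoc sketch, not the paper's quantization-aware training or hybridSA design; all names here are illustrative.

```python
import numpy as np

def quantize_symmetric(x, n_bits):
    """Symmetric uniform quantization to a signed n_bits integer grid."""
    qmax = 2 ** (n_bits - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax        # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)).astype(np.float32)   # "weights"     -> 4-bit
a = rng.standard_normal((8, 8)).astype(np.float32)   # "activations" -> 8-bit

qw, sw = quantize_symmetric(w, 4)
qa, sa = quantize_symmetric(a, 8)

# Accumulate in int32 as integer hardware would, then rescale to float.
y_int = qw.astype(np.int32) @ qa.astype(np.int32)
y = y_int * (sw * sa)

err = np.abs(y - w @ a).max()   # quantization error stays small
```

In practice, quantization-aware training (as in the paper) recovers most of the accuracy that this naive post-training rounding would lose.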
C2

Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers ICS '24

Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang
Proceedings of the 38th ACM International Conference on Supercomputing (ICS '24)

Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often computation-intensive for efficient deployment on resource-limited edge devices. This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for hardware implementation while preserving the accuracy. First, Quasar-ViT trains a supernet using our row-wise flexible mixed-precision quantization scheme, mixed-precision weight entanglement, and supernet layer scaling techniques. Then, it applies an efficient hardware-oriented search algorithm, integrated with hardware latency and resource modeling, to determine a series of optimal subnets from supernet under different inference latency targets. Finally, we propose a series of model-adaptive designs on the FPGA platform to support the architecture search and mitigate the gap between the theoretical computation reduction and the practical inference speedup. Our searched models achieve 101.5, 159.6, and 251.6 frames-per-second (FPS) inference speed on the AMD/Xilinx ZCU102 FPGA with 80.4%, 78.6%, and 74.9% top-1 accuracy, respectively, for the ImageNet dataset, consistently outperforming prior works.
@inproceedings{li2024quasar,
  title={Quasar-vit: Hardware-oriented quantization-aware architecture search for vision transformers},
  author={Li, Zhengang and Lu, Alec and Xie, Yanyue and Kong, Zhenglun and Sun, Mengshu and Tang, Hao and Xue, Zhong Jia and Dong, Peiyan and Ding, Caiwen and Wang, Yanzhi and others},
  booktitle={Proceedings of the 38th ACM International Conference on Supercomputing},
  pages={324--337},
  year={2024}
}
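The search step in the abstract picks subnets that maximize accuracy under a hardware latency budget. The toy sketch below shows only that selection principle: given candidate subnets with modeled latency and a proxy accuracy, choose the most accurate one that fits the budget. The candidate names and numbers are made up for illustration (the accuracies loosely echo the reported results); the paper's actual search over a trained supernet is far more involved.

```python
# Hypothetical candidate subnets: (name, modeled latency in ms, proxy top-1 %)
candidates = [
    ("subnet-a", 4.0, 74.9),
    ("subnet-b", 6.3, 78.6),
    ("subnet-c", 9.8, 80.4),
]

def best_under_budget(cands, latency_budget_ms):
    """Return the highest-accuracy candidate whose modeled latency fits the budget."""
    feasible = [c for c in cands if c[1] <= latency_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[2])

choice = best_under_budget(candidates, 7.0)   # selects subnet-b here
```

Repeating this selection for a series of latency targets yields the family of subnets the paper reports at different frame rates.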

Awards

January 2026
Helmut & Hugo Eppich Family Grad Scholarship
September 2025
Graduate Fellowship
September 2024
Graduate Fellowship
September 2023
Graduate Fellowship
March 2023
BC Graduate Scholarship
September 2022
Undergraduate Student Research Award, NSERC
2018 - 2022
SFU Alumni Scholarship

Teaching

ENSC 254
Introduction to Computer Organization, Teaching Assistant, Summer 2023 and Summer 2024

Contact

Please contact me via email: william_xue [at] sfu.ca

