
Manoj Bheemasandra Rajashekar
MASc Student, Heterogeneous Computing Researcher

School of Engineering Science
Simon Fraser University
8888 University Dr.
Burnaby, BC V5A 1S6

Email: mba151 [at] sfu.ca


I'm a passionate Computer Engineering graduate student pursuing my MASc in Engineering Science at Simon Fraser University, Burnaby. Under the guidance of Prof. Zhenman Fang, I work on accelerating HPC applications on data-center FPGAs, with a specific focus on SpMV, a crucial kernel used in machine learning, graph analytics, circuit simulation, and more. Before my current studies, I earned my Bachelor's in Electronics and Communication Engineering from Ramaiah Institute of Technology, Bengaluru, India. During my undergraduate years, I honed my skills in embedded systems through my final-year project, a Teleconsulting Device, under the mentorship of Dr. Suma K V.

My journey has been enriched by diverse experiences, including a six-month internship at Texas Instruments India as an Analog Layout Design Intern, where I gained valuable insights into hardware design and implementation. I am also part of Team Phantom, contributing to the firmware development for the team's F1 electric vehicle. Before that, I interned at IIT Bombay under the eYSIP program, where I developed eYan, a miniature balancing bot controlled by a web app. I also interned at SAMSUNG PRISM, where I built a Neural Style Transfer model and deployed it on an Android device.


What's New

March 2024
My first-authored paper, on accelerating imbalanced SpMV on FPGAs, was published at the FPGA 2024 conference
September 2023
I received the Graduate Fellowship Award from Simon Fraser University
September 2022
I received the Globalink Graduate Fellowship Award from Mitacs

Publications

Conference Papers

C2

HiSpMV: Hybrid Row Distribution and Vector Buffering for Imbalanced SpMV Acceleration on FPGAs (FPGA '24)

Manoj B. Rajashekar, Xingyu Tian, Zhenman Fang
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '24)
Acceptance Rate: 22/111 = 19.8%

Sparse matrix-vector multiplication (SpMV) is a fundamental operation in numerous applications such as scientific computing, machine learning, and graph analytics. While recent studies have made great progress in accelerating SpMV on HBM-equipped FPGAs, there are still multiple remaining challenges to accelerate imbalanced SpMV where the distribution of non-zeros in the sparse matrix is imbalanced across different rows. These include (1) imbalanced workload distribution among the parallel processing elements (PEs), (2) long-distance dependency for floating-point accumulation on the output vector, and (3) a new bottleneck due to the often-overlooked input vector after the SpMV acceleration.

To address those challenges, we propose HiSpMV to accelerate imbalanced SpMV on HBM-equipped FPGAs with the following novel solutions: (1) a hybrid row distribution network to enable both inter-row and intra-row distribution for better balance, (2) a fully pipelined floating-point accumulation on the output vector using a combination of an adder chain and register-based circular buffer, (3) hybrid buffering to improve memory access for the input vector, and (4) an automation framework to automatically generate the optimized HiSpMV accelerator. Experimental results demonstrate that HiSpMV achieves a geomean speedup of 15.31x (up to 61.66x) for highly imbalanced matrices, compared to the state-of-the-art SpMV accelerator Serpens on the AMD-Xilinx Alveo U280 HBM-based FPGA. Compared to Intel MKL running on a 24-core Xeon Silver 4214 CPU, HiSpMV achieves a geomean speedup of 8.30x. Compared to cuSparse running on an Nvidia GTX 1080ti GPU, HiSpMV achieves a geomean of 1.93x better performance per watt.
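To make the terms above concrete, here is a minimal software sketch of CSR-format SpMV (my own illustration, not code from the paper). Rows with many more non-zeros than others dominate the runtime when whole rows are handed to parallel PEs, and the running floating-point sum is a loop-carried dependency; the inner loop below keeps a small bank of independent partial sums as a software stand-in for the register-based circular buffer. The function name spmv_csr and the assumed adder pipeline depth LAT are placeholders.

#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t LAT = 8;  // assumed floating-point adder pipeline depth

// Plain CSR SpMV, y = A * x.
void spmv_csr(const std::vector<std::size_t>& row_ptr,  // size nrows + 1
              const std::vector<std::size_t>& col_idx,  // size nnz
              const std::vector<float>& vals,           // size nnz
              const std::vector<float>& x,              // dense input vector
              std::vector<float>& y) {                  // dense output vector
    const std::size_t nrows = row_ptr.size() - 1;
    for (std::size_t i = 0; i < nrows; ++i) {
        // LAT independent accumulators: consecutive additions no longer depend
        // on each other, which is what the circular buffer achieves in hardware.
        std::array<float, LAT> partial{};
        for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            partial[k % LAT] += vals[k] * x[col_idx[k]];
        float acc = 0.0f;
        for (float p : partial) acc += p;  // final reduction, akin to the adder chain
        y[i] = acc;  // a single long row still dominates a PE's runtime, hence the
    }                // hybrid inter-row/intra-row distribution described above
}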

C1

A New Constant Coefficient Multiplier for Deep Neural Network Accelerators (VLSI SATA '22)

Manoj B R, Jayashree S Yaji, Raghuram S
2022 IEEE 3rd International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI SATA)

Deep Neural Network accelerators are a key component in enabling the penetration of Artificial Intelligence, through their integration at the edge. However, due to the requirement of hierarchical learning in deep networks, the number of layers and parameters is very large. In this work, an optimization for the multiplier used as part of the MAC unit is proposed: the weights can be considered constant once downloaded to the accelerator, as these do not change during the inference process. Under these circumstances, pre-processing steps which make the computation of the product faster are introduced in this work. Since this step is performed only once, the overhead due to the pre-processing unit is minimal, while optimizing the multiplier in thousands of MAC units across the architecture. Experimental results show that the variable part of the SCM design that has been proposed in this work serves well when a tradeoff between power, area and timing is taken into consideration. The results proposed are a comparison of the variable parts of both the KCM and SCM designs, along with the default multiplier, as the primary idea behind this work is the one time use of the constant multiples generator.

@INPROCEEDINGS{10046622,
  author    = {Manoj, B R and Yaji, Jayashree S and Raghuram, S},
  booktitle = {2022 IEEE 3rd International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI SATA)},
  title     = {A New Constant Coefficient Multiplier for Deep Neural Network Accelerators},
  year      = {2022},
  pages     = {1-5},
  keywords  = {Deep learning;Neural networks;Systems architecture;Computer architecture;Very large scale integration;Generators;Timing;Constant Coefficient Multipliers;Arithmetic Circuits;DNN Accelerators},
  doi       = {10.1109/VLSISATA54927.2022.10046622}
}
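The constant-coefficient idea in this paper is easier to see in software. Below is my own minimal sketch of a table-based (KCM-style) constant multiplier, not the circuit evaluated in the paper: multiples of a fixed weight are pre-computed once (the one-time pre-processing step mentioned in the abstract), and each product is then formed from nibble look-ups, shifts, and adds. The class name KcmMultiplier and the 4-bit digit width are illustrative choices, and small (e.g., 8-bit) constants are assumed so all products fit in 32 bits.

#include <array>
#include <cstdint>

struct KcmMultiplier {
    std::array<std::int32_t, 16> table{};  // constant * d for every 4-bit digit d

    // One-time pre-processing: build the table of constant multiples.
    explicit KcmMultiplier(std::int32_t constant) {
        for (std::int32_t d = 0; d < 16; ++d) table[d] = constant * d;
    }

    // Multiply a 16-bit operand by the constant using four nibble look-ups.
    std::int32_t multiply(std::uint16_t x) const {
        std::int32_t product = 0;
        for (int n = 0; n < 4; ++n)
            product += table[(x >> (4 * n)) & 0xF] << (4 * n);
        return product;
    }
};

// Example: KcmMultiplier w(57); w.multiply(1234) == 57 * 1234.
// In an accelerator, only the look-up/shift/add part is replicated per MAC unit,
// so the cost of building the table is amortized over the whole inference.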

Awards

September 2023
Graduate Fellowship Award from Simon Fraser University
September 2022
Globalink Graduate Fellowship Award from Mitacs

Teaching

ENSC 254
Introduction to Computer Organization, Summer 2023, TA