Akhil Raj Baranwal
M.A.Sc student

ASB 10817
Applied Science Building
8888 University Drive
Simon Fraser University
Burnaby, BC V5A 1S6

Email: akhil_baranwal [at] sfu.ca

I am a MASc student in Computer Engineering at Simon Fraser University, advised by Prof. Zhenman Fang.
Prior to SFU, I worked at Imagination Technologies as a CPU Design Engineer and at Micron Technology as an SoC Verification Engineer.
I received my B.E. from BITS Pilani, Hyderabad Campus.

My current research interests include the characterization of accelerator-rich architectures, systems for ML, and neuromorphic architectures. I think that closing the gap between software programmers and hardware design is as tremendously significant as it is challenging. I am also interested in the general study of cognition and the behaviour of the mind, and I am always open to discussing bio-inspired hardware architecture design.

It's possible I'm currently sipping Chai (slurp)

What's New

September 2023: I start my research at Hi-Accel
August 2022: I join Imagination Technologies for development of RISC-V based automotive CPUs
September 2020: I join Micron Technology for development of UFS 4.0 storage controllers
July 2020: I graduated (with Honors) from BITS Pilani University, Hyderabad Campus
Jan 2020: I join the Processor Design team, CFAED at TUD as a guest researcher, advised by Dr. Akash Kumar.
Jan 2019: I join the MMNE lab at BITS Pilani as an undergrad researcher advised by Dr. Sanket Goel.

Research Highlights

ReLAccS: We (CFAED-PD) propose a multilevel approach to scalable accelerator design for reinforcement-learning on FPGAs
P4: We (MMNE) build an inexpensive, completely automated potentiostat suited for resource-constrained and non-precise applications

Publications

(or view my Google Scholar profile)

Conference Papers

Memory Oriented Optimization Approach to Reinforcement Learning on FPGA-based Embedded Systems GLSVLSI '21

Siva Satyendra Sahoo, Akhil Raj Baranwal, Salim Ullah, Akash Kumar
Proceedings of the 2021 on Great Lakes Symposium on VLSI

Reinforcement Learning (RL) represents the machine learning method that has come closest to showing human-like learning. While Deep RL is becoming increasingly popular for complex applications such as AI-based gaming, it has a high implementation cost in terms of both power and latency. Q-Learning, on the other hand, is a much simpler method that makes it more feasible for implementation on resource-constrained embedded systems for control and navigation. However, the optimal policy search in Q-Learning is a compute-intensive and inherently sequential process and a software-only implementation may not be able to satisfy the latency and throughput constraints of such applications. To this end, we propose a novel accelerator design with multiple design trade-offs for implementing Q-Learning on FPGA-based SoCs. Specifically, we analyze the various stages of the Epsilon-Greedy algorithm for RL and propose a novel microarchitecture that reduces the latency by optimizing the memory access during each iteration. Consequently, we present multiple designs that provide varying trade-offs between performance, power dissipation, and resource utilization of the accelerator. With the proposed approach, we report considerable improvement in throughput with lower resource utilization over state-of-the-art design implementations.
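
For readers unfamiliar with the algorithm, the software baseline that the abstract above refers to can be sketched roughly as follows. This is only an illustrative tabular Q-learning loop with an Epsilon-Greedy policy, not the accelerator design from the paper. The toy chain environment, the function names (`epsilon_greedy`, `q_update`, `train_chain`), and the values chosen for `alpha`, `gamma`, and `epsilon` are all assumptions made for this example.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)

def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One Q-learning step: read Q[s][a] and max(Q[s_next]), then write Q[s][a]."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]

def train_chain(n_states=5, episodes=200, epsilon=0.2, seed=0):
    """Hypothetical toy task: a 1-D chain where action 1 moves right and
    action 0 moves left; reaching the last state gives reward 1."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Each iteration depends on the table written by the previous one.
            q_update(q, s, a, r, s_next)
            s = s_next
    return q

if __name__ == "__main__":
    for state, row in enumerate(train_chain()):
        print(state, [round(v, 3) for v in row])
```

Every iteration reads the Q-table, takes a maximum over the next state's row, and writes one entry back. This read-then-write dependency is what makes the loop sequential and memory-bound in software.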

@inproceedings{10.1145/3453688.3461533, author = {Sahoo, Siva Satyendra and Baranwal, Akhil Raj and Ullah, Salim and Kumar, Akash}, title = {MemOReL: A Memory-Oriented Optimization Approach to Reinforcement Learning on FPGA-Based Embedded Systems}, year = {2021}, isbn = {9781450383936}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3453688.3461533}, doi = {10.1145/3453688.3461533}, abstract = {Reinforcement Learning (RL) represents the machine learning method that has come closest to showing human-like learning. While Deep RL is becoming increasingly popular for complex applications such as AI-based gaming, it has a high implementation cost in terms of both power and latency. Q-Learning, on the other hand, is a much simpler method that makes it more feasible for implementation on resource-constrained embedded systems for control and navigation. However, the optimal policy search in Q-Learning is a compute-intensive and inherently sequential process and a software-only implementation may not be able to satisfy the latency and throughput constraints of such applications. To this end, we propose a novel accelerator design with multiple design trade-offs for implementing Q-Learning on FPGA-based SoCs. Specifically, we analyze the various stages of the Epsilon-Greedy algorithm for RL and propose a novel microarchitecture that reduces the latency by optimizing the memory access during each iteration. Consequently, we present multiple designs that provide varying trade-offs between performance, power dissipation, and resource utilization of the accelerator. With the proposed approach, we report considerable improvement in throughput with lower resource utilization over state-of-the-art design implementations.}, booktitle = {Proceedings of the 2021 on Great Lakes Symposium on VLSI}, pages = {339--346}, numpages = {8}, keywords = {energy-efficient computing, hardware accelerators, memory-centric computing, fpga, high-level synthesis}, location = {Virtual Event, USA}, series = {GLSVLSI '21} }

Journal Articles

ReLAccS: A Multilevel Approach to Accelerator Design for Reinforcement Learning on FPGA-Based Systems IEEE TCAD

Akhil Raj Baranwal, Salim Ullah, Siva Satyendra Sahoo, Akash Kumar
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 40, no. 9, pp. 1754-1767, Sept. 2021

Reinforcement learning (RL), specifically Q-learning, with human-like learning abilities to learn from experience without any a priori data, is being increasingly used in embedded systems in the field of control and navigation. However, finding the optimal policy in this approach can be highly compute-intensive, and a software-only implementation may not satisfy the application's timing constraints. To this end, we propose optimization methods at multiple levels of accelerator design for RL. Specifically, at the architecture-level, we exploit the instruction-level parallelism and the spatial parallelism in FPGAs to improve the throughput over state-of-the-art designs by up to 34%. Further, we propose lookup table-level optimizations to reduce the resource utilization and power dissipation of the accelerator. Finally, we propose algorithm-level approximation that can be used for acceleration of Q-learning problems with more states and for reducing the peak power dissipation. We report up to 10× reduction in power dissipation with marginal degradation in quality of results.

@ARTICLE{9211770, author={Baranwal, Akhil Raj and Ullah, Salim and Sahoo, Siva Satyendra and Kumar, Akash}, journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems}, title={ReLAccS: A Multilevel Approach to Accelerator Design for Reinforcement Learning on FPGA-Based Systems}, year={2021}, volume={40}, number={9}, pages={1754-1767}, doi={10.1109/TCAD.2020.3028350} }

Development of Completely Automated Poly Potential Portable Potentiostat ECS JST

Akhil Raj Baranwal, Sohan Dudala, Prakash Rewatkar, Jaligam Murali Mohan, Mary Salve and Sanket Goel
ECS Journal of Solid State Science and Technology, Volume 10, Number 2 (Feb 2021)

Various research activities related to profiling chemicals employ detection or measurement of the response from a specimen in terms of electric fields or currents. Hence, a portable poly-potential device forms one of the necessary measuring equipment essential to these domains. This work aims to propose a Poly-Potential Portable Potentiostat (P4), that can perform electrochemical analysis of solutions through easily integrable data-acquisition hardware and flexible software post-processing. The P4 device is based on a commercial development board, which provides an analog front-end (AFE) for working with 2-lead and 3-lead amperometric cells. An economical and portable design approach is prioritised while keeping the basic functions of the research-grade commercial instruments. A novel technique of dynamically changing the bias and reference potential is used to achieve a finer resolution, enabling qualitative estimation. P4 works by performing detailed mathematical post-processing on-board and therefore relies on hardware integrity as much as on software flexibility. Calibration of P4 was done using a standardised solution to function independently of any external hardware or software tools. P4 makes electrochemical analysis truly portable in remote or resource-constrained applications.

@article{Baranwal_2021, doi = {10.1149/2162-8777/abdc15}, url = {https://dx.doi.org/10.1149/2162-8777/abdc15}, year = {2021}, month = {feb}, publisher = {IOP Publishing}, volume = {10}, number = {2}, pages = {027001}, author = {Akhil Raj Baranwal and Sohan Dudala and Prakash Rewatkar and Jaligam Murali Mohan and Mary Salve and Sanket Goel}, title = {Development of Completely Automated Poly Potential Portable Potentiostat}, journal = {ECS Journal of Solid State Science and Technology}, abstract = {Various research activities related to profiling chemicals employ detection or measurement of the response from a specimen in terms of electric fields or currents. Hence, a portable poly-potential device forms one of the necessary measuring equipment essential to these domains. This work aims to propose a Poly-Potential Portable Potentiostat (P4), that can perform electrochemical analysis of solutions through easily integrable data-acquisition hardware and flexible software post-processing. The P4 device is based on a commercial development board, which provides an analog front-end (AFE) for working with 2-lead and 3-lead amperometric cells. An economical and portable design approach is prioritised while keeping the basic functions of the research-grade commercial instruments. A novel technique of dynamically changing the bias and reference potential is used to achieve a finer resolution, enabling qualitative estimation. P4 works by performing detailed mathematical post-processing on-board and therefore relies on hardware integrity as much as on software flexibility. Calibration of P4 was done using a standardised solution to function independently of any external hardware or software tools. P4 makes electrochemical analysis truly portable in remote or resource-constrained applications.} }