Benchmarking matrix multiplication on the GPU
Project with Lloyd Elliott
Matrix multiplication is a core component of statistical learning, appearing in linear regression and neural networks. Typical implementations have cubic asymptotic complexity: O(N^3) for N x N matrices. Strassen's algorithm lowered the exponent to about 2.81, and subsequent advances have reduced it to about 2.37. Some of these later advances are known as "galactic algorithms" because they achieve their sub-cubic performance only for unreasonably large N. Typical implementations also make use of GPUs, with CUDA being the dominant framework for interfacing with the hardware. In this project, we will implement sub-cubic matrix multiplication algorithms in the High-Level Shader Language (HLSL). We will determine whether typical implementations of matrix multiplication are still preferred, and extrapolate to regimes in which sub-cubic algorithms might yield better walltime. We will implement a proof-of-concept drop-in replacement for CUDA's matrix multiplication, targeting a broader range of graphics cards beyond the NVIDIA cards that are typical for CUDA.
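To make the sub-cubic idea concrete, here is a minimal CPU-side sketch of Strassen's algorithm in Python/NumPy (the function name, the power-of-two size assumption, and the crossover cutoff are our illustrative choices, not part of the project's HLSL implementation). Strassen splits each matrix into four blocks and forms seven block products instead of the eight required by the naive block decomposition, giving the O(N^2.81) exponent:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Strassen multiplication for square matrices whose size is a
    power of two; falls back to NumPy's matmul below `cutoff`.
    Illustrative sketch only."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12 = A[:h, :h], A[:h, h:]
    A21, A22 = A[h:, :h], A[h:, h:]
    B11, B12 = B[:h, :h], B[:h, h:]
    B21, B22 = B[h:, :h], B[h:, h:]
    # Seven recursive products replace the eight of the naive split.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # Recombine the products into the four blocks of C = A @ B.
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

The `cutoff` parameter hints at the central benchmarking question: below some size, the extra additions and memory traffic make Strassen slower than the cubic kernel, and the project aims to locate that crossover on GPU hardware.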