Title: Optimizing Sparsity in Distributed Machine Learning Training
Date: Wednesday, August 14th, 2024
Time: 10:00 AM - 11:30 AM EDT
Location: KACB 3126
Virtual: https://gatech.zoom.us/j/95651978956?pwd=cNa98ds2wGysW5SlajJDBAkouSiAvH.1
Cheng Wan
School of Computer Science
College of Computing
Georgia Institute of Technology
Committee Members:
Dr. Yingyan (Celine) Lin (Advisor) – School of Computer Science, Georgia Institute of Technology
Dr. Pan Li – School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Alexey Tumanov – School of Computer Science, Georgia Institute of Technology
Dr. Anand Iyer – School of Computer Science, Georgia Institute of Technology
Abstract:
As machine learning models and datasets continue to grow, distributed machine learning has become essential for meeting the computational demands of large-scale training. While recent frameworks have improved scalability and throughput, the optimization of sparsity in distributed deep neural network training remains under-explored. Yet sparsity poses a significant challenge in advanced models such as graph neural networks and mixture-of-experts models.
This thesis proposal systematically investigates three types of sparsity in distributed machine learning training: sparse data access, sparse operations, and sparse workflows. We introduce both system-level and algorithm-level innovations to address the inefficiencies these forms of sparsity cause. Through theoretical analysis and practical implementations, we demonstrate that these optimizations significantly reduce communication overhead and enhance scalability, improving the efficiency of distributed machine learning systems. Our work advances the understanding of sparsity optimization and lays the groundwork for more efficient distributed training architectures.