Title: Composable Algorithms for Scalable GNN Training
Date: Thursday, April 16, 2026
Time: 11:00 AM -- 1:00 PM EDT
Location: Coda C1315 Grant Park
Virtual Meeting: Zoom
Meeting ID: 979 4766 5601
Passcode: 609215
Muhammed Fatih Balın
CS Ph.D. Candidate
School of Computational Science and Engineering
College of Computing
Georgia Institute of Technology
Committee
Dr. Ümit V. Çatalyürek (Advisor) -- School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Yunan Luo -- School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Bo Dai -- School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Pan Li -- School of Electrical and Computer Engineering, Georgia Institute of Technology
Dr. Yingyan (Celine) Lin -- School of Computer Science, Georgia Institute of Technology
Abstract
Training Graph Neural Networks (GNNs) at large scale requires significant computational resources.
One of the most effective ways to reduce these requirements is minibatch training coupled with
graph sampling. However, existing sampling methods face a significant scaling bottleneck due to the
Neighborhood Explosion Phenomenon (NEP): the number of sampled vertices can grow exponentially with
the number of GNN layers. This thesis focuses on designing composable algorithms that mitigate the
effects of NEP and presents a GNN training framework incorporating optimized implementations
of these algorithms.
First, we present a novel sampling algorithm, LAyer-neighBOR sampling (LABOR), designed as a
drop-in replacement for Neighbor Sampling (NS) with the same fan-out hyperparameter while sampling up
to 7x fewer vertices, without sacrificing quality. LABOR is constructed so that the variance of its
estimator for each individual vertex matches that of NS. Moreover, under the same vertex sampling
budget, LABOR converges faster than existing layer sampling approaches and can use batch sizes up to
112x larger than NS.
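As a minimal sketch of the idea (assuming a CSR adjacency given by indptr/indices; the function and
variable names below are illustrative, not the thesis implementation), the LABOR-0 variant draws a
single random number per candidate vertex and shares it across all seed vertices in the batch, so
different seeds tend to select the same neighbors while each seed's inclusion probability, and hence
its per-vertex estimator variance, stays aligned with NS:

    import numpy as np

    def labor0_sample_layer(indptr, indices, seeds, fanout, rng):
        """Sample one GNN layer in the spirit of LABOR-0: every candidate
        neighbor t gets a single random number r[t] shared by all seeds, and
        seed s keeps neighbor t iff r[t] <= fanout / degree(s). Sharing r[t]
        across seeds makes their sampled neighborhoods overlap, shrinking the
        union of sampled vertices relative to independent Neighbor Sampling,
        while each seed still keeps `fanout` neighbors in expectation."""
        num_nodes = len(indptr) - 1
        r = rng.random(num_nodes)  # one shared uniform random number per vertex
        sampled_src, sampled_dst = [], []
        for s in seeds:
            nbrs = indices[indptr[s]:indptr[s + 1]]
            deg = len(nbrs)
            if deg == 0:
                continue
            pi = min(1.0, fanout / deg)  # same expected fan-out as NS
            keep = nbrs[r[nbrs] <= pi]   # kept edges would be reweighted by 1/pi
            sampled_src.extend(keep.tolist())
            sampled_dst.extend([s] * len(keep))
        src = np.asarray(sampled_src, dtype=np.int64)
        dst = np.asarray(sampled_dst, dtype=np.int64)
        # The next layer's seed set is the union of the current seeds and the
        # sampled sources; its size is what LABOR shrinks relative to NS.
        next_seeds = np.unique(np.concatenate([np.asarray(seeds, dtype=np.int64), src]))
        return src, dst, next_seeds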
Second, we present Cooperative Minibatching. We leverage the observation that the size of the sampled
subgraph is a sublinear, concave function of the batch size, because the sampled neighborhoods of
different seed vertices increasingly overlap; as a result, the work executed per seed vertex shrinks
as the batch size grows. Hence, it is favorable for processors connected by a fast interconnect to
work on one large minibatch together instead of on separate smaller minibatches. We also show how to
exploit the same phenomenon in serial execution by proposing Dependent Consecutive Minibatches. Our
experimental evaluations demonstrate that increasing this dependency reduces the bandwidth required
to fetch vertex embeddings by up to 4x, without negatively impacting model convergence.
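The sublinearity observation is easy to probe with a small experiment. The following Python sketch
(illustrative only; indptr/indices denote an assumed CSR adjacency and the function name is ours, not
the thesis code) expands a seed set with plain neighbor sampling and reports the number of distinct
sampled vertices per seed, which typically decreases as the batch size grows because the seeds'
sampled neighborhoods overlap more:

    import numpy as np

    def work_per_seed(indptr, indices, seeds, fanouts, rng):
        """Expand `seeds` layer by layer with uniform neighbor sampling and
        return the number of distinct vertices touched, divided by the number
        of seeds (a proxy for the work done per seed vertex)."""
        frontier = np.asarray(seeds, dtype=np.int64)
        visited = set(frontier.tolist())
        for k in fanouts:
            nxt = []
            for s in frontier:
                nbrs = indices[indptr[s]:indptr[s + 1]]
                if len(nbrs) > k:
                    nbrs = rng.choice(nbrs, size=k, replace=False)
                nxt.extend(nbrs.tolist())
            frontier = np.unique(np.asarray(nxt, dtype=np.int64))
            visited.update(frontier.tolist())
        return len(visited) / len(seeds)

    # Because sampled neighborhoods overlap, work_per_seed typically shrinks as
    # the batch size grows, which is the effect Cooperative Minibatching and
    # Dependent Consecutive Minibatches exploit:
    # rng = np.random.default_rng(0)
    # for bs in (256, 1024, 4096):
    #     seeds = rng.choice(len(indptr) - 1, size=bs, replace=False)
    #     print(bs, work_per_seed(indptr, indices, seeds, [10, 10], rng))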
Third, we present GraphBolt, an end-to-end GPU-accelerated data loading framework for large-scale
GNN training. GraphBolt integrates LABOR sampling and Cooperative Minibatching within a composable
pipeline built on PyTorch DataPipes, alongside system-level optimizations including software pipelining
via asynchronous futures, dynamic multi-level feature caching across GPU, CPU, and SSD tiers,
incremental graph caching, and GPU-accelerated sampling kernels with edge-parallel load balancing.
GraphBolt requires no preprocessing for its caches and supports heterogeneous graphs, temporal graphs,
node classification, and link prediction through shared, composable implementations.
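To give a flavor of such a pipeline, the sketch below loosely follows the dgl.graphbolt API shipped
with recent DGL releases; exact class and method names, defaults, and argument order may differ
across versions, so treat it as an assumption-laden illustration rather than the authoritative
interface:

    import torch
    import dgl.graphbolt as gb

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load an example dataset (name is illustrative) and its training seeds.
    dataset = gb.BuiltinDataset("ogbn-products").load()
    graph = dataset.graph
    feature = dataset.feature
    train_set = dataset.tasks[0].train_set

    # Composable DataPipes stages: batching, device transfer, LABOR-style
    # layer-neighbor sampling, and feature fetching.
    datapipe = gb.ItemSampler(train_set, batch_size=1024, shuffle=True)
    datapipe = datapipe.copy_to(device)
    datapipe = datapipe.sample_layer_neighbor(graph, [10, 10, 10])
    datapipe = datapipe.fetch_feature(feature, node_feature_keys=["feat"])
    dataloader = gb.DataLoader(datapipe)

    for minibatch in dataloader:
        x = minibatch.node_features["feat"]  # gathered input features
        y = minibatch.labels                 # labels of the seed vertices
        blocks = minibatch.blocks            # per-layer sampled subgraphs
        # forward/backward pass of the GNN model goes here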
All of the algorithms presented above are composable, meaning that their savings compound
multiplicatively when combined. GraphBolt demonstrates this by unifying all of these techniques with
optimized implementations, achieving significantly higher training throughput than existing
frameworks such as DGL and PyG on large-scale GNN training workloads.