Title: Scalable and Efficient Graph Learning for Dynamic, Heterogeneous, and Knowledge-Augmented Graphs
Date: Thursday, April 3, 2025
Time: 11:00 AM – 12:00 PM ET
Location (Hybrid):
- Coda C0903 Ansley
- Microsoft Teams (Meeting ID: 262 709 547 619, Passcode: rK6s9UP7)
Mingyu Guan
CS Ph.D. Student
School of Computer Science
College of Computing
Georgia Institute of Technology
Committee:
- Dr. Taesoo Kim (advisor) - School of Cybersecurity and Privacy, Georgia Institute of Technology
- Dr. Anand Iyer (co-advisor) - School of Computer Science, Georgia Institute of Technology
- Dr. Ada Gavrilovska - School of Computer Science, Georgia Institute of Technology
- Dr. Kexin Rong - School of Computer Science, Georgia Institute of Technology
- Dr. Jay Stokes - Microsoft Research
Abstract:
Graphs are a fundamental component of modern machine learning for structured data, driving advancements in areas such as recommendation systems, fraud detection, and traffic prediction. However, real-world graphs are often dynamic, heterogeneous, and knowledge-rich, presenting significant challenges in scalability, efficiency, and adaptability. This thesis addresses these challenges through innovations in dynamic graph learning, heterogeneous graph modeling, and retrieval-augmented generation with knowledge graphs.
First, this thesis introduces ReD, a system designed for efficient and scalable training of Dynamic Graph Neural Networks (DGNNs). By reusing intermediate results, incrementally computing aggregations across graph snapshots, and eliminating communication overhead in distributed training, ReD enables DGNNs to scale to massive dynamic graphs while achieving up to an order-of-magnitude speedup over existing frameworks.
Second, this thesis presents HetTree, a novel approach for scalable and expressive Heterogeneous Graph Neural Networks (HGNNs). Unlike existing methods that treat metapaths independently, HetTree models their hierarchical relationships using a semantic tree structure and enhances representation learning with a subtree attention mechanism. This approach significantly improves both efficiency and predictive performance across large-scale heterogeneous graphs.
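The incremental aggregation across snapshots that ReD builds on can be pictured with a minimal sketch; this is not ReD's actual implementation, and the function names, sum aggregation, and static node features below are simplifying assumptions. The point is that snapshot t+1 reuses snapshot t's aggregation and applies only the edge deltas instead of recomputing from scratch.

import numpy as np

def full_aggregate(edges, feats):
    # Recompute the sum of in-neighbor features for every node (first snapshot).
    agg = np.zeros_like(feats)
    for src, dst in edges:
        agg[dst] += feats[src]
    return agg

def delta_aggregate(agg_prev, added_edges, removed_edges, feats):
    # Reuse the previous snapshot's aggregation and apply only the edge deltas,
    # avoiding a full pass over the (possibly massive) new snapshot.
    agg = agg_prev.copy()
    for src, dst in added_edges:
        agg[dst] += feats[src]
    for src, dst in removed_edges:
        agg[dst] -= feats[src]
    return agg

# Hypothetical toy example: full pass on snapshot t, incremental update for t+1.
feats = np.random.rand(5, 8)                       # node features (assumed unchanged here)
agg_t = full_aggregate([(0, 1), (1, 2), (3, 2)], feats)
agg_t1 = delta_aggregate(agg_t, added_edges=[(4, 2)], removed_edges=[(0, 1)], feats=feats)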
Building upon these foundations, this thesis explores Graph-Based Retrieval-Augmented Generation (RAG) for Large Language Models (LLMs), addressing a fundamental limitation of current RAG systems: their reliance on unstructured text data and vector similarity matching. Conventional RAG pipelines often fail to capture complex connections among entities across large corpora and struggle to generate comprehensive and contextually relevant responses. By leveraging graph structures to enhance retrieval quality and infusing global knowledge into LLM-based generation, this work aims to improve response quality, coherence, and diversity in knowledge-augmented AI systems. Together, these contributions provide novel methodologies that harness graph structures to tackle dynamic, heterogeneous, and knowledge-intensive tasks, laying the groundwork for more scalable, efficient, and adaptive AI systems.
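One way to picture the retrieval side of a graph-based RAG system is the hypothetical sketch below; it is not the thesis's pipeline, and the knowledge graph, entity names, and helper functions are illustrative. Retrieval expands from the entities mentioned in a query over a knowledge graph, so the LLM prompt carries linked facts rather than isolated text chunks ranked only by vector similarity.

import networkx as nx

def build_kg(triples):
    # Build a directed knowledge graph from (head, relation, tail) triples.
    kg = nx.DiGraph()
    for head, rel, tail in triples:
        kg.add_edge(head, tail, relation=rel)
    return kg

def graph_retrieve(kg, seed_entities, hops=1):
    # Collect triples reachable within `hops` of the entities matched in the query.
    frontier, facts = set(seed_entities), []
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for _, nbr, data in kg.out_edges(node, data=True):
                facts.append((node, data["relation"], nbr))
                next_frontier.add(nbr)
        frontier = next_frontier
    return facts

# Hypothetical usage: facts connected to "Alice" are serialized into the LLM prompt.
kg = build_kg([("Alice", "works_at", "AcmeBank"),
               ("AcmeBank", "flagged_for", "fraud_ring_17")])
context = graph_retrieve(kg, seed_entities=["Alice"], hops=2)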