Title: Scalable Asynchronous Actor-based Approaches for Distributed-Memory Parallel Applications

 

Date: Tuesday, December 3rd, 2024

Time: 9:00am - 11:00am ET

Location: KACB 1120A

 

Youssef Elmougy

School of Computer Science

College of Computing

Georgia Institute of Technology

 

Committee

 

Dr. Vivek Sarkar (Advisor) - School of Computer Science, Georgia Institute of Technology

Dr. Akihiro Hayashi - School of Computer Science, Georgia Institute of Technology

Dr. Ling Liu - School of Computer Science, Georgia Institute of Technology

Dr. Ada Gavrilovska - School of Computer Science, Georgia Institute of Technology

Dr. Richard Vuduc - School of Computational Science and Engineering, Georgia Institute of Technology

 

Abstract

 

Distributed-memory parallel applications are the backbone of large-scale data analytics and modern computing, enabling large-scale computations to be distributed and executed efficiently across heterogeneous, distributed systems. Classical primitives for data analytics, including Bulk Synchronous Parallel models, focus on moving data to compute. Although these primitives were sufficient for past HPC applications, they are highly inefficient for the fine-grained asynchrony and data distribution required by modern large-scale data analytics applications, which motivates new architectural approaches. To alleviate these challenges, the Fine-Grained Asynchronous Bulk Synchronous Parallel model underlying the actor-based programming system HClib-Actor moves compute to data via asynchronous active messages. This actor-based approach provides a lightweight, asynchronous computation model that uses fine-grained asynchronous actor messages to express point-to-point remote operations, enabling fine-grained, distributed, asynchronous, and scalable execution across systems. We demonstrate the efficacy of the actor-based approach by exploring three perspectives of distributed computing: algorithm design, runtime systems, and system-level optimizations.

 

The Algorithm perspective presents novel algorithms tailored for scalable, distributed applications across a wide variety of application domains. Through theoretical analysis and empirical evaluation, we demonstrate that the asynchronous actor-based model scales better and runs more efficiently than traditional parallel computing paradigms. The Runtime perspective covers enhancements made to HClib-Actor, including MPI-Actors interoperability, extended termination protocols, lightweight global termination schemes, and deployment in cloud environments. It details the optimizations incorporated into HClib-Actor and offers insights that underscore the critical role of runtime systems in enabling scalable distributed computing. Finally, the System-Level perspective highlights the importance of aligning distributed-memory parallel applications with the characteristics of the underlying hardware architecture to achieve peak performance, examining hardware-specific optimizations and architectural considerations on diverse hardware platforms. We show how architecture-aware bindings and software-level buffer sizes affect distributed computing efficiency and bandwidth. Together, these three perspectives offer a comprehensive, scalable, asynchronous actor-based approach for distributed-memory parallel applications. By integrating these insights, we contribute to the advancement of distributed computing and lay foundations for more efficient and scalable distributed and parallel applications across diverse computing landscapes.