Title: Neighborhood Attention: Reducing the O(n^2) complexity of Attention at the threadblock level
Ali Hassani
Ph.D. Student in Computer Science
School of Interactive Computing
Georgia Institute of Technology
Date & Time: Thursday 7/31/2025 12:00 PM - 2:00 PM Eastern Time
Location: Coda C1103 Lindberg + Zoom Meeting
Committee:
Dr. Humphrey Shi (Advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. Kartik Goyal - School of Interactive Computing, Georgia Institute of Technology
Dr. Judy Hoffman - School of Interactive Computing, Georgia Institute of Technology
Dr. Wen-mei Hwu - Electrical & Computer Engineering, University of Illinois at Urbana-Champaign
Abstract:
Attention is at the heart of most foundational AI models, across tasks and modalities. In many of these models, it incurs significant computation, quadratic in sequence length, which is often cited as one of its greatest limitations. As a result, many approaches have been proposed to alleviate this cost, among the most common being masked or reduced attention spans.
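As a point of reference (an illustrative PyTorch sketch, not taken from the talk), the quadratic cost comes from the score matrix: every query attends to every key, so computing and storing the scores scales as O(n^2) in sequence length n:

    import torch

    n, d = 1024, 64                       # sequence length, head dimension (illustrative)
    q, k, v = (torch.randn(n, d) for _ in range(3))

    scores = q @ k.t() / d ** 0.5         # n x n score matrix: the O(n^2) term
    out = torch.softmax(scores, dim=-1) @ v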
In this work, we revisit sliding window approaches, which were commonly believed to be inherently inefficient, and propose a new framework called Neighborhood Attention (NA). Through it, we resolve design flaws in the original sliding window attention works, implement the approach efficiently for modern hardware accelerators, specifically GPUs, and conduct experiments that highlight the strengths and weaknesses of these approaches.
At the same time, we bridge the parameterizations and properties of convolution and attention by showing that NA exhibits inductive biases and receptive fields similar to those of convolution, while remaining capable of capturing both short- and long-range interdependencies, as attention does.
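To make the idea concrete, below is a naive reference sketch of 1-D neighborhood attention in PyTorch; the function name, shapes, and window handling are illustrative assumptions, not NATTEN's actual kernels, which fuse this computation on the GPU:

    import torch

    def neighborhood_attention_1d(q, k, v, window):
        # Hypothetical reference implementation: each query attends only
        # to its `window` nearest tokens. Near the edges, the neighborhood
        # slides inward so every query still sees exactly `window` keys.
        n, d = q.shape
        out = torch.empty_like(v)
        for i in range(n):
            start = min(max(i - window // 2, 0), n - window)
            s = slice(start, start + window)
            scores = q[i] @ k[s].t() / d ** 0.5   # window-sized, not n-sized
            out[i] = torch.softmax(scores, dim=-1) @ v[s]
        return out

    # usage: 128 tokens, head dimension 32, 7-token neighborhoods
    q, k, v = (torch.randn(128, 32) for _ in range(3))
    y = neighborhood_attention_1d(q, k, v, window=7)

Restricting each query to a fixed-size neighborhood drops the cost from O(n^2) to O(n * window), while the inward-sliding window keeps the attention span constant at the boundaries.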
We then show the necessity of, and the challenges that arise from, dedicated infrastructure, especially in the context of modern implementations such as Flash Attention, and develop even more efficient, performance-optimized implementations of NA. Through these implementations, we achieve orders-of-magnitude improvements over naive implementations, with up to a 2X improvement in inference time and a 1.4X improvement in training time.
Finally, we show the limitations of the existing methodology and outline research topics that can address them. All of our work is open-sourced through the NATTEN project.
Meeting Details:
Join Zoom Meeting
https://gatech.zoom.us/j/99422563124?pwd=kSII1Cab0ooku6rpPtf2hR5Uoylb9O.1
Meeting ID: 994 2256 3124
Passcode: 435466