Title: Improving Large-Scale Foundation Models Via Attention-Aware Techniques


Date: Thursday, August 1st, 2024

Time: 2:00 pm - 3:00 pm ET

Location: Virtually via Zoom (https://gatech.zoom.us/j/9960405372?pwd=bzhIbVdWRkxweW9naUh0aUt4ci9WZz09)


PhD Student:

Zhongzhi Yu, School of Computer Science, Georgia Institute of Technology


Committee Members:

Dr. Yingyan (Celine) Lin (Advisor) – School of Computer Science, Georgia Institute of Technology

Dr. Chao Zhang – School of Computational Science and Engineering, Georgia Institute of Technology

Dr. Judy Hoffman – School of Interactive Computing, Georgia Institute of Technology

Dr. Pavlo Molchanov – NVIDIA Corporation


Abstract:

Foundation models, a class of large-scale transformer models, have demonstrated impressive performance across a diverse range of applications, from natural language processing to computer vision. The key enabler behind their success is the attention module, which controls how these models extract relationships among input tokens. However, despite the attention module's importance, our understanding of its role during the inference and fine-tuning stages remains limited, leading to challenges such as potentially suboptimal model performance and a lack of interpretability.
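For context, the attention module assigns each pair of input tokens a weight via scaled dot-product attention. The sketch below (plain Python/NumPy, with illustrative shapes and names) shows only this standard formulation for background; it is not one of the techniques proposed in the thesis.

# Minimal sketch of standard scaled dot-product attention for a single head.
# Shapes and names are illustrative.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Q, K, V: (num_tokens, head_dim) arrays for one attention head."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # pairwise token-to-token relevance
    weights = softmax(scores, axis=-1)  # attention distribution over input tokens
    return weights @ V, weights         # output: weighted mix of value vectors

# Example: 4 tokens, 8-dimensional head
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out, attn = attention(Q, K, V)
print(attn.round(2))  # each row sums to 1: how much a token attends to every other token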


My thesis research focuses on understanding the potentially suboptimal attention distributions generated by foundation models and on developing attention-aware techniques to improve their performance. The primary insight from my research is that certain high-attention tokens can negatively affect foundation model performance during both fine-tuning and inference. Building on this insight, my research presents state-of-the-art solutions that enhance the performance of foundation models, including an attention-aware data augmentation technique that improves the data efficiency of fine-tuning and an attention calibration technique that improves inference accuracy.
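To illustrate what a "high-attention token" means in practice, the hypothetical sketch below flags tokens that receive a disproportionate share of attention mass. The criterion and threshold are assumptions made for exposition only; they are not the augmentation or calibration methods developed in the thesis.

# Hypothetical diagnostic: flag tokens that receive far more attention than a
# uniform baseline would give them. Threshold (`ratio`) is illustrative only.
import numpy as np

def find_high_attention_tokens(attn_weights, ratio=3.0):
    """attn_weights: (num_tokens, num_tokens) row-stochastic attention matrix.
    Returns indices of tokens whose average received attention exceeds
    `ratio` times the uniform baseline 1/num_tokens."""
    num_tokens = attn_weights.shape[-1]
    received = attn_weights.mean(axis=0)   # average attention each token receives
    baseline = 1.0 / num_tokens            # uniform-attention reference
    return np.flatnonzero(received > ratio * baseline)

# Toy example where token 0 dominates the attention distribution
attn = np.array([[0.90, 0.05, 0.03, 0.02],
                 [0.85, 0.10, 0.03, 0.02],
                 [0.88, 0.04, 0.05, 0.03],
                 [0.92, 0.03, 0.03, 0.02]])
print(find_high_attention_tokens(attn))  # -> [0]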