Title: Towards Fine-grained Multi-Attribute Control Using Language Models
Date: Wednesday, 17th April, 2024
Time: 2:30 PM to 4:15 PM ET (11:30 AM - 1:15 PM PT)
Location: Virtual Zoom Link
Meeting ID: 951 6076 2750
Passcode: 299490
Ashutosh Baheti
Computer Science Ph.D. Candidate
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Committee:
Prof. Mark Riedl (Advisor) -- School of Interactive Computing, Georgia Institute of Technology
Prof. Alan Ritter (Co-Advisor) -- School of Interactive Computing, Georgia Institute of Technology
Prof. Dhruv Batra -- School of Interactive Computing, Georgia Institute of Technology
Prof. Munmun de Choudhury -- School of Interactive Computing, Georgia Institute of Technology
Prof. Maarten Sap -- Language Technologies Institute, Carnegie Mellon University
Abstract
As we increasingly rely on powerful language models, ensuring their safe and effective operation necessitates extensive research in controllable text generation. Existing state-of-the-art language models struggle to generate the most accurate or desired output on the first attempt. Inspired by recent developments in self-correction in large language models and new reinforcement learning methods, we aim to train smaller language models as fine-grained editors that iteratively edit outputs to satisfy threshold constraints over multiple classifier-based attributes.
In this thesis, I first present a study of the contextual offensive behavior of pretrained large language models and curate a high-quality dataset for toxicity detection. Next, I introduce a novel offline RL algorithm that can use arbitrary numeric scores as rewards during training to optimize any user-desired LM behavior by filtering out suboptimal data. Finally, I propose a fine-grained multi-attribute controllability task, where the goal is to guide the language model to generate output sequences that satisfy user-defined threshold-based attribute constraints, and design an offline RL framework for it. The LM can take multiple edits to reach the desired attributes. Experiments on both language and protein sequences demonstrate the versatility and effectiveness of our approach.
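The iterative-editing setup described in the abstract can be pictured as a simple control loop: score the current output with attribute classifiers, stop if every score clears its user-defined threshold, otherwise ask the editor model for another revision. The sketch below is only an illustration of that loop, not the thesis's actual method; the attribute names, `score_fn`, and `edit_fn` are hypothetical placeholders.

```python
def satisfies_constraints(scores, thresholds):
    # True only if every attribute's score meets its user-defined threshold.
    return all(scores[attr] >= t for attr, t in thresholds.items())

def iterative_edit(text, score_fn, edit_fn, thresholds, max_edits=5):
    # Repeatedly revise `text` until all attribute constraints are satisfied
    # or the edit budget is exhausted. `score_fn` stands in for a set of
    # attribute classifiers; `edit_fn` stands in for the editor LM.
    for _ in range(max_edits):
        scores = score_fn(text)
        if satisfies_constraints(scores, thresholds):
            break
        text = edit_fn(text, scores, thresholds)
    return text, score_fn(text)
```

In this framing, the number of loop iterations corresponds to the multiple edits the LM may take to reach the desired attributes.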