Title: TOWARDS FINE-GRAINED MULTI-ATTRIBUTE CONTROL USING LANGUAGE MODELS
Date: Friday, 21st July, 2023
Time: 11:30 AM to 1:30 PM ET (8:30 AM - 10:30 AM PT)
Location: Virtual | Zoom link - https://gatech.zoom.us/j/93320367440?pwd=REhiamxVREcwdUF5Z21XTXJ1NmFWUT09&from=addon
Ashutosh Baheti
PhD student
School of Interactive Computing
Georgia Institute of Technology
Committee:
Prof. Mark Riedl (Advisor) -- School of Interactive Computing, Georgia Institute of Technology
Prof. Alan Ritter (Co-Advisor) -- School of Interactive Computing, Georgia Institute of Technology
Prof. Dhruv Batra -- School of Interactive Computing, Georgia Institute of Technology
Prof. Munmun De Choudhury -- School of Interactive Computing, Georgia Institute of Technology
Prof. Maarten Sap -- Language Technologies Institute, Carnegie Mellon University
Abstract
Recent advances in pretraining large language models have given them a remarkable ability to generate complex, human-level language. Consequently, these models have seen widespread adoption as problem-solving chatbots and writing assistants. However, as we increasingly rely on these powerful language models, ensuring their safe and effective operation requires extensive research in controllable text generation. Existing methods manipulate the decoding process, apply data augmentation, or use online reinforcement learning to encourage models to generate responses with the desired attributes. Yet even state-of-the-art language models often fail to produce the most accurate or desired output on the first attempt. Inspired by recent developments in self-correction for large language models and in new reinforcement learning methods, we aim to train smaller language models as fine-grained editors that iteratively edit outputs to satisfy threshold constraints over multiple classifier-based attributes.
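To make the threshold-constraint notion concrete, the following is a minimal illustrative sketch (my own, not code from the proposal) of how satisfaction of multiple classifier-based attribute constraints might be checked; the names attribute_classifiers and thresholds are hypothetical placeholders, and each classifier is assumed to return a scalar score in [0, 1].

    def satisfies_constraints(text, attribute_classifiers, thresholds):
        """Return True if every classifier-based attribute score meets its threshold."""
        for name, classifier in attribute_classifiers.items():
            score = classifier(text)        # assumed scalar attribute score in [0, 1]
            if score < thresholds[name]:    # this attribute's threshold constraint fails
                return False
        return True

    # Example usage with toy stand-in classifiers (hypothetical attribute names):
    classifiers = {"non_toxicity": lambda t: 0.9, "politeness": lambda t: 0.7}
    thresholds = {"non_toxicity": 0.8, "politeness": 0.6}
    satisfies_constraints("some candidate reply", classifiers, thresholds)  # -> True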
In this thesis, I present preliminary work that incorporates per-token distributional constraints during decoding and improves the generation quality of traditional LSTM-based dialog models. I then present a study of the contextual offensive behavior of pretrained large language models and curate a high-quality dataset for toxicity detection. We also experiment with preliminary controlled text generation methods to reduce a dialog model's toxicity and its agreement with offensive contexts. Next, I introduce a novel offline RL algorithm that can use arbitrary numeric scores as rewards during training to optimize any user-desired LM behavior. Building on this offline RL framework, I propose a fine-grained multi-attribute controllability task, where the goal is to guide the language model to generate output sequences that satisfy user-defined, threshold-based attribute constraints. We frame the problem as an editing game in which the language model can make multiple edits to reach the desired attributes; a sketch of this loop appears below. Notably, our method uses offline RL to train LM editors cheaply, without any online exploration.
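The editing-game framing can be pictured as the following loop, again only an illustrative sketch under my own assumptions rather than the proposal's actual procedure; editor_lm.edit is a hypothetical interface for one editing step by the trained LM editor, and satisfies_constraints is the helper sketched above.

    def edit_until_satisfied(text, editor_lm, attribute_classifiers,
                             thresholds, max_edits=5):
        """Iteratively edit `text` until every attribute threshold is met
        or the edit budget runs out."""
        for _ in range(max_edits):
            if satisfies_constraints(text, attribute_classifiers, thresholds):
                break                        # all constraints satisfied; stop editing
            text = editor_lm.edit(text)      # one editing step by the LM editor
        return text                          # best effort after at most max_edits steps

The key design choice this loop reflects is that the editor never explores online: because the reward signal comes from fixed classifier scores on existing sequences, the editor can be trained entirely offline.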