Title: Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change


Date: Monday, January 30, 2023

Time: 1pm - 3pm EST

Location: Coda C1115 Druid Hills 

Virtual Link: Microsoft Teams

Virtual Meeting ID: 253 542 548 123

Passcode: H9RZyU


Jonathan Balloch

Robotics Ph.D. Student

School of Interactive Computing
Georgia Institute of Technology


Committee

Dr. Mark Riedl (Advisor), School of Interactive Computing, Georgia Tech

Dr. Sehoon Ha, School of Interactive Computing, Georgia Tech

Dr. Seth Hutchinson, School of Interactive Computing, Georgia Tech

Dr. Michael Littman, Computer Science Department, Brown University

Dr. Harish Ravichandar, School of Interactive Computing, Georgia Tech


Abstract

Techniques for learning policies that solve sequential decision-making problems have wide applicability in the real world, from conversational AI to disaster response robots. However, applying these techniques is difficult because the open world is vast and ever-changing, while most reinforcement learning methods, like many techniques for solving sequential decision-making problems, assume that the world is a closed, fixed process. When that assumption is violated, an agent faces the problem of online task transfer in reinforcement learning (RL): adapting a policy online to a shift in the agent's environment. Solutions to online task transfer are necessary for agents to operate in the presence of open-world novelties, events that regularly transform real-world environments.
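
To make this setting concrete, the minimal Python sketch below shows one way a sudden novelty can be simulated in a Gymnasium-style environment: the underlying dynamics are swapped at the first episode boundary after a step threshold, without any signal to the agent. This is an illustrative assumption for exposition, not the NovGrid implementation discussed below; the class name NoveltyWrapper and the pre_novelty_env, post_novelty_env, and novelty_step parameters are hypothetical.

import gymnasium as gym

class NoveltyWrapper(gym.Wrapper):
    """Hypothetical sketch: inject a sudden environment change ("novelty")
    at the first episode boundary after a fixed step budget."""

    def __init__(self, pre_novelty_env, post_novelty_env, novelty_step):
        super().__init__(pre_novelty_env)
        self.post_novelty_env = post_novelty_env
        self.novelty_step = novelty_step
        self.steps_taken = 0

    def step(self, action):
        # Count environment steps across episodes.
        self.steps_taken += 1
        return self.env.step(action)

    def reset(self, **kwargs):
        # The agent receives no notification that the dynamics changed;
        # it must detect and adapt to the novelty online.
        if self.steps_taken >= self.novelty_step:
            self.env = self.post_novelty_env
        return self.env.reset(**kwargs)

Because the swap happens silently, any policy trained only on the pre-novelty dynamics must either re-explore or reuse prior knowledge to recover performance, which is exactly the trade-off this thesis studies.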


This thesis aims to shed light on and advance reinforcement learning solutions to the online task transfer problem through two complementary strategies: reuse of prior knowledge and directed exploration. First, I define the problem in the context of conventional reinforcement learning and present NovGrid, an environment I developed for studying online task transfer. Second, I contribute WorldCloner, a neurosymbolic RL method for efficient novelty adaptation, and demonstrate how careful reuse of prior knowledge can improve the efficiency of transfer. Third, I present MaxEnt World Explorer, an entropy-based exploration method for world-model RL, and use it to demonstrate how transfer-focused directed exploration can improve transfer efficiency without sacrificing source-task performance. Lastly, I propose two additional contributions that continue to develop these strategies. To further develop exploration for transfer, I will design a time-dependent, uncertainty-based exploration strategy that is more sensitive to stale environment information and, as a result, to changes in the environment. To further develop reuse of prior knowledge, I propose a discrete latent representation of environment composition and dynamics, grounded in natural language, that avoids unnecessary changes to learned latent representations during adaptation.
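
As an illustration of the directed-exploration idea, the sketch below computes a generic entropy-style exploration bonus from the disagreement of an ensemble of learned dynamics models. It is a simplified stand-in, not the MaxEnt World Explorer algorithm itself; the predict_next_state interface and the weighting coefficient beta are illustrative assumptions.

import numpy as np

def entropy_exploration_bonus(ensemble, state, action):
    # Predict the next state with each model in the ensemble
    # (predict_next_state is a hypothetical model interface).
    preds = np.stack([m.predict_next_state(state, action) for m in ensemble])
    # Ensemble variance is a common tractable proxy for the entropy of the
    # world model's predictive distribution over next states.
    return float(preds.var(axis=0).mean())

def shaped_reward(extrinsic_reward, bonus, beta=0.1):
    # beta trades off task reward against exploration (illustrative value).
    return extrinsic_reward + beta * bonus

The intuition is that states where the learned world model disagrees with itself are the states most likely to have been affected by a novelty, so directing exploration toward them can speed adaptation.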