Title: Improving Measurement, Generalization and Reliability Under Changing Visual Distributions
Prithvijit Chattopadhyay
Ph.D. Student in Computer Science
School of Interactive Computing
Georgia Institute of Technology
(https://prithv1.xyz/)
Date: Friday September 8th, 2023
Time: 1:00-2:30 PM ET
Location (in-person + virtual): CODA C1108 Brookhaven and Zoom (https://gatech.zoom.us/j/96369482550)
Committee
Dr. Judy Hoffman (advisor, School of Interactive Computing, Georgia Institute of Technology)
Dr. Dhruv Batra (School of Interactive Computing, Georgia Institute of Technology, Meta AI)
Dr. James Hays (School of Interactive Computing, Georgia Institute of Technology)
Dr. Animesh Garg (School of Interactive Computing, Georgia Institute of Technology)
Dr. Roozbeh Mottaghi (Meta AI)
Abstract
Despite their remarkable success on several computer vision tasks (classification, detection, segmentation, embodied decision-making), vision models exhibit high sensitivity to shifts in their visual inputs. When tested on images that diverge significantly from the training data, such models tend to make erroneous and miscalibrated predictions. As these models become increasingly integrated into real-world systems, it becomes crucial to ensure their robustness, adaptability, and reliability in making predictions under distribution shifts. In this thesis, we discuss progress along the following fundamental steps toward this goal:
i) Measuring Robustness – improving resistance to distribution shifts requires understanding failure modes, which, in turn, necessitates accurate evaluation of trained models under diverse conditions. Instead of the predominantly studied static evaluations of model robustness on curated out-of-distribution images, we examine robustness in embodied settings, where agents influence their visual inputs through their actions. We specifically discuss RobustNav, a benchmark for stress-testing simulation-trained embodied navigation agents under diverse visual corruptions (affecting egocentric observations) and dynamics corruptions (affecting transition dynamics); see the first sketch after the abstract. We find that navigation performance is highly vulnerable to such corruptions, often resulting in distinct behavioral irregularities.
ii) Improving Generalization – since it is infeasible to curate training data encompassing all anticipated test-time variations, it is critical to develop robustness-enhancing algorithms that effectively leverage the nature of the available training-time data sources. We specifically consider the Sim2Real setting for static visual tasks, where synthetic training data serves as a cost-effective substitute for labeled real data. We propose PASTA, a method that enhances Sim2Real generalization across tasks, architectures (CNNs, ViTs), and initializations (supervised, self-supervised) by generating augmented views of synthetic images based on frequency-domain Sim2Real differences; see the second sketch after the abstract.
iii) Improving Reliability – while maintaining performance under unseen conditions is important, in practice it is often not the sole factor of interest. In proposed work, we study methods that encourage reliable predictions by reducing model miscalibration under distribution shifts. Focusing again on the Sim2Real setting, we note that existing adaptation methods tend to improve performance on real data at the expense of well-calibrated confidence scores, so that better-performing methods are often more overconfident and more miscalibrated when they err. We propose AUGCAL, a solution that, when combined with a Sim2Real adaptation method, preserves Sim2Real transfer performance while ensuring well-calibrated predictions across tasks (recognition, segmentation) and architectures (CNNs, ViTs); see the third sketch after the abstract.
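First, a minimal sketch of how a visual corruption might be applied to an agent's egocentric observations during evaluation, in the spirit of RobustNav's stress tests. The Gaussian-noise corruption, the severity-to-noise mapping, and the `env`/`agent` interfaces below are illustrative assumptions, not RobustNav's actual corruption suite or API.

```python
import numpy as np

def apply_visual_corruption(rgb, severity=3, rng=None):
    """Apply a simple Gaussian-noise corruption to an egocentric RGB frame.

    RobustNav covers a broader set of visual corruptions (e.g., defocus blur,
    low lighting, speckle noise); Gaussian noise here is just a stand-in.
    """
    rng = rng or np.random.default_rng()
    # Noise scale grows with severity (hypothetical mapping, not RobustNav's).
    sigma = [0.02, 0.04, 0.08, 0.12, 0.18][severity - 1]
    img = rgb.astype(np.float32) / 255.0
    corrupted = img + rng.normal(0.0, sigma, size=img.shape)
    return (np.clip(corrupted, 0.0, 1.0) * 255).astype(np.uint8)

# Hypothetical evaluation loop: `env` and `agent` stand in for a
# simulation-trained navigation agent and its environment.
# obs = env.reset()
# done = False
# while not done:
#     obs["rgb"] = apply_visual_corruption(obs["rgb"], severity=3)
#     action = agent.act(obs)
#     obs, done = env.step(action)
```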
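Second, a minimal sketch of a frequency-domain augmentation in the spirit of PASTA: the amplitude spectrum of a synthetic image is perturbed with multiplicative noise whose scale grows with spatial frequency, while the phase is left untouched. The specific noise parameterization (`alpha`, `beta`, `k`) is illustrative and may differ from the paper's.

```python
import numpy as np

def frequency_domain_augment(img, alpha=3.0, beta=0.25, k=2.0, rng=None):
    """Perturb the amplitude spectrum of an H x W x C uint8 image.

    Higher spatial frequencies receive proportionally larger multiplicative
    noise; the phase spectrum is preserved. Parameter values are illustrative.
    """
    rng = rng or np.random.default_rng()
    img = img.astype(np.float32) / 255.0
    out = np.empty_like(img)
    h, w = img.shape[:2]
    # Normalized radial frequency per FFT bin (DC at the center after fftshift).
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    sigma = ((radius / k) ** alpha) * beta  # noise scale grows with frequency
    for c in range(img.shape[2]):
        spec = np.fft.fftshift(np.fft.fft2(img[..., c]))
        amp, phase = np.abs(spec), np.angle(spec)
        amp = amp * (1.0 + rng.normal(0.0, 1.0, size=amp.shape) * sigma)
        spec = np.fft.ifftshift(amp * np.exp(1j * phase))
        out[..., c] = np.real(np.fft.ifft2(spec))
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```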
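Third, a PyTorch-style sketch of the kind of objective AUGCAL points at: a task loss on strongly augmented synthetic images plus a calibration penalty on the same predictions, added alongside an existing Sim2Real adaptation loss. The DCA-style penalty, the `augment` callable, and the weight `lam` are assumptions for illustration; the actual AUGCAL formulation may differ.

```python
import torch
import torch.nn.functional as F

def dca_calibration_loss(logits, labels):
    """Penalize the gap between mean max-softmax confidence and batch accuracy.

    A DCA-style calibration penalty; other calibration losses could be
    substituted here.
    """
    probs = F.softmax(logits, dim=-1)
    confidence = probs.max(dim=-1).values.mean()
    accuracy = (probs.argmax(dim=-1) == labels).float().mean()
    return (confidence - accuracy).abs()

def augcal_style_loss(model, sim_images, sim_labels, augment, lam=1.0):
    """Hypothetical objective: task loss on augmented synthetic images plus a
    calibration penalty on the same predictions. `augment` is a placeholder
    for a strong augmentation such as PASTA."""
    aug_images = augment(sim_images)
    logits = model(aug_images)
    task_loss = F.cross_entropy(logits, sim_labels)
    cal_loss = dca_calibration_loss(logits, sim_labels)
    return task_loss + lam * cal_loss
```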