You are cordially invited to my thesis defense scheduled on the 21st of November. 

 

Title:  Leveraging 3D information for controllable and interpretable image synthesis

 

Date: Mon, Nov 21st, 2022

 

Time: 10:00 - 11:30 AM (EST)

Meeting Link:

https://gatech.zoom.us/j/99260310440 


 

 

Amit Raj

Machine Learning PhD Student

School of Electrical and Computer Engineering

Georgia Institute of Technology

 

Committee

  1. James Hays (Advisor), College of Computing, Georgia Tech
  2. Frank Dellaert, College of Computing, Georgia Tech
  3. Zsolt Kira, College of Computing, Georgia Tech
  4. Dhruv Batra, College of Computing, Georgia Tech
  5. Jia-Bin Huang, Department of Computer Science, University of Maryland, College Park

 

Abstract:

Neural image synthesis has seen enormous advances in recent years, led by innovations in GANs which generate high-resolution, photo-realistic images. However, a major limitation of these methods is that they tend to capture texture statistics of an image with no explicit understanding of geometry. Additionally, GAN-only pipelines are notoriously hard to train. In contrast, recent trends in neural and volumetric rendering have demonstrated compelling results by incorporating 3D information into the synthesis pipeline using classical rendering techniques.

We leverage ideas from both classical graphics rendering and neural image synthesis to design 3D-guided image generation pipelines that are photo-realistic, controllable, and easy to train. In this thesis, we discuss three sets of models that incorporate geometric information for controllable image synthesis.

1. Static geometries: We leverage class-specific shape priors to present generative models that allow for 3D-consistent novel view synthesis. To that end, we propose the first framework that allows implicit representations to generalize to novel identities in the context of facial avatars.

2. Articulated geometries: In the second section, we extend controllable synthesis to articulated geometries. We present two frameworks (with explicit and implicit geometric representations) for the synthesis of pose- and viewpoint-controllable full-body digital avatars.

3. Scenes: In the final section, we present a framework for generating driving scenes with both static and dynamic elements. In particular, the proposed model allows fine-grained control over local elements of the scene without needing to resynthesize the entire scene, which we posit should reduce both the memory footprint of the model and inference times.