[AI Seminar] Talk: Learning deep representations for perception, reasoning, and decision-making
Chris Atkeson
cga at cs.cmu.edu
Fri Feb 23 00:28:36 EST 2018
Learning deep representations for perception, reasoning, and decision-making
Honglak Lee
Monday, Feb 26, 10:00am, NSH 3305
Abstract:
In recent years, deep learning has emerged as a powerful method for
learning feature representations from complex input data, and it has been
highly successful in many domains, such as computer vision, speech
recognition, and language processing. While many deep learning algorithms
focus on standard discriminative tasks with explicit supervision (e.g.,
classification), we aim to learn deep representations with less
supervision that can serve as explanatory models of the data and allow for
richer inference, reasoning, and control.
First, I will present techniques for learning deep representations in
weakly supervised settings that disentangle underlying factors of
variation. These learned representations model intricate interactions
between underlying factors of variation in the data (e.g., pose and
morphology of 3D objects in images) and allow for hypothetical
reasoning and improved control (e.g., grasping of objects). I will also
talk about learning via analogy-making and its connection to
disentangling. In the second part of the talk, I will describe my work on
learning deep representations from multiple heterogeneous input
modalities. Specifically, I will present a multimodal learning framework
via conditional prediction that explicitly encourages cross-modal
associations. This framework provides a theoretical guarantee about
learning a joint distribution and explains recent progress in deep
architectures that interface vision and language, such as caption
generation and conditional image synthesis. I will also describe related
ongoing work on learning joint embeddings of images and text for
zero-shot learning. Finally, I will present ongoing work on integrating
deep learning and reinforcement learning. Specifically, I will talk about
how learning a predictive generative model from sequential input data can
be useful for reinforcement learning. I will also talk about a
memory-based architecture that aids sequential decision-making in
first-person-view and active-perception settings, as well as zero-shot
multi-task generalization with hierarchical reinforcement learning given
task descriptions.
Bio:
Honglak Lee is a Research Scientist at Google Brain and an Associate
Professor of Computer Science and Engineering at the University of
Michigan, Ann Arbor. He received his Ph.D. from the Computer Science
Department at Stanford University in 2010, advised by Prof. Andrew Ng.
His research focuses on deep learning and representation learning,
spanning unsupervised, supervised, and semi-supervised learning,
transfer learning, structured prediction, reinforcement learning, and
optimization. His methods have been successfully applied to computer
vision and other perception problems. He received best paper awards at
ICML 2009 and CEAS 2005. He has served as an area chair for ICML, NIPS,
ICLR, ICCV, CVPR, ECCV, AAAI, and IJCAI, as well as a guest editor of the
IEEE TPAMI Special Issue on Learning Deep Architectures, an editorial
board member of Neural Networks, and an associate editor of IEEE TPAMI.
He received the Google Faculty Research Award (2011) and the NSF CAREER
Award (2015), was named one of AI's 10 to Watch by IEEE Intelligent
Systems (2013), and was selected as a research fellow by the Alfred P.
Sloan Foundation (2016).