CS Colloquium - See, Hear, Move: Towards Embodied Visual Perception

Abstract:
Computer vision has seen major success in learning to recognize objects from massive “disembodied” Web photo collections labeled by human annotators. Yet cognitive science tells us that perception develops in the context of acting and moving in the world, and without intensive supervision. Meanwhile, many realistic vision tasks require not only categorizing a well-composed, human-taken photo but also intelligently deciding where to look in the first place. Motivated by these challenges, we are exploring ways to learn visual representations from unlabeled video accompanied by multi-modal sensory data such as egomotion and sound. Moving from passively captured video to agents that control their own first-person cameras, we investigate how agents can learn to move intelligently to acquire visual observations. We present reinforcement learning approaches for active, exploratory look-around behavior, which show promising results when transferring policies to novel perception tasks and unseen environments.

Bio:
Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Scientist at Facebook AI Research. Her research in computer vision and machine learning focuses on visual recognition and search. Before joining UT Austin in 2007, she received her Ph.D. in computer science from MIT. She is a Sloan Fellow and a recipient of NSF CAREER and ONR Young Investigator awards, the 2013 PAMI Young Researcher Award, the 2013 IJCAI Computers and Thought Award, a Presidential Early Career Award for Scientists and Engineers (PECASE), a 2017 Helmholtz Prize computer vision “test of time” award, and the 2018 J.K. Aggarwal Prize from the International Association for Pattern Recognition. She and her collaborators were recognized with the CVPR Best Student Paper Award in 2008 for their work on hashing algorithms for large-scale image retrieval, the Marr Prize at ICCV in 2011 for their work on modeling relative visual attributes, and the ACCV Best Application Paper Award in 2016 for their work on automatic cinematography for 360-degree video. Grauman has given recent conference keynotes at AAAI 2017, MICCAI 2018, and ICLR 2018. She previously served as Program Chair of the Conference on Computer Vision and Pattern Recognition (CVPR) in 2015, and she currently serves as Associate Editor-in-Chief of the journal Pattern Analysis and Machine Intelligence (PAMI) and as a Program Chair of Neural Information Processing Systems (NIPS) 2018.