On the Untapped Potential of 3D (Foundation) Models

Abstract: The recent wave of generative AI has led to unprecedented success in fields such as natural language processing (NLP) and 2D computer vision. By comparison, advances in 3D understanding and generation, while noteworthy, have been relatively modest. For instance, most 3D generative models focus on objects rather than scenes, and existing multimodal large language models (LLMs) still struggle with spatial understanding.
In this talk, I will argue that unlocking the full potential of 3D foundation models hinges on sourcing the right data and developing a principled approach to evaluation. Specifically, I will advocate for 360-degree videos, rather than conventional videos, as the preferred data source for next-generation 3D models. I will demonstrate how we extract diverse data from these videos at scale, enabling, for the first time, the synthesis of large-scale real-world 3D scenes and the reconstruction of their geometry from a single image. Next, I will present our recent efforts to evaluate 3D generative models and multimodal LLMs, and discuss how these findings inform the design of future 3D models. Finally, I will show how we distill knowledge from multimodal LLMs into existing 3D systems, making them interactive and actionable, and thus suitable for physical intelligence.

Bio: Wei-Chiu Ma is an Assistant Professor at Cornell University. His research lies at the intersection of computer vision and robotics, with a focus on in-the-wild 3D modeling and simulation and their applications to autonomous systems. Wei-Chiu is a recipient of the Siebel Scholarship and was selected as a Rising Star in Cyber-Physical Systems. His work has been covered by media outlets such as WIRED, DeepLearning.AI, and MIT News. Previously, Wei-Chiu was a Senior Research Scientist at Uber ATG and Waabi, where he served as the technical lead of the sensor simulation team. His contributions to autonomy and simulation have led to more than 15 patents. He received his Ph.D. in EECS from MIT and his M.S. in Robotics from CMU.