A Vision-Language-Action Flow Model for General Robot Control
Abstract: Robot learning has the potential to unlock flexible, general, and dexterous systems while addressing key AI challenges. However, achieving the generality needed for real-world applications requires overcoming obstacles in data, generalization, and robustness. This talk will describe the journey of building our flagship model, Pi_0 [1]. We propose a novel flow-matching architecture built on a pre-trained vision-language model to leverage Internet-scale semantic knowledge. The model is trained on diverse datasets from various dexterous robots, including single-arm, dual-arm, and mobile manipulators. We evaluate its zero-shot performance, its ability to follow language instructions, and its capacity to learn new skills through fine-tuning, across tasks like laundry folding, table cleaning, and box assembly.
Bio: Quan Vuong is a co-founder of Physical Intelligence. His research focuses on generalist robotics and algorithms that enable intelligent behaviors through large-scale learning. His work has been featured in popular news outlets, such as the New York Times and TechCrunch. He received his Ph.D. in Computer Science from the University of California San Diego.