Title
"Intelligent Systems that Perceive, Imagine, and Act Like Humans by Aligning Representations from Multimodal Data"
Abstract
The machine learning community has embraced specialized models tailored to specific data domains. However, relying solely on a single data type can constrain flexibility and generality, requiring additional labeled data and hindering user interaction. To address these challenges, my research objective is to build efficient intelligent systems that learn from perception of the physical world and from interactions with humans to execute diverse, complex tasks that assist people. These systems should support seamless interaction with humans and computers, in both digital software environments and tangible real-world contexts, by aligning representations from multimodal data. In this talk, I will elaborate on my approaches along three dimensions: perception, imagination, and action, spanning recognition, generation, and robotics. These findings mitigate the constraints of existing model setups, opening avenues for multimodal representations that unify a myriad of signals within a single, comprehensive model.
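To make the core idea of "aligning representations from multimodal data" concrete, here is a minimal sketch of CLIP-style contrastive alignment between two modalities. This illustrates the general technique only, not the speaker's specific methods; the projection heads, feature dimensions, and class name are hypothetical placeholders, and a real system would sit these heads on top of pretrained modality-specific encoders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveAligner(nn.Module):
    """Hypothetical sketch: project two modalities into a shared embedding
    space and align paired samples with a symmetric contrastive loss."""

    def __init__(self, image_dim=2048, text_dim=768, embed_dim=512):
        super().__init__()
        # Illustrative linear projection heads into the shared space.
        self.image_proj = nn.Linear(image_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        # Learnable temperature, initialized to log(1/0.07) as in CLIP.
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, image_feats, text_feats):
        # L2-normalize so the dot product is cosine similarity.
        img = F.normalize(self.image_proj(image_feats), dim=-1)
        txt = F.normalize(self.text_proj(text_feats), dim=-1)
        logits = self.logit_scale.exp() * img @ txt.t()
        # Matched pairs lie on the diagonal of the similarity matrix.
        labels = torch.arange(logits.size(0), device=logits.device)
        loss_img = F.cross_entropy(logits, labels)      # image -> text
        loss_txt = F.cross_entropy(logits.t(), labels)  # text -> image
        return (loss_img + loss_txt) / 2

# Usage: one training step on a batch of 8 paired (image, text) features.
aligner = ContrastiveAligner()
loss = aligner(torch.randn(8, 2048), torch.randn(8, 768))
loss.backward()
```

Once two or more modalities share an embedding space like this, the same representation can support recognition (nearest-neighbor matching), generation (conditioning on another modality), and action (grounding instructions for robots), which is the unifying thread of the three dimensions above.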
Bio
Boyi Li is a postdoctoral scholar at UC Berkeley, advised by Prof. Jitendra Malik and Prof. Trevor Darrell. She is also a researcher at NVIDIA Research. She received her Ph.D. from Cornell University, advised by Prof. Serge Belongie and Prof. Kilian Q. Weinberger. Her research interests are in computer vision and machine learning. Her work primarily focuses on developing generalizable algorithms and interactive intelligent systems by aligning representations from multimodal data, such as 2D pixels, 3D geometry, language, audio, touch, and smell.