Title: Datacentric Multi-Acceleration at Scale
Abstract: So far, we have relied on technology scaling, system scale-out, and specialization within the single domain of deep neural networks to power planet-scale applications. With the rise of generative AI and compound AI systems, end-to-end AI-powered applications increasingly span multiple domains, and GPU-accelerated systems will no longer be sufficient to meet the compute demands of next-generation, planet-scale GenAI applications.
To unleash the next wave of compute, we must move toward multi-acceleration. However, multi-acceleration will quickly become limited by data delivery between accelerators. In this talk, I present the vision of data-centric multi-acceleration and how we aim to realize it by focusing on memory and data delivery specialization. I conclude by briefly introducing two of our recent papers, accepted to ASPLOS 2025 and ISCA 2025, that use specialized, compute-enabled memories to accelerate retrieval-augmented generation.
Bio: Mohammad Alian is an Assistant Professor in the Electrical and Computer Engineering department at Cornell. His team is developing new technologies that challenge the conventional separation of tasks between the data delivery hierarchy (memory, storage, and network) and compute, aiming to build next-generation data centers. His research has been recognized with four Best Paper nominations, an Honorable Mention from IEEE MICRO, a Samsung Open Innovation runner-up award, and an NSF CAREER Award.