Datacentric Multi-Acceleration at Scale | Department of Computer Science

Title: Datacentric Multi-Acceleration at Scale

Abstract: So far, we have relied on technology scaling, system scale-out, and specialization within the single domain of deep neural networks to power planet-scale applications. With the rise of generative AI and compound AI systems, end-to-end AI-powered applications increasingly span multiple domains, and GPU-accelerated systems will no longer be sufficient to meet the compute demands of next-generation, planet-scale GenAI applications.
To unleash the next wave of compute, we must move toward multi-acceleration. However, multi-acceleration will quickly become limited by the data delivery between accelerators. In this talk, I present the vision of data-centric multi-acceleration and how we aim to realize it by focusing on memory and data delivery specialization. I conclude by briefly introducing two of our recent works accepted to ASPLOS 2025 and ISCA 2025 that use specialized, compute-enabled memories to accelerate retrieval-augmented generation.

Bio: Mohammad Alian is an Assistant Professor in the Department of Electrical and Computer Engineering at Cornell. His team is developing new technologies that challenge the conventional separation of tasks between the data delivery hierarchy (memory, storage, and network) and compute, aiming to build next-generation data centers. His research has been recognized with four Best Paper nominations, an Honorable Mention from IEEE MICRO, a Samsung Open Innovation runner-up award, and an NSF CAREER Award.
