Title: Torch2Chip: An End-to-end Customizable AI Model Compression and Deployment Toolkit for Prototype Hardware Accelerator Design

Abstract: AI model compression techniques such as quantization and pruning have been widely explored for vision and language tasks, driven by the growth of AI hardware accelerators such as ASICs and FPGAs. While these methods aim to accelerate computation on low-power devices, current hardware-algorithm co-design faces several obstacles. Modern frameworks (e.g., PyTorch) support only a fixed 8-bit precision, limiting flexibility. Many quantization methods also produce discretized floating-point values rather than low-precision integers, complicating hardware deployment. Existing compression toolkits remain limited to proprietary solutions, restricting customization for prototype hardware designs. To address these issues, we introduce Torch2Chip, an open-source, customizable, high-performance toolkit that supports user-defined compression algorithms, automatic model fusion, and parameter extraction. Torch2Chip delivers deployment-ready formats for a range of AI models and supports both supervised training and advanced lightweight self-supervised learning.
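
The gap between "fake-quantized" floating-point values and true low-precision integers can be seen in a minimal sketch of symmetric per-tensor quantization (illustrative only; this helper is an assumption for exposition and is not the Torch2Chip API):

    import torch

    def quantize(w: torch.Tensor, n_bits: int = 8):
        # Symmetric per-tensor quantization (illustrative, not Torch2Chip code).
        qmax = 2 ** (n_bits - 1) - 1                 # e.g., 127 for 8-bit
        scale = w.abs().max() / qmax                 # per-tensor scaling factor
        w_int = torch.clamp(torch.round(w / scale), -qmax, qmax)
        w_fake = w_int * scale                       # dequantized ("fake") floats
        return w_int.to(torch.int8), scale, w_fake

    w = torch.randn(4, 4)
    w_int, scale, w_fake = quantize(w)
    print(w_int.dtype)   # torch.int8: what a prototype accelerator consumes
    print(w_fake.dtype)  # torch.float32: what most quantization flows emit

Most quantization-aware training pipelines stop at w_fake, so a deployment toolkit must still recover the integer tensors and scaling factors before they can be mapped onto custom hardware.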

Bio: Jae-sun Seo has been an Associate Professor in the ECE department at Cornell Tech since July 2023. Before that, he was an Associate/Assistant Professor at Arizona State University beginning in 2014. He was a visiting researcher at Meta Reality Labs from 2022 to 2023, and he worked at the IBM T. J. Watson Research Center from 2010 to 2013. His research interests include efficient ASIC and FPGA hardware design for machine learning algorithms and neuromorphic computing. Dr. Seo received the 2012 IBM Outstanding Technical Achievement Award, the 2017 NSF CAREER Award, the 2020 Intel Outstanding Researcher Award, and the 2022 IEEE TVLSI Best Paper Award. He has served on the technical program committees of ISSCC, ISCA, MLSys, DAC, DATE, and other conferences.