This talk considers privacy-preserving machine learning from two perspectives: cryptography and statistical privacy.
The first part of the talk considers the use of secure multi-party computation (MPC) in machine learning. Secure MPC allows parties to perform computations on data while keeping that data private. This enables training of machine-learning models on private data sets owned by different parties, evaluation of one party’s private model using another party’s private data, etc. Despite its potential, adoption of secure MPC in machine learning is hampered by the absence of flexible software frameworks that “speak the language” of machine-learning researchers and engineers. To fill this gap, we present the CrypTen framework that exposes popular secure MPC primitives via abstractions that are common in modern machine-learning frameworks, such as tensor computations, automatic differentiation, and modular neural networks. CrypTen’s GPU support and high-performance communication primitives enable efficient and secure training and evaluation of state-of-the-art models for text classification, speech recognition, and image classification under a semi-honest threat model.
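The core primitive behind secure MPC frameworks like CrypTen is secret sharing: each party holds a random-looking share of a value, and parties compute on shares without ever seeing the underlying data. The toy sketch below illustrates additive secret sharing over a ring (this is an illustrative example of the general technique, not CrypTen's actual implementation or API):

```python
import secrets

# Toy additive secret sharing over the ring Z_Q: a secret x is split
# into random shares that sum to x mod Q. Any proper subset of shares
# reveals nothing about x, yet addition can be done on shares locally.
Q = 2**64

def share(x, n_parties=2):
    """Split integer x into n_parties additive shares summing to x mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Combine all shares to recover the secret."""
    return sum(shares) % Q

def add_shares(a, b):
    """Each party adds its own shares locally; no communication needed."""
    return [(x + y) % Q for x, y in zip(a, b)]

x_shares = share(21)
y_shares = share(21)
z_shares = add_shares(x_shares, y_shares)  # addition on "encrypted" data
print(reconstruct(z_shares))  # 42
```

Multiplication of shared values requires interaction between parties (e.g. via Beaver triples), which is where most of the communication cost of MPC-based training comes from.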
The second part of the talk considers the problem that information about the training data of machine-learning models can leak either through the model parameters or through predictions made by the model. Consequently, when the training data contains sensitive attributes, assessing the amount of information leakage is paramount. The talk presents a method to quantify this leakage using the Fisher information of the model about the data. Unlike the worst-case a priori guarantees of differential privacy, Fisher information loss measures leakage with respect to specific examples, attributes, or sub-populations within the dataset. We motivate Fisher information loss through the Cramér-Rao bound and delineate the implied threat model. We provide efficient methods to compute Fisher information loss for output-perturbed generalized linear models and show how these methods can be extended to private training of deep-learning models.
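To give intuition for the Fisher-information view of leakage (a simplified scalar illustration under assumed Gaussian output perturbation, not the paper's general method): releasing y = θ + N(0, σ²) carries Fisher information I(θ) = 1/σ² about the private value θ, and the Cramér-Rao bound says any unbiased adversarial estimate of θ from y has variance at least σ². More noise means less Fisher information and thus weaker reconstruction:

```python
import numpy as np

def fisher_information_gaussian(sigma):
    """Fisher information of y = theta + N(0, sigma^2) about theta."""
    return 1.0 / sigma**2

def cramer_rao_lower_bound(sigma):
    """Minimum variance of any unbiased estimator of theta from y."""
    return 1.0 / fisher_information_gaussian(sigma)

# Monte Carlo check: the identity estimator theta_hat = y is unbiased
# and attains the bound exactly in this Gaussian example.
rng = np.random.default_rng(0)
theta, sigma, n_trials = 3.0, 0.5, 200_000
releases = theta + sigma * rng.standard_normal(n_trials)
print(cramer_rao_lower_bound(sigma))      # 0.25
print(round(releases.var(), 2))           # ~0.25, matching the bound
```

Unlike a differential-privacy guarantee, this quantity is example-specific: in the multivariate case the Fisher information matrix can assign different leakage to different attributes or sub-populations.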
The talk presents joint work with Awni Hannun, Chuan Guo, Brian Knott, Shobha Venkataraman, Mark Ibrahim, and Shubho Sengupta.
Laurens van der Maaten is a Distinguished Research Scientist at Meta AI. He pioneered the development of web-scale weakly supervised training of visual-recognition models at Meta and was the lead developer of the CrypTen framework for privacy-preserving machine learning. He also led Meta’s FAIR Accel team, which produced AI research breakthroughs including the Cicero Diplomacy bot and the ESM protein language models. Prior to joining Meta, he co-invented the t-SNE dimensionality-reduction technique together with Geoffrey Hinton. At present, Laurens is working on developing the next generation of foundational generative AI models. His research has won several awards, including Best Paper Awards at CVPR 2017 and UAI 2021.