About

I am a research scientist with the Machine Learning Research group at Apple. I work on dataset design. I have been fortunate to be a part of the DataComp, ImageNetV2, and OpenCLIP projects.

Previously, I spent 9 wonderful years at UC Berkeley and had the pleasure of working with Ben Recht, Ludwig Schmidt, Eric Jonas, Shivaram Venkataraman, and many others.

Contact me at vs at vaishaal dot com.

Selected Publications

DataComp-LM: In search of the next generation of training sets for language models
Data Filtering Networks - ICLR 2024
DataComp: In search of the next generation of multimodal datasets - NeurIPS 2023
Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP) - ICML 2022
Do ImageNet classifiers generalize to ImageNet? - ICML 2019

Vaishaal Shankar