
Shayne Longpre
@shayneredford
Lead the Data Provenance Initiative. PhD @MIT. 🇨🇦
Prev: @Google Brain, Apple, Stanford.
Interests: AI/ML/NLP, Data-centric AI, transparency & societal impact
ID: 3025082120
http://www.shaynelongpre.com 18-02-2015 08:27:29
2.2K Tweets
5.5K Followers
1.1K Following


Will AI agents be controlled by big tech companies? Or could they be controlled by users, safeguarding user autonomy and privacy? In a new position paper (accepted to ICML 2025), we outline the steps we need to take now to enable user-centric agents (w/Seth Lazar, Noam Kolt)🧶


🚨 Lucie-Aimée Kaffee and I are looking for a junior collaborator to research the Open Model Ecosystem! 🤖 Ideally, someone w/ AI/ML background, who can help w/ annotation pipeline + analysis. docs.google.com/forms/d/e/1FAI…


🧵 Multilingual safety training/eval is now standard practice, but a critical question remains: Is multilingual safety actually solved? Our new survey with Cohere Labs answers this and dives deep into:
- Language gap in safety research
- Future priority areas
Thread 👇



Happy to release the Common Pile, an 8TB, 1 Trillion Token Dataset of Public Domain and Openly Licensed Text in collaboration with EleutherAI, Vector Institute, Ai2, Hugging Face, and DPI by Shayne Longpre. We provisioned a subset of the Common Pile, consisting only of public






