Shayne Longpre (@shayneredford) 's Twitter Profile
Shayne Longpre

@shayneredford

Lead the Data Provenance Initiative. PhD @MIT. 🇨🇦
Prev: @Google Brain, Apple, Stanford.
Interests: AI/ML/NLP, Data-centric AI, transparency & societal impact

ID: 3025082120

Link: http://www.shaynelongpre.com
Joined: 18-02-2015 08:27:29

2.2K Tweets

5.5K Followers

1.1K Following

Diyi Yang (@diyi_yang) 's Twitter Profile Photo

🚀 Introducing CAVA: The Comprehensive Assessment for Voice Assistants. A new benchmark for evaluating end-to-end, speech-in-speech-out voice assistants in real-world scenarios. We go beyond single tasks or metrics to test the capabilities required for voice assistants:

Sayash Kapoor (@sayashk) 's Twitter Profile Photo

Will AI agents be controlled by big tech companies? Or could they be controlled by users, safeguarding user autonomy and privacy? In a new position paper (accepted to ICML 2025), we outline the steps we need to take now to enable user-centric agents (w/Seth Lazar, Noam Kolt)🧶

Shayne Longpre (@shayneredford) 's Twitter Profile Photo

🚨 Lucie-Aimée Kaffee and I are looking for a junior collaborator to research the Open Model Ecosystem! 🤖 Ideally, someone w/ AI/ML background, who can help w/ annotation pipeline + analysis. docs.google.com/forms/d/e/1FAI…

rishi (@rishibommasani) 's Twitter Profile Photo

My PhD defense is this coming Monday (June 2) from 1-2 PM PT. It will be in-person at Stanford and also on Zoom. I tried my best to invite folks individually, but if you would like an invite, just send me an email or DM me and I can send you details!

Yong Zheng-Xin (Yong) (@yong_zhengxin) 's Twitter Profile Photo

🧵 Multilingual safety training/eval is now standard practice, but a critical question remains: Is multilingual safety actually solved? Our new survey with Cohere Labs answers this and dives deep into: - Language gap in safety research - Future priority areas Thread 👇

EleutherAI (@aieleuther) 's Twitter Profile Photo

Can you train performant language models without using unlicensed text? We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2

Enrico Shippole (@enricoshippole) 's Twitter Profile Photo

Happy to release the Common Pile, an 8TB, 1 Trillion Token Dataset of Public Domain and Openly Licensed Text in collaboration with EleutherAI, Vector Institute, Ai2, Hugging Face, and DPI by Shayne Longpre. We provisioned a subset of the Common Pile, consisting only of public

Luca Soldaini ✈️ ICLR 25 (@soldni) 's Twitter Profile Photo

Only a fraction of the data needed for LLMs comes with identifiable licenses. But if you curate it all, can you train a model on it? We release Common Pile, a 1T-token dataset, and train a 7B model on it! Results are on par with open-weights models trained on eq FLOPS

Stella Biderman (@blancheminerva) 's Twitter Profile Photo

Two years in the making, we finally have 8 TB of openly licensed data with document-level metadata for authorship attribution, licensing details, links to original copies, and more. Hugely proud of the entire team.

Kush Tiwary (@ktiwary2) 's Twitter Profile Photo

🧵 Crazy verifier we have built: Simulate vision evolution by evolving embodied agents inside realistic simulators that simulate physics of light and use it as a verification engine. This enables us to re-run evolution computationally to test impossible questions like 'what if

Niloofar (on faculty job market!) (@niloofar_mire) 's Twitter Profile Photo

🪄We made a 1B Llama BEAT GPT-4o by... making it MORE private?! LoCoMo results: 🔓GPT-4o: 80.6% 🔐1B Llama + GPT-4o (privacy): 87.7% (+7.1!⏫) 💡How? GPT-4o provides reasoning ("If X then Y"), the local model fills in the blanks with your private data to get the answer!

jessica dai (@jessicadai_) 's Twitter Profile Photo

individual reporting for post-deployment evals — a little manifesto (& new preprints!) tldr: end users have unique insights about how deployed systems are failing; we should figure out how to translate their experiences into formal evaluations of those systems.

rishi (@rishibommasani) 's Twitter Profile Photo

My PhD materials are now available! Dissertation: arxiv.org/abs/2506.23123 Slides: drive.google.com/file/d/13N2FRW… Folks should read the acknowledgements since so many people have been so important to me along this journey!
