Shayne Longpre (@shayneredford) Twitter Tweets • TwiCopy

Shayne Longpre

@shayneredford

Lead the Data Provenance Initiative. PhD @MIT. 🇨🇦
Prev: @Google Brain, Apple, Stanford.
Interests: AI/ML/NLP, Data-centric AI, transparency & societal impact

ID: 3025082120

Link: http://www.shaynelongpre.com
Joined: 18-02-2015 08:27:29

2.2K Tweets

5.5K Followers

1.1K Following

Diyi Yang

@diyi_yang

6 months ago

🚀 Introducing CAVA: The Comprehensive Assessment for Voice Assistants. A new benchmark for evaluating end-to-end, speech-in-speech-out voice assistants in real-world scenarios. We go beyond single tasks or metrics to test the capabilities required for voice assistants:

174 likes, 4 replies, 32 reposts

Sayash Kapoor

@sayashk

6 months ago

Will AI agents be controlled by big tech companies? Or could they be controlled by users, safeguarding user autonomy and privacy? In a new position paper (accepted to ICML 2025), we outline the steps we need to take now to enable user-centric agents (w/Seth Lazar, Noam Kolt)🧶

166 likes, 8 replies, 55 reposts

Shayne Longpre

@shayneredford

6 months ago

🚨 Lucie-Aimée Kaffee and I are looking for a junior collaborator to research the Open Model Ecosystem! 🤖 Ideally, someone w/ AI/ML background, who can help w/ annotation pipeline + analysis. docs.google.com/forms/d/e/1FAI…

98 likes, 4 replies, 23 reposts

rishi

@rishibommasani

5 months ago

My PhD defense is this coming Monday (June 2) from 1-2 PM PT. It will be in-person at Stanford and also on Zoom. I tried my best to invite folks individually, but if you would like an invite, just send me an email or DM me and I can send you details!

111 likes, 10 replies, 2 reposts

Yong Zheng-Xin (Yong)

@yong_zhengxin

5 months ago

🧵 Multilingual safety training/eval is now standard practice, but a critical question remains: Is multilingual safety actually solved? Our new survey with Cohere Labs answers this and dives deep into:
- Language gap in safety research
- Future priority areas
Thread 👇

59 likes, 4 replies, 29 reposts

EleutherAI

@aieleuther

5 months ago

Can you train a performant language model without using unlicensed text? We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1&2

556 likes, 10 replies, 127 reposts

Enrico Shippole

@enricoshippole

5 months ago

Happy to release the Common Pile, an 8TB, 1 Trillion Token Dataset of Public Domain and Openly Licensed Text in collaboration with EleutherAI, Vector Institute, Ai2, Hugging Face, and DPI by Shayne Longpre. We provisioned a subset of the Common Pile, consisting only of public

158 likes, 4 replies, 35 reposts

Luca Soldaini ✈️ ICLR 25

@soldni

5 months ago

Only a fraction of the data needed for LLMs comes with identifiable licenses. But if you curate it all, can you train a model on it? We release Common Pile, a 1T-token dataset, and train a 7B model on it! Results are on par with open weights models trained on eq FLOPS

72 likes, 2 replies, 6 reposts

Stella Biderman

@blancheminerva

5 months ago

Two years in the making, we finally have 8 TB of openly licensed data with document-level metadata for authorship attribution, licensing details, links to original copies, and more. Hugely proud of the entire team.

551 likes, 18 replies, 64 reposts

Kush Tiwary

@ktiwary2

5 months ago

🧵 Crazy verifier we have built: Simulate vision evolution by evolving embodied agents inside realistic simulators that simulate physics of light and use it as a verification engine. This enables us to re-run evolution computationally to test impossible questions like 'what if

6 likes, 1 reply, 1 repost

Niloofar (on faculty job market!)

@niloofar_mire

5 months ago

🪄We made a 1B Llama BEAT GPT-4o by... making it MORE private?!
LoCoMo results:
🔓GPT-4o: 80.6%
🔐1B Llama + GPT-4o (privacy): 87.7% (+7.1!⏫)
💡How? GPT-4o provides reasoning ("If X then Y"), the local model fills in the blanks with your private data to get the answer!

174 likes, 6 replies, 37 reposts

jessica dai

@jessicadai_

4 months ago

individual reporting for post-deployment evals — a little manifesto (& new preprints!) tldr: end users have unique insights about how deployed systems are failing; we should figure out how to translate their experiences into formal evaluations of those systems.

126 likes, 7 replies, 27 reposts

rishi

@rishibommasani

4 months ago

My PhD materials are now available! Dissertation: arxiv.org/abs/2506.23123 Slides: drive.google.com/file/d/13N2FRW… Folks should read the acknowledgements since so many people have been so important to me along this journey!

256 likes, 14 replies, 26 reposts