Tristan Thrush (@tristanthrush)'s Twitter Profile
Tristan Thrush

@tristanthrush

PhD-ing @StanfordAILab @stanfordnlp. Interested in data, multimodality, scaling, and many more things.

ID: 1388198782924255232

Link: http://www.tristanthrush.com · Joined: 30-04-2021 18:29:34

570 Tweets

3.3K Followers

892 Following

Stanford NLP Group (@stanfordnlp)

Stanford NLP Group will be in Singapore with lots of ICLR 2025 papers. Tristan Thrush, Christopher Potts & Tatsunori Hashimoto will show how to select good pretraining data: LLM losses on texts correlate with downstream benchmarks, so select the high-correlation documents. arxiv.org/abs/2409.05816

Ken Liu (@kenziyuliu)

An LLM generates an article verbatim—did it “train on” the article?

It’s complicated: under n-gram definitions of train-set inclusion, LLMs can complete “unseen” texts—both after data deletion and adding “gibberish” data. Our results impact unlearning, MIAs & data transparency🧵
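
For concreteness, a toy version of the kind of n-gram inclusion test the thread is probing (a hypothetical illustration, not the paper's exact definition): a text counts as "seen" if any length-n window of its tokens appears verbatim in the corpus.

```python
# Toy n-gram train-set-inclusion test (hypothetical illustration). The thread's
# point is that a model can still complete a text that a test like this calls
# "unseen", e.g. after data deletion or after adding "gibberish" data.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def included_by_ngram_test(text_tokens, corpus_tokens, n=8):
    """True if any n-gram of the text occurs verbatim in the corpus."""
    corpus_index = ngrams(corpus_tokens, n)
    return any(g in corpus_index for g in ngrams(text_tokens, n))
```
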
Karan Dalal (@karansdalal)

Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by the model.
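
A heavily simplified sketch of the TTT-layer idea in PyTorch (my own toy code under assumptions, not the authors' implementation): the layer's hidden state is itself a small linear model whose weights take one gradient step on a self-supervised reconstruction loss as each token arrives.

```python
import torch
import torch.nn as nn

class ToyTTTLinear(nn.Module):
    """Toy TTT layer: the fast state W is a per-sequence linear map, updated online."""
    def __init__(self, dim, inner_lr=0.1):
        super().__init__()
        # Outer-loop (learned) projections that define the self-supervised task.
        self.key = nn.Linear(dim, dim, bias=False)
        self.value = nn.Linear(dim, dim, bias=False)
        self.query = nn.Linear(dim, dim, bias=False)
        self.inner_lr = inner_lr

    def forward(self, x):
        # x: (batch, seq_len, dim)
        b, t, d = x.shape
        W = x.new_zeros(b, d, d)  # fast weights, reset for each sequence
        outs = []
        for i in range(t):
            k, v, q = self.key(x[:, i]), self.value(x[:, i]), self.query(x[:, i])
            # One inner-loop gradient step of W on ||W k - v||^2.
            pred = torch.bmm(W, k.unsqueeze(-1)).squeeze(-1)
            grad = 2 * torch.bmm((pred - v).unsqueeze(-1), k.unsqueeze(1))
            W = W - self.inner_lr * grad
            # The layer's output is the updated fast model applied to the query view.
            outs.append(torch.bmm(W, q.unsqueeze(-1)).squeeze(-1))
        return torch.stack(outs, dim=1)
```

In the paper's setup, layers like this are inserted into a pre-trained Transformer and the whole model is fine-tuned; the sketch only conveys the inner-loop update.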

Ian Magnusson (@ianmagnusson)

🔭 Science relies on shared artifacts collected for the common good. 🛰 So we asked: what's missing in open language modeling? 🪐 DataDecide 🌌 charts the cosmos of pretraining—across scales and corpora—at a resolution beyond any public suite of models that has come before.

Julie Kallini ✨ @ ICLR 2025 ✈️ (@juliekallini)

🚀 In T-minus 1 week, I’ll be at ICLR presenting MrT5!

The final version has tons of updates:
- New controller algorithm for targeted compression rates
- More baselines and downstream tasks
- Scaled-up experiments to 1.23B parameter models

And now, MrT5 is on 🤗HuggingFace! 🧵
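
Since the model is on the Hugging Face Hub, a hypothetical loading sketch (the checkpoint id below is a placeholder, and details such as the model class or the need for trust_remote_code may differ from the actual release):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "<mrt5-checkpoint-id>"  # placeholder; use the id from the authors' HF page
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, trust_remote_code=True)

# MrT5 is byte-level (ByT5-style), so raw UTF-8 text goes straight in.
inputs = tokenizer("Translate to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
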
Jihyeon Je (@jihyeonje)

Can you rotate a dice 🎲 in your head? Mental imagery plays a key role in perspective reasoning for humans - but can it help VLMs reason spatially? We show that Abstract Perspective Change significantly improves VLM reasoning from unseen views. Check out our preprint for more:

Keshigeyan Chandrasegaran (@keshigeyan)

1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 grafting.stanford.edu Co-led with Michael Poli
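
A generic sketch of the operator-swapping idea (my own simplification; the block structure, the SlidingWindowAttention stand-in, and the graft_attention helper are all hypothetical, and the actual method's training objective differs): replace one operator inside a pretrained block and make only the replacement trainable, so it can be fit on a small compute budget.

```python
import torch
import torch.nn as nn

class SlidingWindowAttention(nn.Module):
    """Hypothetical cheaper operator standing in for whatever design you graft in."""
    def __init__(self, dim, heads=4, window=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.window = window

    def forward(self, x):
        t = x.shape[1]
        idx = torch.arange(t, device=x.device)
        # Boolean mask: True blocks attention outside the local window.
        mask = (idx[None, :] - idx[:, None]).abs() > self.window
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

def graft_attention(block, new_op):
    """Freeze the pretrained block, swap in the new operator, train only that."""
    for p in block.parameters():
        p.requires_grad = False
    block.attn = new_op  # assumes the block exposes its attention operator as .attn
    for p in new_op.parameters():
        p.requires_grad = True
    return block
```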

CLS (@chengleisi)

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.