Tatsunori Hashimoto (@tatsu_hashimoto)'s Twitter Profile
Tatsunori Hashimoto

@tatsu_hashimoto

Assistant Prof at Stanford CS, member of @stanfordnlp and statsml groups; Formerly at Microsoft / postdoc at Stanford CS / Stats.

ID: 1118495199863476225

Link: https://thashim.github.io/ | Joined: 17-04-2019 12:43:31

191 Tweets

7.7K Followers

197 Following

Niklas Muennighoff (@muennighoff) 's Twitter Profile Photo

DeepSeek r1 is exciting but misses OpenAI’s test-time scaling plot and needs lots of data.

We introduce s1 reproducing o1-preview scaling & performance with just 1K samples & a simple test-time intervention.

📜arxiv.org/abs/2501.19393
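
For readers curious what the "simple test-time intervention" looks like in practice, here is a minimal sketch of the budget-forcing idea from the s1 paper: cap or extend the model's thinking phase at decode time. The `generate` helper, the `<think>` delimiters, and the whitespace token counting are placeholder assumptions, not the authors' implementation.

```python
# Minimal sketch of s1-style "budget forcing" at decode time.
# Assumptions: `generate` wraps your own decoding loop / inference server,
# and <think>...</think> delimiters mark the model's reasoning phase.

def generate(prompt: str, max_new_tokens: int, stop: str | None = None) -> str:
    """Placeholder: call your model and return the newly generated text."""
    raise NotImplementedError

def budget_forced_answer(question: str,
                         min_think_tokens: int = 512,
                         max_think_tokens: int = 2048) -> str:
    prompt = f"Question: {question}\n<think>"
    thinking = ""
    while True:
        budget_left = max_think_tokens - len(thinking.split())
        if budget_left <= 0:
            break  # budget exhausted: force the thinking phase to end
        thinking += generate(prompt + thinking, max_new_tokens=budget_left,
                             stop="</think>")
        if len(thinking.split()) >= min_think_tokens:
            break
        # Minimum budget not reached: append "Wait" instead of stopping,
        # nudging the model to keep reasoning (and often self-correct).
        thinking += "\nWait"
    # Close the thinking phase and decode the final answer.
    return generate(prompt + thinking + "</think>\nAnswer:", max_new_tokens=256)
```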
Zitong Yang (@zitongyang0) 's Twitter Profile Photo

It's been an exciting journey to work on s1. Answers to some questions people have asked: 1. Both s1K (huggingface.co/datasets/simpl…) and the full 59K dataset are public (huggingface.co/datasets/simpl…). We have not manually verified the authenticity of the "ground truth" solution field,

elvis (@omarsar0) 's Twitter Profile Photo

"We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users’ prompts." Concerning if true!

"We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users’ prompts."

Concerning if true!
Anikait Singh (@anikait_singh_) 's Twitter Profile Photo

Personalization in LLMs is crucial for meeting diverse user needs, yet collecting real-world preferences at scale remains a significant challenge. Introducing FSPO, a simple framework leveraging synthetic preference data to adapt new users with meta-learning for open-ended QA! 🧵
Chenchen Gu (@chenchenygu) 's Twitter Profile Photo

Prompt caching lowers inference costs but can leak private information from timing differences.

Our audits found 7 API providers with potential leakage of user data.

Caching can even leak architecture info—OpenAI's embedding model is likely a decoder-only Transformer!
🧵1/9
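
The audit itself boils down to a timing side channel. A rough sketch of the idea follows (my own simplification; the `time_to_first_token` helper and the decision threshold are stand-ins, and the paper uses proper statistical hypothesis tests rather than a fixed ratio):

```python
# Sketch of a timing-based prompt-cache audit: a request whose prompt shares a
# long prefix with an earlier request should return noticeably faster if the
# provider serves it from a prompt cache.
import statistics
import time

def time_to_first_token(prompt: str) -> float:
    """Placeholder: send `prompt` to the API under test and return seconds
    until the first streamed token arrives."""
    raise NotImplementedError

def cache_hit_suspected(shared_prefix: str, trials: int = 25) -> bool:
    time_to_first_token(shared_prefix)          # warm the cache once
    cold, warm = [], []
    for i in range(trials):
        # Fresh, never-seen prefix -> expected cache miss.
        cold.append(time_to_first_token(f"{time.time()}-{i} " + shared_prefix))
        # Reused prefix -> faster response if prompts are cached.
        warm.append(time_to_first_token(shared_prefix + f" variation {i}"))
    # Crude threshold; the paper runs a formal hypothesis test, and sends the
    # "warm" requests from a different account to test global cache sharing.
    return statistics.median(warm) < 0.8 * statistics.median(cold)
```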
Martijn Bartelds (@barteldsmartijn) 's Twitter Profile Photo

🎙️ Speech recognition is great - if you speak the right language. Our new Stanford NLP Group paper introduces CTC-DRO, a training method that reduces worst-language errors by up to 47.1%. Work w/ Ananjan, Moussa, Dan Jurafsky, Tatsunori Hashimoto and Karen Livescu. Here’s how it works! 🧵
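
CTC-DRO builds on group distributionally robust optimization. A generic group-DRO-style reweighting over per-language losses looks roughly like the PyTorch sketch below (my own illustration; the paper modifies this update to handle CTC loss-scale differences across languages, so treat it as background rather than the method itself).

```python
# Generic group-DRO-style reweighting over per-language losses: languages with
# higher current loss get upweighted, pushing training toward the worst case.
import torch

class GroupReweighter:
    def __init__(self, num_groups: int, step_size: float = 0.01):
        self.log_q = torch.zeros(num_groups)   # log of the group weights
        self.step_size = step_size

    def weighted_loss(self, per_group_loss: torch.Tensor) -> torch.Tensor:
        # Exponentiated-gradient update on the weights, then renormalize.
        with torch.no_grad():
            self.log_q += self.step_size * per_group_loss.detach()
            self.log_q -= torch.logsumexp(self.log_q, dim=0)
        return (self.log_q.exp() * per_group_loss).sum()

# Usage in a training step, where per_group_loss[g] is the mean CTC loss
# (e.g. from torch.nn.CTCLoss) of language g in the current batch:
#   loss = reweighter.weighted_loss(per_group_loss)
#   loss.backward()
```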

Diyi Yang (@diyi_yang) 's Twitter Profile Photo

Just wrapped up #CS224N NLP with Deep Learning poster session with over 600 students and an amazing co-instructor Tatsunori Hashimoto 😊

Lots of exciting research and engaging discussions 🚀

Huge congratulations to all the students for their hard work, and a big thank you to our
Yangjun Ruan (@yangjunr) 's Twitter Profile Photo

New paper on synthetic pretraining!

We show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. We call this new paradigm “reasoning to learn”.
arxiv.org/abs/2503.18866

Here’s how it works🧵
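
A very rough sketch of how I read the "reasoning to learn" recipe from the abstract: before each continued-pretraining pass, the current model writes its own reasoning about every passage, and training then runs on the thoughts interleaved with the raw text. All helper names here (`model.generate`, `train_on_texts`) are placeholders, not the authors' code.

```python
# Sketch of the "reasoning to learn" loop: synthesize thoughts about the data,
# train on thoughts + data, and repeat so capabilities bootstrap over rounds.

THOUGHT_PROMPT = (
    "Think step by step about what the following passage implies, why it is "
    "true, and how it connects to what you already know:\n\n{passage}\n\nThoughts:"
)

def train_on_texts(model, texts: list[str]):
    """Placeholder: one continued-pretraining pass over `texts`."""
    raise NotImplementedError

def reasoning_to_learn(model, corpus: list[str], rounds: int = 2):
    for _ in range(rounds):
        augmented = []
        for passage in corpus:
            # The model synthesizes its own "thought" about the passage.
            thought = model.generate(THOUGHT_PROMPT.format(passage=passage))
            # Interleave the thought with the raw text so next-token
            # prediction also learns from the synthesized reasoning.
            augmented.append(thought + "\n\n" + passage)
        model = train_on_texts(model, augmented)
    return model
```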
Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

Want to learn the engineering details of building state-of-the-art Large Language Models (LLMs)? Not finding much info in OpenAI’s non-technical reports? Percy Liang and Tatsunori Hashimoto are here to help with CS336: Language Modeling from Scratch, now rolling out to YouTube.

Zitong Yang (@zitongyang0) 's Twitter Profile Photo

Synthetic Continued Pretraining (arxiv.org/pdf/2409.07431) has been accepted as an Oral Presentation at #ICLR2025!

We tackle the challenge of data-efficient language model pretraining: how to teach an LM the knowledge of small, niche corpora, such as the latest arXiv preprints.
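
For context, the paper's data-synthesis step (EntiGraph) roughly works by extracting entities from the small corpus and prompting a generator LM to write passages about relations between them. A hedged sketch, with `extract_entities` and `generate` as hypothetical callables supplied by the caller:

```python
# Sketch of entity-graph style synthetic data generation: sample entity pairs
# from the niche corpus and have an LM write grounded passages relating them;
# the synthesized passages then feed continued pretraining.
import itertools
import random

def synthesize_corpus(documents: list[str], extract_entities, generate,
                      num_passages: int = 1000) -> list[str]:
    entities = sorted({e for doc in documents for e in extract_entities(doc)})
    pairs = list(itertools.combinations(entities, 2))
    random.shuffle(pairs)
    synthetic = []
    for e1, e2 in pairs[:num_passages]:
        prompt = (
            f"Using only the source documents below, explain how '{e1}' "
            f"relates to '{e2}'.\n\n" + "\n\n".join(documents)
        )
        synthetic.append(generate(prompt))
    return synthetic   # train on these alongside the original documents
```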
CLS (@chengleisi) 's Twitter Profile Photo

I’ll present this paper at ICLR this Thursday (10-12:30 on 4/24; Poster at Hall 3 + Hall 2B #253). Also DM me if you wanna chat about LLM for scientific research in general, we have some exciting projects coming out soon!

Tristan Thrush (@tristanthrush) 's Twitter Profile Photo

At #ICLR, check out Perplexity Correlations: a statistical framework to select the best pretraining data with no LLM training! I can’t make the trip, but Tatsunori Hashimoto will present the poster for us! Continue reading for the latest empirical validations of PPL Correlations:
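
The core statistical idea, as I understand it: take many existing public models, measure each model's perplexity on each candidate pretraining domain and its score on the target benchmark, and keep the domains where lower perplexity most strongly tracks higher benchmark performance. A hedged numpy/scipy sketch (the array names and the simple top-k rule are my own illustration, not the paper's estimator):

```python
# Sketch of perplexity-correlation-based data selection: no new LLM training,
# only loss and benchmark measurements on existing public models.
import numpy as np
from scipy.stats import spearmanr

def select_domains(domain_logppl: np.ndarray,   # shape (n_models, n_domains)
                   bench_scores: np.ndarray,    # shape (n_models,)
                   keep_fraction: float = 0.25) -> np.ndarray:
    n_domains = domain_logppl.shape[1]
    corrs = np.empty(n_domains)
    for d in range(n_domains):
        # Lower perplexity on a useful domain should co-occur with higher
        # benchmark scores, i.e. a strongly negative rank correlation.
        rho, _ = spearmanr(domain_logppl[:, d], bench_scores)
        corrs[d] = rho
    k = max(1, int(keep_fraction * n_domains))
    return np.argsort(corrs)[:k]   # indices of the k most negative correlations
```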

Nicole Meister (@nicole__meister) 's Twitter Profile Photo

I'm at #NAACL2025 presenting our paper on Distributional Alignment of LLMs (oral talk Wednesday @ 4pm at the R&E.2: Resources and Evaluation Session). Excited to chat!

Simon Guo 🦝 (@simonguozirui) 's Twitter Profile Photo

Designed some graphics for Stanford CS336 (Language Modeling from Scratch) by Percy Liang, Tatsunori Hashimoto, Marcel Rød, Neil Band, Rohith Kuditipudi

Covering four assignments 📚 that teach you how to 🧑‍🍳 cook an LLM from scratch:
- Build and Train a Tokenizer 🔤
- Write Triton kernels for
Ryan Marten (@ryanmart3n) 's Twitter Profile Photo

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
Percy Liang (@percyliang) 's Twitter Profile Photo

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team: Tatsunori Hashimoto, Marcel Rød, Neil Band, Rohith Kuditipudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

CLS (@chengleisi) 's Twitter Profile Photo

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.