Tatsunori Hashimoto (@tatsu_hashimoto)'s Twitter Profile
Tatsunori Hashimoto

@tatsu_hashimoto

Assistant Prof at Stanford CS, member of @stanfordnlp and statsml groups; Formerly at Microsoft / postdoc at Stanford CS / Stats.

ID: 1118495199863476225

Link: https://thashim.github.io/ | Joined: 17-04-2019 12:43:31

191 Tweets

7.7K Followers

197 Following

Niklas Muennighoff (@muennighoff) 's Twitter Profile Photo

DeepSeek r1 is exciting but misses OpenAI’s test-time scaling plot and needs lots of data.

We introduce s1 reproducing o1-preview scaling & performance with just 1K samples & a simple test-time intervention.

📜arxiv.org/abs/2501.19393
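
For readers curious what the "simple test-time intervention" looks like in practice, here is a minimal sketch of the budget-forcing idea from the s1 paper: cap or extend the model's thinking phase at decode time. The `generate` helper, the `<think>` delimiters, and the whitespace token counting are placeholder assumptions, not the authors' implementation.

```python
# Minimal sketch of s1-style "budget forcing" at decode time.
# Assumptions: `generate` wraps your own decoding loop / inference server,
# and <think>...</think> delimiters mark the model's reasoning phase.

def generate(prompt: str, max_new_tokens: int, stop: str | None = None) -> str:
    """Placeholder: call your model and return the newly generated text."""
    raise NotImplementedError

def budget_forced_answer(question: str,
                         min_think_tokens: int = 512,
                         max_think_tokens: int = 2048) -> str:
    prompt = f"Question: {question}\n<think>"
    thinking = ""
    while True:
        budget_left = max_think_tokens - len(thinking.split())
        if budget_left <= 0:
            break  # budget exhausted: force the thinking phase to end
        thinking += generate(prompt + thinking, max_new_tokens=budget_left,
                             stop="</think>")
        if len(thinking.split()) >= min_think_tokens:
            break
        # Minimum budget not reached: append "Wait" instead of stopping,
        # nudging the model to keep reasoning (and often self-correct).
        thinking += "\nWait"
    # Close the thinking phase and decode the final answer.
    return generate(prompt + thinking + "</think>\nAnswer:", max_new_tokens=256)
```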
Zitong Yang (@zitongyang0) 's Twitter Profile Photo

It's been an exciting journey to work on s1. Answers to some questions people have asked: 1. Both s1K (huggingface.co/datasets/simpl…) and the full 59K dataset are public (huggingface.co/datasets/simpl…). We have not manually verified the authenticity of the "ground truth" solution field,

elvis (@omarsar0) 's Twitter Profile Photo

"We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users’ prompts." Concerning if true!

"We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users’ prompts."

Concerning if true!
Anikait Singh (@anikait_singh_) 's Twitter Profile Photo

Personalization in LLMs is crucial for meeting diverse user needs, yet collecting real-world preferences at scale remains a significant challenge. Introducing FSPO, a simple framework leveraging synthetic preference data to adapt new users with meta-learning for open-ended QA! 🧵
Chenchen Gu (@chenchenygu) 's Twitter Profile Photo

Prompt caching lowers inference costs but can leak private information from timing differences.

Our audits found 7 API providers with potential leakage of user data.

Caching can even leak architecture info—OpenAI's embedding model is likely a decoder-only Transformer!
🧵1/9
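
The audit itself boils down to a timing side channel. A rough sketch of the idea follows (my own simplification; the `time_to_first_token` helper and the decision threshold are stand-ins, and the paper uses proper statistical hypothesis tests rather than a fixed ratio):

```python
# Sketch of a timing-based prompt-cache audit: a request whose prompt shares a
# long prefix with an earlier request should return noticeably faster if the
# provider serves it from a prompt cache.
import statistics
import time

def time_to_first_token(prompt: str) -> float:
    """Placeholder: send `prompt` to the API under test and return seconds
    until the first streamed token arrives."""
    raise NotImplementedError

def cache_hit_suspected(shared_prefix: str, trials: int = 25) -> bool:
    time_to_first_token(shared_prefix)          # warm the cache once
    cold, warm = [], []
    for i in range(trials):
        # Fresh, never-seen prefix -> expected cache miss.
        cold.append(time_to_first_token(f"{time.time()}-{i} " + shared_prefix))
        # Reused prefix -> faster response if prompts are cached.
        warm.append(time_to_first_token(shared_prefix + f" variation {i}"))
    # Crude threshold; the paper runs a formal hypothesis test, and sends the
    # "warm" requests from a different account to test global cache sharing.
    return statistics.median(warm) < 0.8 * statistics.median(cold)
```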
Martijn Bartelds (@barteldsmartijn) 's Twitter Profile Photo

🎙️ Speech recognition is great - if you speak the right language. Our new Stanford NLP Group paper introduces CTC-DRO, a training method that reduces worst-language errors by up to 47.1%. Work w/ Ananjan, Moussa, Dan Jurafsky, Tatsunori Hashimoto and Karen Livescu. Here’s how it works! 🧵
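
CTC-DRO builds on group distributionally robust optimization. A generic group-DRO-style reweighting over per-language losses looks roughly like the PyTorch sketch below (my own illustration; the paper modifies this update to handle CTC loss-scale differences across languages, so treat it as background rather than the method itself).

```python
# Generic group-DRO-style reweighting over per-language losses: languages with
# higher current loss get upweighted, pushing training toward the worst case.
import torch

class GroupReweighter:
    def __init__(self, num_groups: int, step_size: float = 0.01):
        self.log_q = torch.zeros(num_groups)   # log of the group weights
        self.step_size = step_size

    def weighted_loss(self, per_group_loss: torch.Tensor) -> torch.Tensor:
        # Exponentiated-gradient update on the weights, then renormalize.
        with torch.no_grad():
            self.log_q += self.step_size * per_group_loss.detach()
            self.log_q -= torch.logsumexp(self.log_q, dim=0)
        return (self.log_q.exp() * per_group_loss).sum()

# Usage in a training step, where per_group_loss[g] is the mean CTC loss
# (e.g. from torch.nn.CTCLoss) of language g in the current batch:
#   loss = reweighter.weighted_loss(per_group_loss)
#   loss.backward()
```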

Diyi Yang (@diyi_yang) 's Twitter Profile Photo

Just wrapped up #CS224N NLP with Deep Learning poster session with over 600 students and an amazing co-instructor Tatsunori Hashimoto 😊

Lots of exciting research and engaging discussions 🚀

Huge congratulations to all the students for their hard work, and a big thank you to our
Yangjun Ruan (@yangjunr) 's Twitter Profile Photo

New paper on synthetic pretraining!

We show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. We call this new paradigm “reasoning to learn”.
arxiv.org/abs/2503.18866

Here’s how it works🧵
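
A very rough sketch of how I read the "reasoning to learn" recipe from the abstract: before each continued-pretraining pass, the current model writes its own reasoning about every passage, and training then runs on the thoughts interleaved with the raw text. All helper names here (`model.generate`, `train_on_texts`) are placeholders, not the authors' code.

```python
# Sketch of the "reasoning to learn" loop: synthesize thoughts about the data,
# train on thoughts + data, and repeat so capabilities bootstrap over rounds.

THOUGHT_PROMPT = (
    "Think step by step about what the following passage implies, why it is "
    "true, and how it connects to what you already know:\n\n{passage}\n\nThoughts:"
)

def train_on_texts(model, texts: list[str]):
    """Placeholder: one continued-pretraining pass over `texts`."""
    raise NotImplementedError

def reasoning_to_learn(model, corpus: list[str], rounds: int = 2):
    for _ in range(rounds):
        augmented = []
        for passage in corpus:
            # The model synthesizes its own "thought" about the passage.
            thought = model.generate(THOUGHT_PROMPT.format(passage=passage))
            # Interleave the thought with the raw text so next-token
            # prediction also learns from the synthesized reasoning.
            augmented.append(thought + "\n\n" + passage)
        model = train_on_texts(model, augmented)
    return model
```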
Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

Want to learn the engineering details of building state-of-the-art Large Language Models (LLMs)? Not finding much info in OpenAI’s non-technical reports? Percy Liang and Tatsunori Hashimoto are here to help with CS336: Language Modeling from Scratch, now rolling out to YouTube.

Zitong Yang (@zitongyang0) 's Twitter Profile Photo

Synthetic Continued Pretraining (arxiv.org/pdf/2409.07431) has been accepted as an Oral Presentation at #ICLR2025!

We tackle the challenge of data-efficient language model pretraining: how to teach an LM the knowledge of small, niche corpora, such as the latest arXiv preprints.
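
For context, the paper's data-synthesis step (EntiGraph) roughly works by extracting entities from the small corpus and prompting a generator LM to write passages about relations between them. A hedged sketch, with `extract_entities` and `generate` as hypothetical callables supplied by the caller:

```python
# Sketch of entity-graph style synthetic data generation: sample entity pairs
# from the niche corpus and have an LM write grounded passages relating them;
# the synthesized passages then feed continued pretraining.
import itertools
import random

def synthesize_corpus(documents: list[str], extract_entities, generate,
                      num_passages: int = 1000) -> list[str]:
    entities = sorted({e for doc in documents for e in extract_entities(doc)})
    pairs = list(itertools.combinations(entities, 2))
    random.shuffle(pairs)
    synthetic = []
    for e1, e2 in pairs[:num_passages]:
        prompt = (
            f"Using only the source documents below, explain how '{e1}' "
            f"relates to '{e2}'.\n\n" + "\n\n".join(documents)
        )
        synthetic.append(generate(prompt))
    return synthetic   # train on these alongside the original documents
```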
CLS (@chengleisi) 's Twitter Profile Photo

I’ll present this paper at ICLR this Thursday (10-12:30 on 4/24; Poster at Hall 3 + Hall 2B #253). Also DM me if you wanna chat about LLM for scientific research in general, we have some exciting projects coming out soon!

Tristan Thrush (@tristanthrush) 's Twitter Profile Photo

At #ICLR, check out Perplexity Correlations: a statistical framework to select the best pretraining data with no LLM training! I can’t make the trip, but Tatsunori Hashimoto will present the poster for us! Continue reading for the latest empirical validations of PPL Correlations:
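
The core statistical idea, as I understand it: take many existing public models, measure each model's perplexity on each candidate pretraining domain and its score on the target benchmark, and keep the domains where lower perplexity most strongly tracks higher benchmark performance. A hedged numpy/scipy sketch (the array names and the simple top-k rule are my own illustration, not the paper's estimator):

```python
# Sketch of perplexity-correlation-based data selection: no new LLM training,
# only loss and benchmark measurements on existing public models.
import numpy as np
from scipy.stats import spearmanr

def select_domains(domain_logppl: np.ndarray,   # shape (n_models, n_domains)
                   bench_scores: np.ndarray,    # shape (n_models,)
                   keep_fraction: float = 0.25) -> np.ndarray:
    n_domains = domain_logppl.shape[1]
    corrs = np.empty(n_domains)
    for d in range(n_domains):
        # Lower perplexity on a useful domain should co-occur with higher
        # benchmark scores, i.e. a strongly negative rank correlation.
        rho, _ = spearmanr(domain_logppl[:, d], bench_scores)
        corrs[d] = rho
    k = max(1, int(keep_fraction * n_domains))
    return np.argsort(corrs)[:k]   # indices of the k most negative correlations
```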

Nicole Meister (@nicole__meister) 's Twitter Profile Photo

I'm at #NAACL2025 presenting our paper on Distributional Alignment of LLMs (oral talk Wednesday @ 4pm at the R&E.2: Resources and Evaluation Session). Excited to chat!

Simon Guo 🦝 (@simonguozirui) 's Twitter Profile Photo

Designed some graphics for Stanford CS336 (Language Modeling from Scratch) by Percy Liang, Tatsunori Hashimoto, Marcel Rød, Neil Band, Rohith Kuditipudi

Covering four assignments 📚 that teach you how to 🧑‍🍳 cook an LLM from scratch:
- Build and Train a Tokenizer 🔤
- Write Triton kernels for
Ryan Marten (@ryanmart3n) 's Twitter Profile Photo

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
Percy Liang (@percyliang) 's Twitter Profile Photo

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team: Tatsunori Hashimoto, Marcel Rød, Neil Band, Rohith Kuditipudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

CLS (@chengleisi) 's Twitter Profile Photo

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.