Sunny Qin (@sunnytqin)'s Twitter Profile
Sunny Qin

@sunnytqin

Machine Learning PhD @ Harvard

ID: 1622819742523305984

Joined: 07-02-2023 04:49:26

9 Tweets

49 Followers

87 Following

Naomi Saphra hiring a lab 🧈🪰 (@nsaphra):

Transformer LMs get pretty far by acting like ngram models, so why do they even learn syntax? A new paper by Sunny Qin, me, and David Alvarez Melis uncovers the keys to grammar learning in a whirlwind tour of generalization, grokking, training dynamics, memorization, and random variation.

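For context: an n-gram model predicts each token from only the previous n-1 tokens, so it captures local co-occurrence statistics but no long-range syntax. A minimal bigram sketch (the toy corpus and class name are mine, not the paper's):

```python
from collections import Counter, defaultdict

class BigramLM:
    """Toy bigram LM: next-token counts conditioned only on the previous token."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, tokens):
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        total = sum(self.counts[prev].values())
        return self.counts[prev][nxt] / total if total else 0.0

corpus = "the cat sat on the mat the cat ran".split()
lm = BigramLM()
lm.fit(corpus)
print(lm.prob("the", "cat"))  # 2/3: "the" precedes "cat" twice, "mat" once
```

A model like this gets surprisingly far on next-token prediction, which is exactly why it is puzzling that transformers bother to learn syntax on top of it.
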
Naomi Saphra hiring a lab 🧈🪰 (@nsaphra):

Ever looked at LLM skill emergence and thought 70B parameters was a magic number? Our new paper shows sudden breakthroughs are samples from bimodal performance distributions across seeds. Observed accuracy jumps abruptly while the underlying accuracy DISTRIBUTION changes slowly!

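A toy simulation of that claim as I read it (all distributions and numbers below are invented for illustration): each run's final accuracy is a draw from a mixture of a "failed" mode and a "succeeded" mode, and only the mixture weight moves smoothly with scale. Any single run still looks like a sudden breakthrough:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_run_accuracy(p_success):
    """One training run's final accuracy: a draw from a bimodal mixture.
    Only p_success (the mixture weight) varies with scale."""
    if rng.random() < p_success:
        return rng.normal(0.90, 0.03)  # "skill learned" mode
    return rng.normal(0.10, 0.03)      # "no skill" mode

for p in np.linspace(0.0, 1.0, 11):    # mixture weight grows slowly with scale
    one_seed = sample_run_accuracy(p)
    mean_acc = np.mean([sample_run_accuracy(p) for _ in range(500)])
    print(f"p_success={p:.1f}  one seed: {one_seed:.2f}  mean over seeds: {mean_acc:.2f}")
```

The single-seed column flips abruptly from ~0.10 to ~0.90 somewhere along the axis, while the mean over seeds climbs smoothly, matching the observed-versus-underlying-distribution distinction in the tweet.
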
David Alvarez Melis (@elmelis):

🚨 New preprint! TL;DR: Backtracking is not the "holy grail" for smarter LLMs. It’s praised for helping models “fix mistakes” and improve reasoning—but is it really the best use of test-time compute? 🤔
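To make "best use of test-time compute" concrete, here is an entirely toy comparison (my invention, not the preprint's setup): under a fixed step budget, a generator can either backtrack and retry a failed reasoning step, or spend the same budget on independent full restarts. Note the toy grants backtracking a free, perfect error detector, an assumption real models do not enjoy:

```python
import random

random.seed(0)
STEP_OK = 0.8  # probability a single reasoning step is correct (invented)

def solve_with_backtracking(n_steps, budget):
    """Retry the current step whenever it fails; assumes a perfect verifier."""
    spent, step = 0, 0
    while step < n_steps and spent < budget:
        spent += 1
        if random.random() < STEP_OK:
            step += 1  # step verified correct, advance
        # otherwise: backtrack to the same step and resample
    return step == n_steps

def solve_with_restarts(n_steps, budget):
    """Same budget spent on independent full attempts (no backtracking)."""
    spent = 0
    while spent + n_steps <= budget:
        spent += n_steps
        if all(random.random() < STEP_OK for _ in range(n_steps)):
            return True
    return False

trials = 2000
bt = sum(solve_with_backtracking(10, 30) for _ in range(trials)) / trials
rs = sum(solve_with_restarts(10, 30) for _ in range(trials)) / trials
print(f"backtracking: {bt:.2f}  restarts: {rs:.2f}")
```

Which strategy wins hinges on assumptions like verifier quality and error correlation, which is presumably the kind of question the preprint examines.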

Core Francisco Park (@corefpark):

🚨 New Paper! A lot happens in the world every day—how can we update LLMs with belief-changing news? We introduce a new dataset "New News" and systematically study knowledge integration via System-2 Fine-Tuning (Sys2-FT). 1/n

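My reading of the Sys2-FT recipe described above, as a sketch (the StubLM class and every method name are placeholders, not the paper's code): use in-context, System-2-style reasoning to expand each news item into self-generated training text, then fine-tune the weights on those expansions rather than on the raw news alone:

```python
class StubLM:
    """Stand-in for a real LM; swap in an actual model and trainer."""
    def generate(self, prompt, temperature=1.0):
        return f"[model expansion of: {prompt[:40]}...]"
    def finetune(self, corpus):
        print(f"fine-tuning on {len(corpus)} self-generated examples")

def generate_system2_data(model, news_item, k=8):
    """Expand one news item into k training examples (paraphrases,
    implications, Q&A) via in-context reasoning."""
    prompt = (
        f"News: {news_item}\n"
        "Restate this fact and list its implications as question-answer pairs."
    )
    return [model.generate(prompt, temperature=1.0) for _ in range(k)]

def sys2_finetune(model, news_items):
    corpus = []
    for item in news_items:
        corpus.extend(generate_system2_data(model, item))
    model.finetune(corpus)  # knowledge integration happens in the weights
    return model

sys2_finetune(StubLM(), ["Example: team X won the 2025 championship."])
```

The contrast with plain fine-tuning on the raw news string is the "System-2" part: the model first reasons about the update in context, and only the products of that reasoning become weight updates.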