Sunny Qin (@sunnytqin)'s Twitter Profile
Sunny Qin

@sunnytqin

Machine Learning PhD @ Harvard

ID: 1622819742523305984

Joined: 07-02-2023 04:49:26

9 Tweets

49 Followers

87 Following

Naomi Saphra hiring a lab 🧈🪰 (@nsaphra):

Transformer LMs get pretty far by acting like ngram models, so why do they even learn syntax? A new paper by Sunny Qin, me, and David Alvarez Melis uncovers the keys to grammar learning in a whirlwind tour of generalization, grokking, training dynamics, memorization, and random variation.

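For context: an n-gram model predicts each token from only the previous n-1 tokens, so it captures local co-occurrence statistics but no long-range syntax. A minimal bigram sketch (the toy corpus and class name are mine, not the paper's):

```python
from collections import Counter, defaultdict

class BigramLM:
    """Toy bigram LM: next-token counts conditioned only on the previous token."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, tokens):
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        total = sum(self.counts[prev].values())
        return self.counts[prev][nxt] / total if total else 0.0

corpus = "the cat sat on the mat the cat ran".split()
lm = BigramLM()
lm.fit(corpus)
print(lm.prob("the", "cat"))  # 2/3: "the" precedes "cat" twice, "mat" once
```

A model like this gets surprisingly far on next-token prediction, which is exactly why it is puzzling that transformers bother to learn syntax on top of it.
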
Naomi Saphra hiring a lab 🧈🪰 (@nsaphra):

Ever looked at LLM skill emergence and thought 70B parameters was a magic number? Our new paper shows sudden breakthroughs are samples from bimodal performance distributions across seeds. Observed accuracy jumps abruptly while the underlying accuracy DISTRIBUTION changes slowly!

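A toy simulation of that claim as I read it (all distributions and numbers below are invented for illustration): each run's final accuracy is a draw from a mixture of a "failed" mode and a "succeeded" mode, and only the mixture weight moves smoothly with scale. Any single run still looks like a sudden breakthrough:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_run_accuracy(p_success):
    """One training run's final accuracy: a draw from a bimodal mixture.
    Only p_success (the mixture weight) varies with scale."""
    if rng.random() < p_success:
        return rng.normal(0.90, 0.03)  # "skill learned" mode
    return rng.normal(0.10, 0.03)      # "no skill" mode

for p in np.linspace(0.0, 1.0, 11):    # mixture weight grows slowly with scale
    one_seed = sample_run_accuracy(p)
    mean_acc = np.mean([sample_run_accuracy(p) for _ in range(500)])
    print(f"p_success={p:.1f}  one seed: {one_seed:.2f}  mean over seeds: {mean_acc:.2f}")
```

The single-seed column flips abruptly from ~0.10 to ~0.90 somewhere along the axis, while the mean over seeds climbs smoothly, matching the observed-versus-underlying-distribution distinction in the tweet.
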
David Alvarez Melis (@elmelis):

🚨 New preprint! TL;DR: Backtracking is not the "holy grail" for smarter LLMs. It’s praised for helping models “fix mistakes” and improve reasoning—but is it really the best use of test-time compute? 🤔
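To make "best use of test-time compute" concrete, here is an entirely toy comparison (my invention, not the preprint's setup): under a fixed step budget, a generator can either backtrack and retry a failed reasoning step, or spend the same budget on independent full restarts. Note the toy grants backtracking a free, perfect error detector, an assumption real models do not enjoy:

```python
import random

random.seed(0)
STEP_OK = 0.8  # probability a single reasoning step is correct (invented)

def solve_with_backtracking(n_steps, budget):
    """Retry the current step whenever it fails; assumes a perfect verifier."""
    spent, step = 0, 0
    while step < n_steps and spent < budget:
        spent += 1
        if random.random() < STEP_OK:
            step += 1  # step verified correct, advance
        # otherwise: backtrack to the same step and resample
    return step == n_steps

def solve_with_restarts(n_steps, budget):
    """Same budget spent on independent full attempts (no backtracking)."""
    spent = 0
    while spent + n_steps <= budget:
        spent += n_steps
        if all(random.random() < STEP_OK for _ in range(n_steps)):
            return True
    return False

trials = 2000
bt = sum(solve_with_backtracking(10, 30) for _ in range(trials)) / trials
rs = sum(solve_with_restarts(10, 30) for _ in range(trials)) / trials
print(f"backtracking: {bt:.2f}  restarts: {rs:.2f}")
```

Which strategy wins hinges on assumptions like verifier quality and error correlation, which is presumably the kind of question the preprint examines.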

Core Francisco Park (@corefpark):

🚨 New Paper! A lot happens in the world every day—how can we update LLMs with belief-changing news? We introduce a new dataset "New News" and systematically study knowledge integration via System-2 Fine-Tuning (Sys2-FT). 1/n

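My reading of the Sys2-FT recipe described above, as a sketch (the StubLM class and every method name are placeholders, not the paper's code): use in-context, System-2-style reasoning to expand each news item into self-generated training text, then fine-tune the weights on those expansions rather than on the raw news alone:

```python
class StubLM:
    """Stand-in for a real LM; swap in an actual model and trainer."""
    def generate(self, prompt, temperature=1.0):
        return f"[model expansion of: {prompt[:40]}...]"
    def finetune(self, corpus):
        print(f"fine-tuning on {len(corpus)} self-generated examples")

def generate_system2_data(model, news_item, k=8):
    """Expand one news item into k training examples (paraphrases,
    implications, Q&A) via in-context reasoning."""
    prompt = (
        f"News: {news_item}\n"
        "Restate this fact and list its implications as question-answer pairs."
    )
    return [model.generate(prompt, temperature=1.0) for _ in range(k)]

def sys2_finetune(model, news_items):
    corpus = []
    for item in news_items:
        corpus.extend(generate_system2_data(model, item))
    model.finetune(corpus)  # knowledge integration happens in the weights
    return model

sys2_finetune(StubLM(), ["Example: team X won the 2025 championship."])
```

The contrast with plain fine-tuning on the raw news string is the "System-2" part: the model first reasons about the update in context, and only the products of that reasoning become weight updates.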