Udaya Ghai (@udayaghai) 's Twitter Profile
Udaya Ghai

@udayaghai

Applied Scientist @Amazon, PhD in Machine Learning @PrincetonCS

ID: 1524085579918491649

https://udayaghai.com · Joined 10-05-2022 17:55:25

28 Tweets

123 Followers

572 Following

Xinyi Chen (@xinyichen2) 's Twitter Profile Photo

Together with Elad Hazan, Cong, Andrea Zanette, and Nati, we are organizing a long program on reinforcement learning and control at the Institute for Mathematical & Statistical Innovation! Join us for workshops on the frontiers of online/offline RL, control, multi-agent RL, and opportunities to present your research.

Sandeep Chinchali (@spchinchali) 's Twitter Profile Photo

Excited to share our new paper in INTERSPEECH '24 on embedding signatures in audio to detect and prevent #deepfake audio. Read our paper here: arxiv.org/abs/2407.00913.

Dibya Ghosh (@its_dibya) 's Twitter Profile Photo

With R1, a lot of people have been asking “how come we didn't discover this 2 years ago?” Well... 2 years ago, I spent 6 months working on exactly this (PG / PPO for math+gsm8k), but my results were nowhere near as good. Here’s my take on what blocked me and what’s changed: 🧵

Tengyu Ma (@tengyuma) 's Twitter Profile Photo

RL + CoT works great for DeepSeek-R1 & o1, but:

1️⃣ Linear-in-log scaling in train & test-time compute
2️⃣ Likely bounded by difficulty of training problems

Meet STP—a self-play algorithm that conjectures & proves indefinitely, scaling better! 🧠⚡🧵🧵

arxiv.org/abs/2502.00212

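The tweet doesn't spell out the algorithm, but the conjecture-and-prove self-play loop it describes can be sketched roughly as below. This is a minimal sketch under stated assumptions: `model.propose_conjecture`, `model.attempt_proof`, `model.finetune`, and the `verifier` callable are hypothetical placeholders, not the paper's actual interfaces.

```python
# A minimal sketch of a conjecture-and-prove self-play round in the spirit of
# the tweet above. The model methods and the verifier callable are hypothetical
# placeholders, not interfaces from the STP paper.

def self_play_round(model, seed_statements, verifier, num_conjectures=64):
    """One round of self-play: conjecture new statements, try to prove them,
    and reinforce the model on the (statement, proof) pairs that check out."""
    verified_pairs = []
    for seed in seed_statements:
        for _ in range(num_conjectures):
            conjecture = model.propose_conjecture(seed)     # generate a candidate statement
            proof = model.attempt_proof(conjecture)         # try to prove it
            if proof is not None and verifier(conjecture, proof):
                verified_pairs.append((conjecture, proof))  # keep only checked proofs
    model.finetune(verified_pairs)                          # train on successful self-generated data
    return model

# Repeating self_play_round indefinitely is what lets data generation keep pace
# with the model, rather than being capped by a fixed set of training problems.
```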
Eugene Vinitsky 🍒🦋 (@eugenevinitsky) 's Twitter Profile Photo

We've built a simulated driving agent that we trained on 1.6 billion km of driving with no human data. It is SOTA on every planning benchmark we tried. In self-play, it goes 20 years between collisions.

Sham Kakade (@shamkakade6) 's Twitter Profile Photo

1/n In new work, we draw connections between accelerated SGD and various recent optimizers, including AdEMAMix, Schedule-Free optimizers, and MARS, and use them to design ‘Simplified-AdEMAMix’, which matches the performance of AdEMAMix without any extra momentum buffer.

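For context on the "extra momentum buffer": as I understand the original AdEMAMix optimizer, it keeps a second, slow momentum EMA alongside Adam's fast momentum and second-moment estimates. The sketch below is only illustrative (bias correction omitted, hyperparameters made up) and is not taken from either paper; it just shows which buffer a simplified variant could drop.

```python
import torch

def ademamix_style_step(p, grad, state, lr=1e-3, b1=0.9, b2=0.999, b3=0.9999,
                        alpha=5.0, eps=1e-8):
    """One illustrative AdEMAMix-style update for a single parameter tensor.
    Bias correction is omitted to keep the sketch short."""
    m1, m2, v = state["m1"], state["m2"], state["v"]
    m1.mul_(b1).add_(grad, alpha=1 - b1)           # fast momentum EMA (as in Adam)
    m2.mul_(b3).add_(grad, alpha=1 - b3)           # slow momentum EMA: the extra buffer
    v.mul_(b2).addcmul_(grad, grad, value=1 - b2)  # second-moment EMA
    update = (m1 + alpha * m2) / (v.sqrt() + eps)
    p.add_(update, alpha=-lr)

# Example state for one parameter tensor:
p = torch.zeros(10)
state = {k: torch.zeros_like(p) for k in ("m1", "m2", "v")}
ademamix_style_step(p, torch.randn(10), state)
```

Per the tweet, 'Simplified-AdEMAMix' reportedly matches this performance while dropping the `m2` buffer, which removes one full-size optimizer state per parameter.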
Elad Hazan (@hazanprinceton) 's Twitter Profile Photo

Very excited about a breakthrough in learning linear dynamical systems and sequence prediction, with my brilliant postdoc: the first time marginally-stable (long-memory) asymmetric linear dynamical systems can be learned without dependence on the hidden dimension!! arxiv.org/abs/2502.06545
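To unpack "marginally-stable (long memory)": a linear dynamical system x_{t+1} = A x_t + B u_t, y_t = C x_t is marginally stable when the spectral radius of A is close to 1, so the effect of an input persists for many steps. The toy simulation below is my own illustration of that object, not anything from the paper; the hidden dimension d is exactly the quantity the stated result avoids depending on.

```python
import numpy as np

# Toy marginally-stable, asymmetric LDS: x_{t+1} = A x_t + B u_t, y_t = C x_t.
# A is built from 2x2 rotation blocks scaled by rho = 0.999, so its spectral
# radius is just below 1 and a single impulse takes thousands of steps to fade.
rng = np.random.default_rng(0)
d = 8                                              # hidden dimension
A = np.zeros((d, d))
for i, theta in enumerate(rng.uniform(0.0, np.pi, d // 2)):
    rot = 0.999 * np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta),  np.cos(theta)]])
    A[2 * i:2 * i + 2, 2 * i:2 * i + 2] = rot      # non-symmetric block
B = rng.standard_normal((d, 1))
C = rng.standard_normal((1, d))

x = np.zeros((d, 1))
response = []
for t in range(5000):
    u = np.ones((1, 1)) if t == 0 else np.zeros((1, 1))  # single impulse at t = 0
    x = A @ x + B @ u
    response.append((C @ x).item())

# The envelope decays like 0.999**t, so it is still noticeable after ~1000 steps.
print(max(abs(r) for r in response[1000:1100]))
```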

Raj Pabari (@realrajpabari) 's Twitter Profile Photo

1/ I'm excited to share a project I worked on at Amazon where we introduce "A shared-revenue Bertrand game" -- a variant of the Bertrand duopoly in which the firms' payoffs are coupled, generating novel Nash equilibria that capture real-world revenue-sharing dynamics.

Max Rudolph (@maxbrudolph) 's Twitter Profile Photo

We ran thousands of sweeps to compare RL algos for imperfect-information games and found preliminary evidence for the Policy Gradient Hypothesis: with proper tuning, generic PG methods (PPO, etc.) are highly competitive in IIGs. Check out the full paper: arxiv.org/abs/2502.08938

Akhil Bagaria (@akhil_bagaria) 's Twitter Profile Photo

My 1st last-author paper (joint) will be presented as an Oral at AAAI! A lot of Option Discovery work (including mine) is based on intuitive heuristics. Instead, we formalize what we want options to do for the agent and then derive algorithms with provable guarantees.

Nikunj Saunshi (@nsaunshi) 's Twitter Profile Photo

*New ICLR paper* – We introduce a paradigm of *looped models for reasoning*. Main claims:
- Reasoning requires depth (via looping), not necessarily params
- LLM reasoning predictably scales with more loops
- Looped models generate “latent thoughts” & can simulate CoT reasoning
1/n

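A minimal sketch of the "depth via looping" idea: apply one weight-tied block repeatedly, so effective depth grows with the loop count while the parameter count stays fixed. The block architecture and loop count below are illustrative assumptions of mine, not the model from the paper.

```python
import torch
import torch.nn as nn

# One weight-tied transformer-style block applied num_loops times: effective
# depth scales with the loop count, parameters do not. The intermediate states
# play the role of the "latent thoughts" mentioned in the tweet.

class LoopedBlock(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))
        return x

class LoopedModel(nn.Module):
    def __init__(self, dim=64, num_loops=8):
        super().__init__()
        self.block = LoopedBlock(dim)   # one set of weights...
        self.num_loops = num_loops      # ...reused num_loops times

    def forward(self, x):
        for _ in range(self.num_loops):
            x = self.block(x)
        return x

model = LoopedModel()
tokens = torch.randn(2, 16, 64)         # (batch, sequence, dim)
print(model(tokens).shape)              # torch.Size([2, 16, 64])
```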
Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

Akshay Krishnamurthy and Audrey Huang have written a nice blog post on the intersection of reinforcement learning theory and language model post-training. let-all.com/blog/2025/03/0…

Yuda Song @ ICLR 2025 (@yus167) 's Twitter Profile Photo

Heading to #ICLR2025 🇸🇬! Excited to connect with friends and chat about RL: theory, LLM reasoning and robotics! I will present our Oral paper on LLM self-improvement 📍 4:18pm Sat. Join me if you want to learn about its scaling laws, iterative training and test-time improvement.

Abhishek Panigrahi (@abhishek_034) 's Twitter Profile Photo

🎉Excited to present 2 papers at #ICLR2025 in Singapore!

🧠 Progressive distillation induces an implicit curriculum
📢 Oral: Sat, 4:30–4:42pm @ Garnet 216–218
🖼️ Poster: Sat, 10:00am–12:30pm (#632)

⚙️ Efficient stagewise pretraining via progressive subnetworks
🖼️ Poster:

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

Hanlin Zhang · John J. Vastola · Marinka Zitnik · Naomi Saphra · Zechen Zhang

4/25 at 10am: 'How Does Critical Batch Size Scale in Pre-training?'

Hanlin Zhang · Depen Morwani · Nikhil Vyas · Jingfeng Wu · Difan Zou · Udaya Ghai · Dean Foster · Sham Kakade

Submission: openreview.net/forum?id=JCiF0…

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

4/26 at 10am: 'Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models'

Yuda Song · Hanlin Zhang · Carson Eisenach · Sham Kakade · Dean Foster · Udaya Ghai

Submission: openreview.net/forum?id=mtJSM…

Nimit Kalra (@qw3rtman) 's Twitter Profile Photo

Still noodling on this, but the generation-verification gap proposed by Yuda Song, Hanlin Zhang, Sham Kakade, Udaya Ghai, et al. in arxiv.org/abs/2412.02674 is a very nice framework that unifies a lot of thoughts around self-improvement/verification/bootstrapping reasoning.

Nimit Kalra (@qw3rtman) 's Twitter Profile Photo

Discussing "Mind the Gap" tonight at Haize Labs's NYC AI Reading Group with Leonard Tang and will brown. Authors study self-improvement through the "Generation-Verification Gap" (model's verification ability over its own generations) and find that this capability log scales with

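Roughly, the gap described above compares how reliably the model can verify its own generations with how often those generations are correct in the first place. One way such a gap could be estimated is sketched below; `model.generate`, `model.self_verify`, and `is_correct` are hypothetical stand-ins, and this is not the exact metric or protocol from the paper.

```python
# Hedged sketch of estimating a generation-verification gap: the model's
# accuracy when judging its own answers minus its accuracy when generating them.
# All callables here are hypothetical stand-ins, not the paper's interfaces.

def generation_verification_gap(model, problems, is_correct, samples_per_problem=8):
    """A positive gap means the model judges its own answers more reliably
    than it produces correct ones."""
    gen_correct, ver_correct, total = 0, 0, 0
    for problem in problems:
        for _ in range(samples_per_problem):
            answer = model.generate(problem)               # the model's own generation
            verdict = model.self_verify(problem, answer)   # model's yes/no judgment of that answer
            truth = is_correct(problem, answer)            # ground-truth check
            gen_correct += int(truth)
            ver_correct += int(verdict == truth)           # verifier credited when its judgment matches reality
            total += 1
    return ver_correct / total - gen_correct / total
```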