Udaya Ghai (@udayaghai) 's Twitter Profile
Udaya Ghai

@udayaghai

Applied Scientist @Amazon, PhD in Machine Learning @PrincetonCS

ID: 1524085579918491649

https://udayaghai.com · Joined 10-05-2022 17:55:25

28 Tweets

123 Followers

572 Following

Xinyi Chen (@xinyichen2) 's Twitter Profile Photo

Together with Elad Hazan, Cong, Andrea Zanette, and Nati, we are organizing a long program on reinforcement learning and control at the Institute for Mathematical & Statistical Innovation! Join us for workshops on the frontiers of online/offline RL, control, multi-agent RL, and opportunities to present your research.

Sandeep Chinchali (@spchinchali) 's Twitter Profile Photo

Excited to share our new paper in INTERSPEECH '24 on embedding signatures in audio to detect and prevent #deepfake audio. Read our paper here: arxiv.org/abs/2407.00913.

Dibya Ghosh (@its_dibya) 's Twitter Profile Photo

With R1, a lot of people have been asking “how come we didn't discover this 2 years ago?” Well... 2 years ago, I spent 6 months working on exactly this (PG / PPO for math+gsm8k), but my results were nowhere near as good. Here’s my take on what blocked me and what’s changed: 🧵

Tengyu Ma (@tengyuma) 's Twitter Profile Photo

RL + CoT works great for DeepSeek-R1 & o1, but:

1️⃣ Linear-in-log scaling in train & test-time compute
2️⃣ Likely bounded by difficulty of training problems

Meet STP—a self-play algorithm that conjectures & proves indefinitely, scaling better! 🧠⚡🧵🧵

arxiv.org/abs/2502.00212

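The tweet doesn't spell out the algorithm, but the conjecture-and-prove self-play loop it describes can be sketched roughly as below. This is a minimal sketch under stated assumptions: `model.propose_conjecture`, `model.attempt_proof`, `model.finetune`, and the `verifier` callable are hypothetical placeholders, not the paper's actual interfaces.

```python
# A minimal sketch of a conjecture-and-prove self-play round in the spirit of
# the tweet above. The model methods and the verifier callable are hypothetical
# placeholders, not interfaces from the STP paper.

def self_play_round(model, seed_statements, verifier, num_conjectures=64):
    """One round of self-play: conjecture new statements, try to prove them,
    and reinforce the model on the (statement, proof) pairs that check out."""
    verified_pairs = []
    for seed in seed_statements:
        for _ in range(num_conjectures):
            conjecture = model.propose_conjecture(seed)     # generate a candidate statement
            proof = model.attempt_proof(conjecture)         # try to prove it
            if proof is not None and verifier(conjecture, proof):
                verified_pairs.append((conjecture, proof))  # keep only checked proofs
    model.finetune(verified_pairs)                          # train on successful self-generated data
    return model

# Repeating self_play_round indefinitely is what lets data generation keep pace
# with the model, rather than being capped by a fixed set of training problems.
```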
Eugene Vinitsky 🍒🦋 (@eugenevinitsky) 's Twitter Profile Photo

We've built a simulated driving agent that we trained on 1.6 billion km of driving with no human data. It is SOTA on every planning benchmark we tried. In self-play, it goes 20 years between collisions.

Sham Kakade (@shamkakade6) 's Twitter Profile Photo

1/n In new work, we draw connections between accelerated SGD and various recent optimizers, including AdEMAMix, Schedule-Free optimizers, and MARS, and use them to design ‘Simplified-AdEMAMix’, which matches the performance of AdEMAMix without any extra momentum buffer.

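For context on the "extra momentum buffer": as I understand the original AdEMAMix optimizer, it keeps a second, slow momentum EMA alongside Adam's fast momentum and second-moment estimates. The sketch below is only illustrative (bias correction omitted, hyperparameters made up) and is not taken from either paper; it just shows which buffer a simplified variant could drop.

```python
import torch

def ademamix_style_step(p, grad, state, lr=1e-3, b1=0.9, b2=0.999, b3=0.9999,
                        alpha=5.0, eps=1e-8):
    """One illustrative AdEMAMix-style update for a single parameter tensor.
    Bias correction is omitted to keep the sketch short."""
    m1, m2, v = state["m1"], state["m2"], state["v"]
    m1.mul_(b1).add_(grad, alpha=1 - b1)           # fast momentum EMA (as in Adam)
    m2.mul_(b3).add_(grad, alpha=1 - b3)           # slow momentum EMA: the extra buffer
    v.mul_(b2).addcmul_(grad, grad, value=1 - b2)  # second-moment EMA
    update = (m1 + alpha * m2) / (v.sqrt() + eps)
    p.add_(update, alpha=-lr)

# Example state for one parameter tensor:
p = torch.zeros(10)
state = {k: torch.zeros_like(p) for k in ("m1", "m2", "v")}
ademamix_style_step(p, torch.randn(10), state)
```

Per the tweet, 'Simplified-AdEMAMix' reportedly matches this performance while dropping the `m2` buffer, which removes one full-size optimizer state per parameter.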
Elad Hazan (@hazanprinceton) 's Twitter Profile Photo

Very excited about a breakthrough in learning linear dynamical systems and sequence prediction, with my brilliant postdoc: the first time marginally-stable (long-memory) asymmetric linear dynamical systems can be learned without dependence on the hidden dimension!! arxiv.org/abs/2502.06545
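To unpack "marginally-stable (long memory)": a linear dynamical system x_{t+1} = A x_t + B u_t, y_t = C x_t is marginally stable when the spectral radius of A is close to 1, so the effect of an input persists for many steps. The toy simulation below is my own illustration of that object, not anything from the paper; the hidden dimension d is exactly the quantity the stated result avoids depending on.

```python
import numpy as np

# Toy marginally-stable, asymmetric LDS: x_{t+1} = A x_t + B u_t, y_t = C x_t.
# A is built from 2x2 rotation blocks scaled by rho = 0.999, so its spectral
# radius is just below 1 and a single impulse takes thousands of steps to fade.
rng = np.random.default_rng(0)
d = 8                                              # hidden dimension
A = np.zeros((d, d))
for i, theta in enumerate(rng.uniform(0.0, np.pi, d // 2)):
    rot = 0.999 * np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta),  np.cos(theta)]])
    A[2 * i:2 * i + 2, 2 * i:2 * i + 2] = rot      # non-symmetric block
B = rng.standard_normal((d, 1))
C = rng.standard_normal((1, d))

x = np.zeros((d, 1))
response = []
for t in range(5000):
    u = np.ones((1, 1)) if t == 0 else np.zeros((1, 1))  # single impulse at t = 0
    x = A @ x + B @ u
    response.append((C @ x).item())

# The envelope decays like 0.999**t, so it is still noticeable after ~1000 steps.
print(max(abs(r) for r in response[1000:1100]))
```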

Raj Pabari (@realrajpabari) 's Twitter Profile Photo

1/ I'm excited to share a project I worked on at Amazon where we introduce "A shared-revenue Bertrand game" -- a variant of the Bertrand duopoly in which the firms' payoffs are coupled, generating novel Nash equilibria that capture real-world revenue-sharing dynamics.

Max Rudolph (@maxbrudolph) 's Twitter Profile Photo

We ran thousands of sweeps to compare RL algos for imperfect-information games and found preliminary evidence for the Policy Gradient Hypothesis: with proper tuning, generic PG methods (PPO, etc.) are highly competitive in IIGs. Check out the full paper: arxiv.org/abs/2502.08938

Akhil Bagaria (@akhil_bagaria) 's Twitter Profile Photo

My 1st last-author paper (joint) will be presented as an Oral at AAAI! A lot of Option Discovery work (including mine) is based on intuitive heuristics. Instead, we formalize what we want options to do for the agent and then derive algorithms with provable guarantees.

Nikunj Saunshi (@nsaunshi) 's Twitter Profile Photo

*New ICLR paper* – We introduce a paradigm of *looped models for reasoning*. Main claims:
- Reasoning requires depth (via looping), not necessarily params
- LLM reasoning predictably scales with more loops
- Looped models generate “latent thoughts” & can simulate CoT reasoning
1/n

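A minimal sketch of the "depth via looping" idea: apply one weight-tied block repeatedly, so effective depth grows with the loop count while the parameter count stays fixed. The block architecture and loop count below are illustrative assumptions of mine, not the model from the paper.

```python
import torch
import torch.nn as nn

# One weight-tied transformer-style block applied num_loops times: effective
# depth scales with the loop count, parameters do not. The intermediate states
# play the role of the "latent thoughts" mentioned in the tweet.

class LoopedBlock(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))
        return x

class LoopedModel(nn.Module):
    def __init__(self, dim=64, num_loops=8):
        super().__init__()
        self.block = LoopedBlock(dim)   # one set of weights...
        self.num_loops = num_loops      # ...reused num_loops times

    def forward(self, x):
        for _ in range(self.num_loops):
            x = self.block(x)
        return x

model = LoopedModel()
tokens = torch.randn(2, 16, 64)         # (batch, sequence, dim)
print(model(tokens).shape)              # torch.Size([2, 16, 64])
```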
Dylan Foster 🐢 (@canondetortugas) 's Twitter Profile Photo

Akshay Krishnamurthy and Audrey Huang have written a nice blog post on the intersection of reinforcement learning theory and language model post-training. let-all.com/blog/2025/03/0…

Yuda Song @ ICLR 2025 (@yus167) 's Twitter Profile Photo

Heading to #ICLR2025 🇸🇬! Excited to connect with friends and chat about RL: theory, LLM reasoning and robotics! I will present our Oral paper on LLM self-improvement 📍 4:18pm Sat. Join me if you want to learn about its scaling laws, iterative training and test-time improvement.

Abhishek Panigrahi (@abhishek_034) 's Twitter Profile Photo

🎉Excited to present 2 papers at #ICLR2025 in Singapore!

🧠 Progressive distillation induces an implicit curriculum
📢 Oral: Sat, 4:30–4:42pm @ Garnet 216–218
🖼️ Poster: Sat, 10:00am–12:30pm (#632)

⚙️ Efficient stagewise pretraining via progressive subnetworks
🖼️ Poster:

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

Hanlin Zhang · John J. Vastola · Marinka Zitnik · Naomi Saphra · Zechen Zhang

4/25 at 10am: 'How Does Critical Batch Size Scale in Pre-training?'

Hanlin Zhang · Depen Morwani · Nikhil Vyas · Jingfeng Wu · Difan Zou · Udaya Ghai · Dean Foster · Sham Kakade

Submission: openreview.net/forum?id=JCiF0…

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

4/26 at 10am: 'Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models'

Yuda Song · Hanlin Zhang · Carson Eisenach · Sham Kakade · Dean Foster · Udaya Ghai

Submission: openreview.net/forum?id=mtJSM…

Nimit Kalra (@qw3rtman) 's Twitter Profile Photo

Still noodling on this, but the generation-verification gap proposed by Yuda Song, Hanlin Zhang, Sham Kakade, Udaya Ghai, et al. in arxiv.org/abs/2412.02674 is a very nice framework that unifies a lot of thoughts around self-improvement/verification/bootstrapping reasoning.

Nimit Kalra (@qw3rtman) 's Twitter Profile Photo

Discussing "Mind the Gap" tonight at Haize Labs's NYC AI Reading Group with Leonard Tang and will brown. Authors study self-improvement through the "Generation-Verification Gap" (model's verification ability over its own generations) and find that this capability log scales with

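Roughly, the gap described above compares how reliably the model can verify its own generations with how often those generations are correct in the first place. One way such a gap could be estimated is sketched below; `model.generate`, `model.self_verify`, and `is_correct` are hypothetical stand-ins, and this is not the exact metric or protocol from the paper.

```python
# Hedged sketch of estimating a generation-verification gap: the model's
# accuracy when judging its own answers minus its accuracy when generating them.
# All callables here are hypothetical stand-ins, not the paper's interfaces.

def generation_verification_gap(model, problems, is_correct, samples_per_problem=8):
    """A positive gap means the model judges its own answers more reliably
    than it produces correct ones."""
    gen_correct, ver_correct, total = 0, 0, 0
    for problem in problems:
        for _ in range(samples_per_problem):
            answer = model.generate(problem)               # the model's own generation
            verdict = model.self_verify(problem, answer)   # model's yes/no judgment of that answer
            truth = is_correct(problem, answer)            # ground-truth check
            gen_correct += int(truth)
            ver_correct += int(verdict == truth)           # verifier credited when its judgment matches reality
            total += 1
    return ver_correct / total - gen_correct / total
```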