Yuxiao Qu (@quyuxiao)'s Twitter Profile
Yuxiao Qu

@quyuxiao

PhD @mldcmu, advised by @aviral_kumar2 and @rsalakhu
Interests: Reasoning & RL & FMs
Prev: @UWMadison, @UW, @CUHKofficial

ID: 1327289168708431872

Link: https://cohenqu.github.io/
Joined: 13-11-2020 16:36:20

20 Tweets

266 Followers

95 Following

Yuxiao Qu (@quyuxiao)

I’ll be at #NeurIPS2024 next week to present our work on 

📎Recursive Introspection: Teaching Language Model Agents How to Self-Improve
 
📌Poster Session 3 East #2805
🗓️Dec 12, 11:00-2:00

This is joint work with amazing collaborators Tianjun Zhang, Naman, Aviral Kumar
Aviral Kumar (@aviral_kumar2)

At #NeurIPS2024 main conf, we will present several works on understanding offline RL methods, RL for LLM reasoning, agents, etc. led by my students and collaborators. Come talk to us to learn more and discuss future directions + what we are excited about!

More details in 🧵⬇️
ML@CMU (@mlcmublog)

blog.ml.cmu.edu/2025/01/08/opt… How can we train LLMs to solve complex challenges beyond just data scaling? In a new blogpost, Amrith Setlur, Yuxiao Qu, Matthew Yang, Lunjun Zhang, Virginia Smith, and Aviral Kumar demonstrate that Meta RL can help LLMs better optimize test-time compute.

So Yeon (Tiffany) Min on Industry Job Market (@soyeontiffmin)

🚨🚨 Preprint Alert 🚨🚨 🚀🚀 As AIs become agents 🤖, how can we reliably delegate tasks to them if they cannot communicate their limitations 😭 or ask for help or test-time compute 🧑‍🚒 when needed? We present our new pre-print **Self-Regulation and Requesting Interventions**

Isaac Liao (@liaoisaac91893)

Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4

Amrith Setlur (@setlur_amrith)

Scaling test-time compute is fine 😒 but are we making good use of it? 🤔
We try to answer this question in our new work: arxiv.org/pdf/2503.07572
TL;DR:
🚀 *Optimizing* test-time compute = RL with dense (progress) rewards = minimizing regret over long CoT episodes 😲
🧵⤵️
Max Simchowitz (@max_simchowitz)

There’s a lot of awesome research about LLM reasoning right now. But how is learning in the physical world 🤖 different than in language 📚? In a new paper, we show that imitation learning in continuous spaces can be exponentially harder than for discrete state spaces, even when

Yuxiao Qu (@quyuxiao)

I am excited to give an oral talk on our work about “Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning” at #ICLR2025 FM-Wild Workshop! 🚀
📍Hall 4 #6
🕚11:30AM, April 27th
🖥️Can’t be there in person? Chat with Ian Wu, who’ll present our poster after the talk!
Yutong (Kelly) He (@electronickale)

✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵

Yuxiao Qu (@quyuxiao)

Heading to ICML Conference #ICML2025 this week! DM me if you’d like to chat ☕️

Come by our poster sessions on:
🧠 Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (arxiv.org/abs/2503.07572)
🔍 Learning to Discover Abstractions for LLM Reasoning (drive.google.com/file/d/1Sfafrk…)
Aviral Kumar (@aviral_kumar2)

If you are at #icml25 and are interested in RL algorithms, scaling laws for RL, and test-time scaling (& related stuff), come talk to us at various poster sessions (details ⬇️).

We are also presenting some things at workshops later in the week, more on that later.