Yuxiao Qu (@quyuxiao)'s Twitter Profile
Yuxiao Qu

@quyuxiao

PhD @mldcmu, advised by @aviral_kumar2 and @rsalakhu
Interests: Reasoning & RL & FMs
Prev: @UWMadison, @UW, @CUHKofficial

ID: 1327289168708431872

Link: https://cohenqu.github.io/
Joined: 13-11-2020 16:36:20

20 Tweets

266 Followers

95 Following

Yuxiao Qu (@quyuxiao)

I’ll be at #NeurIPS2024 next week to present our work on 

📎Recursive Introspection: Teaching Language Model Agents How to Self-Improve
 
📌Poster Session 3 East #2805
🗓️Dec 12, 11:00-2:00

This is joint work with amazing collaborators Tianjun Zhang, Naman, Aviral Kumar
Aviral Kumar (@aviral_kumar2)

At #NeurIPS2024 main conf, we will present several works on understanding offline RL methods, RL for LLM reasoning, agents, etc. led by my students and collaborators. Come talk to us to learn more and discuss future directions + what we are excited about!

More details in 🧵⬇️
ML@CMU (@mlcmublog)

blog.ml.cmu.edu/2025/01/08/opt… How can we train LLMs to solve complex challenges beyond just data scaling? In a new blogpost, Amrith Setlur, Yuxiao Qu, Matthew Yang, Lunjun Zhang, Virginia Smith, and Aviral Kumar demonstrate that Meta RL can help LLMs better optimize test-time compute.

So Yeon (Tiffany) Min on Industry Job Market (@soyeontiffmin)

🚨🚨 Preprint Alert 🚨🚨 🚀🚀 As AIs become agents 🤖, how can we reliably delegate tasks to them if they cannot communicate their limitations 😭 or ask for help or test-time compute 🧑‍🚒 when needed? We present our new pre-print **Self-Regulation and Requesting Interventions**

Isaac Liao (@liaoisaac91893)

Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4

Amrith Setlur (@setlur_amrith)

Scaling test-time compute is fine 😒 but are we making good use of it? 🤔
We try to answer this question in our new work: arxiv.org/pdf/2503.07572
TL;DR:
🚀 *Optimizing* test-time compute = RL with dense (progress) rewards = minimizing regret over long CoT episodes 😲
🧵⤵️
Max Simchowitz (@max_simchowitz)

There’s a lot of awesome research about LLM reasoning right now. But how is learning in the physical world 🤖 different than in language 📚? In a new paper, we show that imitation learning in continuous spaces can be exponentially harder than for discrete state spaces, even when

Yuxiao Qu (@quyuxiao)

I am excited to give an oral talk on our work about “Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning” at #ICLR2025 FM-Wild Workshop! 🚀
📍Hall 4 #6
🕚11:30AM, April 27th
🖥️Can’t be there in person? Chat with Ian Wu, who’ll present our poster after the talk!
Yutong (Kelly) He (@electronickale)

✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵

Yuxiao Qu (@quyuxiao)

Heading to ICML Conference #ICML2025 this week! DM me if you’d like to chat ☕️

Come by our poster sessions on:
🧠 Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (arxiv.org/abs/2503.07572)
🔍 Learning to Discover Abstractions for LLM Reasoning (drive.google.com/file/d/1Sfafrk…)
Aviral Kumar (@aviral_kumar2)

If you are at #icml25 and are interested in RL algorithms, scaling laws for RL, and test-time scaling (& related stuff), come talk to us at various poster sessions (details ⬇️).

We are also presenting some things at workshops later in the week, more on that later.