Sean Welleck (@wellecks)'s Twitter Profile
Sean Welleck

@wellecks

Assistant Professor at CMU. Marathoner, @thesisreview.

ID: 280403336

Website: http://wellecks.com · Joined: 11-04-2011 07:59:23

1.1K Tweets

6.6K Followers

225 Following

Sean Welleck (@wellecks)'s Twitter Profile Photo

We found that GRPO suffers from what we call a rank bias: reinforcing high probability correct outputs, but not low probability correct outputs (left plot)

However, we argue that increasing low-probability correct outputs is important for improving pass@N (right plot)
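The pass@N metric discussed above is usually computed with the standard unbiased estimator popularized by the HumanEval methodology: draw N completions without replacement from a larger pool and ask whether at least one is correct. A minimal sketch (the function and parameter names here are my own, not from the paper):

```python
import math

def pass_at_n(num_samples: int, num_correct: int, n: int) -> float:
    """Unbiased estimate of pass@N: the probability that at least one of N
    completions drawn without replacement from num_samples generations
    (of which num_correct are correct) solves the problem."""
    if num_samples - num_correct < n:
        # Every size-N subset must contain at least one correct sample.
        return 1.0
    # 1 - C(num_samples - num_correct, n) / C(num_samples, n)
    return 1.0 - math.comb(num_samples - num_correct, n) / math.comb(num_samples, n)
```

With this estimator it is easy to see why lifting low-probability correct outputs matters: a model that concentrates mass on a few already-likely solutions gains little at large N, since extra draws keep hitting the same successes.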
Sean Welleck (@wellecks)'s Twitter Profile Photo

Unlikeliness reward dramatically changes how GRPO uplifts low probability vs. high probability sequences, leading to improved pass@N for high N. 

It also improves sample diversity, e.g. measured by unique proofs generated.
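One way to picture an unlikeliness-style reward — a loose illustrative sketch, not the paper's exact formulation; the function name, the `beta` parameter, and the rank-based bonus are all my assumptions — is to upweight the correct samples in a GRPO group that the policy currently assigns low probability, so their advantages grow relative to already-likely solutions:

```python
def unlikeliness_weighted_rewards(rewards, logprobs, beta=0.5):
    """Illustrative sketch: scale up the reward of correct samples that the
    policy assigns low sequence probability, countering GRPO's tendency to
    reinforce only already-likely correct outputs."""
    correct = [i for i, r in enumerate(rewards) if r > 0]
    # Rank correct samples by sequence log-probability (rank 0 = least likely).
    order = sorted(correct, key=lambda i: logprobs[i])
    out = list(rewards)
    for rank, i in enumerate(order):
        # Least-likely correct samples receive the largest bonus (up to 1 + beta).
        out[i] = rewards[i] * (1.0 + beta * (len(order) - 1 - rank) / max(len(order) - 1, 1))
    return out
```

For example, in a group where two samples are correct, the one with lower log-probability gets its reward scaled by `1 + beta`, while the already-likely one keeps its base reward — shifting gradient mass toward the rare solutions that drive pass@N at high N.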
Aditi Raghunathan (@adtraghunathan)'s Twitter Profile Photo

I believe the next big test for LLMs is whether they can generate truly novel ideas in open-ended situations. We translate notions of "creativity" from cogsci into simple tasks that reveal how far today’s models fall, and how multi-token training + randomness might help.

Akari Asai (@akariasai)'s Twitter Profile Photo

‘Bold,’ ‘positive’ and ‘unparalleled’: Allen School Ph.D. graduates Ashish Sharma and Sewon Min recognized with ACM Doctoral Dissertation Awards news.cs.washington.edu/2025/06/04/all… Massive congrats to Ashish Sharma and Sewon Min - huge win for UW NLP and the broader NLP community! 🙌

Daniel Fried (@dan_fried)'s Twitter Profile Photo

I'm excited about Andre's work, which analyzes GRPO and identifies that it's biased towards reinforcing solutions that are already highly probable. We found two easy-to-implement fixes. These improve pass@N and produced a strong theorem-proving model!

Muhammad Khalifa (@mkhalifaaaa)'s Twitter Profile Photo

Simple yet cool idea. I find it interesting how the community now cares more about pass@k than the pass@1 eval that dominated the field over the last 5-6 months

Lewis Tunstall (@_lewtun)'s Twitter Profile Photo

There's lots of RL goodies in the tech report behind <a href="/FutureHouseSF/">FutureHouse</a>'s new reasoning model for chemistry 👀

Three things stood out to me:

1. Training domain-specific experts in parallel, before distilling into a generalist model. The clever thing here is that you can parallelise
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

In the test time scaling era, we all would love a higher throughput serving engine! Introducing Tokasaurus, a LLM inference engine for high-throughput workloads with large and small models!

Led by <a href="/jordanjuravsky/">Jordan Juravsky</a>, in collaboration with <a href="/HazyResearch/">hazyresearch</a> and an amazing team!
Songlin Yang (@songlinyang4)'s Twitter Profile Photo

Check out log-linear attention—our latest approach to overcoming the fundamental limitation of RNNs’ constant state size, while preserving subquadratic time and space complexity

Infini-AI-Lab (@infiniailab)'s Twitter Profile Photo

🥳 Happy to share our new work –  Kinetics: Rethinking Test-Time Scaling Laws

🤔How to effectively build a powerful reasoning agent?

Existing compute-optimal scaling laws suggest 64K thinking tokens + 1.7B model &gt; 32B model.
But that only shows half of the picture!

🚨 The O(N²)
fly51fly (@fly51fly)'s Twitter Profile Photo

[LG] Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
A He, D Fried, S Welleck [CMU] (2025)
arxiv.org/abs/2506.02355
Muhammad Khalifa (@mkhalifaaaa)'s Twitter Profile Photo

🚨 Deadline for SCALR 2025 Workshop: Test‑time Scaling &amp; Reasoning Models at COLM '25 <a href="/COLM_conf/">Conference on Language Modeling</a>  is approaching!🚨

scalr-workshop.github.io

🧩 Call for short papers (4 pages, non‑archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24. 

Topics