Sean Welleck (@wellecks)'s Twitter Profile
Sean Welleck

@wellecks

Assistant Professor at CMU. Marathoner, @thesisreview.

ID: 280403336

Website: http://wellecks.com · Joined: 11-04-2011 07:59:23

1.1K Tweets

6.6K Followers

225 Following

Sean Welleck (@wellecks)'s Twitter Profile Photo

We found that GRPO suffers from what we call a rank bias: reinforcing high probability correct outputs, but not low probability correct outputs (left plot)

However, we argue that increasing low-probability correct outputs is important for improving pass@N (right plot)
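The pass@N metric discussed above is usually computed with the standard unbiased estimator popularized by the HumanEval methodology: draw N completions without replacement from a larger pool and ask whether at least one is correct. A minimal sketch (the function and parameter names here are my own, not from the paper):

```python
import math

def pass_at_n(num_samples: int, num_correct: int, n: int) -> float:
    """Unbiased estimate of pass@N: the probability that at least one of N
    completions drawn without replacement from num_samples generations
    (of which num_correct are correct) solves the problem."""
    if num_samples - num_correct < n:
        # Every size-N subset must contain at least one correct sample.
        return 1.0
    # 1 - C(num_samples - num_correct, n) / C(num_samples, n)
    return 1.0 - math.comb(num_samples - num_correct, n) / math.comb(num_samples, n)
```

With this estimator it is easy to see why lifting low-probability correct outputs matters: a model that concentrates mass on a few already-likely solutions gains little at large N, since extra draws keep hitting the same successes.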
Sean Welleck (@wellecks)'s Twitter Profile Photo

Unlikeliness reward dramatically changes how GRPO uplifts low probability vs. high probability sequences, leading to improved pass@N for high N. 

It also improves sample diversity, e.g. measured by unique proofs generated.
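One way to picture an unlikeliness-style reward — a loose illustrative sketch, not the paper's exact formulation; the function name, the `beta` parameter, and the rank-based bonus are all my assumptions — is to upweight the correct samples in a GRPO group that the policy currently assigns low probability, so their advantages grow relative to already-likely solutions:

```python
def unlikeliness_weighted_rewards(rewards, logprobs, beta=0.5):
    """Illustrative sketch: scale up the reward of correct samples that the
    policy assigns low sequence probability, countering GRPO's tendency to
    reinforce only already-likely correct outputs."""
    correct = [i for i, r in enumerate(rewards) if r > 0]
    # Rank correct samples by sequence log-probability (rank 0 = least likely).
    order = sorted(correct, key=lambda i: logprobs[i])
    out = list(rewards)
    for rank, i in enumerate(order):
        # Least-likely correct samples receive the largest bonus (up to 1 + beta).
        out[i] = rewards[i] * (1.0 + beta * (len(order) - 1 - rank) / max(len(order) - 1, 1))
    return out
```

For example, in a group where two samples are correct, the one with lower log-probability gets its reward scaled by `1 + beta`, while the already-likely one keeps its base reward — shifting gradient mass toward the rare solutions that drive pass@N at high N.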
Aditi Raghunathan (@adtraghunathan)'s Twitter Profile Photo

I believe the next big test for LLMs is whether they can generate truly novel ideas in open-ended situations. We translate notions of "creativity" from cogsci into simple tasks that reveal how far today’s models fall, and how multi-token training + randomness might help.

Akari Asai (@akariasai)'s Twitter Profile Photo

‘Bold,’ ‘positive’ and ‘unparalleled’: Allen School Ph.D. graduates Ashish Sharma and Sewon Min recognized with ACM Doctoral Dissertation Awards news.cs.washington.edu/2025/06/04/all… Massive congrats to Ashish Sharma and Sewon Min - huge win for UW NLP and the broader NLP community! 🙌

Daniel Fried (@dan_fried)'s Twitter Profile Photo

I'm excited about Andre's work, which analyzes GRPO and identifies that it's biased towards reinforcing solutions that are already highly probable. We found two easy-to-implement fixes. These improve pass@N and produced a strong theorem-proving model!

Muhammad Khalifa (@mkhalifaaaa)'s Twitter Profile Photo

Simple yet cool idea. I find it interesting how the community now cares more about pass@k than the pass@1 eval that dominated the field over the last 5-6 months

Lewis Tunstall (@_lewtun)'s Twitter Profile Photo

There's lots of RL goodies in the tech report behind <a href="/FutureHouseSF/">FutureHouse</a>'s new reasoning model for chemistry 👀

Three things stood out to me:

1. Training domain-specific experts in parallel, before distilling into a generalist model. The clever thing here is that you can parallelise
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

In the test time scaling era, we all would love a higher throughput serving engine! Introducing Tokasaurus, a LLM inference engine for high-throughput workloads with large and small models!

Led by <a href="/jordanjuravsky/">Jordan Juravsky</a>, in collaboration with <a href="/HazyResearch/">hazyresearch</a> and an amazing team!
Songlin Yang (@songlinyang4)'s Twitter Profile Photo

Check out log-linear attention—our latest approach to overcoming the fundamental limitation of RNNs’ constant state size, while preserving subquadratic time and space complexity

Infini-AI-Lab (@infiniailab)'s Twitter Profile Photo

🥳 Happy to share our new work –  Kinetics: Rethinking Test-Time Scaling Laws

🤔How to effectively build a powerful reasoning agent?

Existing compute-optimal scaling laws suggest 64K thinking tokens + 1.7B model &gt; 32B model.
But that only shows half of the picture!

🚨 The O(N²)
fly51fly (@fly51fly)'s Twitter Profile Photo

[LG] Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
A He, D Fried, S Welleck [CMU] (2025)
arxiv.org/abs/2506.02355
Muhammad Khalifa (@mkhalifaaaa)'s Twitter Profile Photo

🚨 Deadline for SCALR 2025 Workshop: Test‑time Scaling &amp; Reasoning Models at COLM '25 <a href="/COLM_conf/">Conference on Language Modeling</a>  is approaching!🚨

scalr-workshop.github.io

🧩 Call for short papers (4 pages, non‑archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24. 

Topics