Rafael Rafailov @ NeurIPS (@rm_rafailov)'s Twitter Profile
Rafael Rafailov @ NeurIPS

@rm_rafailov

Ph.D. Student at @StanfordAILab. I work on Foundation Models and Decision Making. Previously @GoogleDeepMind @UCBerkeley

ID: 1660344669916786688

Link: https://rmrafailov.github.io/ · Joined: 21-05-2023 18:11:57

1.1K Tweets

6.6K Followers

776 Following

Rafael Rafailov @ NeurIPS (@rm_rafailov)'s Twitter Profile Photo

“We developed a fully asynchronous online RL training framework that enhanced flexibility. …. This innovation resulted in a ~10x improvement in training efficiency over previous generations.” Async distributed RL strikes again!
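
The framework being quoted isn't described beyond that sentence, but the core idea of fully asynchronous online RL is to decouple rollout collection from gradient updates so neither side waits on the other. A minimal Python sketch under that reading (the dummy policy, environment, and update rule are all stand-ins, not anyone's actual trainer):

# Minimal sketch of an asynchronous actor/learner loop: actors push rollouts
# into a bounded queue using possibly-stale weights; the learner updates
# whenever data is available. The "policy" and "gradient" are placeholders.
import queue
import random
import threading
import time

traj_queue = queue.Queue(maxsize=64)   # rollouts flow actors -> learner
weights_lock = threading.Lock()
weights = {"w": 0.0}                   # stand-in for real model parameters
stop = threading.Event()

def collect_episode(w):
    time.sleep(0.01)                   # simulated environment latency
    return {"reward": random.random(), "w_used": w}

def actor():
    while not stop.is_set():
        with weights_lock:
            w = weights["w"]           # snapshot; may lag behind the learner
        traj_queue.put(collect_episode(w))  # blocks only if the buffer is full

def learner(num_updates=200, lr=0.1):
    for _ in range(num_updates):
        traj = traj_queue.get()        # consume whatever rollout is ready
        with weights_lock:
            weights["w"] += lr * traj["reward"]  # stand-in "gradient" update
    stop.set()

for _ in range(4):
    threading.Thread(target=actor, daemon=True).start()
learner()
print("final weights:", weights)

Because actors never wait for an optimizer step (and the learner never waits for a specific environment), accelerator utilization stays high even when environment latency varies, which is where the efficiency gains of asynchronous setups usually come from.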

Suraj Nair (@surajnair_1)'s Twitter Profile Photo

Since the first year of my PhD, every talk I’ve given has opened with a slide about the distant north star: dropping a robot in a home it’s never been in before and having it do useful things. I think it might be time for me to find a new opening slide 😀. Thrilled to share π-0.5!

Aviral Kumar (@aviral_kumar2)'s Twitter Profile Photo

At #ICLR25 workshops, my students + collaborators will give many oral talks on newer stuff (don't miss!):
- robot VLA RL fine-tuning (Max Sobol Mark)
- optimizing test-time compute (Yuxiao Qu)
- why RL is crucial for test-time scaling (Amrith Setlur)
- scaling laws for value-based RL

John Yang (@jyangballin)'s Twitter Profile Photo

40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified.

We built it by synthesizing a ton of agentic training data from 100+ Python repos.

Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
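
SWE-smith's actual pipeline lives in the open-sourced toolkit; purely as an illustration of the general recipe (perturb a repo, confirm its tests now fail, record the broken state plus a verification command as a task), here is a hypothetical sketch in which every helper and field name is my own:

# Illustrative only, not SWE-smith's real implementation: turn a clean repo
# checkout into an agentic training task by applying a bug-injection patch,
# confirming the test suite now fails, and saving the broken state + verify command.
import json
import subprocess
from pathlib import Path

def tests_pass(repo: Path) -> bool:
    proc = subprocess.run(["pytest", "-q", "-x"], cwd=repo, capture_output=True)
    return proc.returncode == 0

def make_task_instance(repo: Path, bug_patch: str, out_file: Path):
    # Apply the (hypothetical) perturbation patch from stdin.
    subprocess.run(["git", "apply", "-"], cwd=repo, input=bug_patch.encode(), check=True)
    try:
        if tests_pass(repo):
            return None                    # perturbation broke nothing -> not a useful task
        task = {
            "repo": str(repo),
            "bug_patch": bug_patch,        # the agent must effectively undo this
            "verify_cmd": "pytest -q -x",  # pass/fail gives a checkable outcome
        }
        out_file.write_text(json.dumps(task, indent=2))
        return task
    finally:
        # Always restore the repo to its clean state.
        subprocess.run(["git", "checkout", "--", "."], cwd=repo, check=True)

The checkable pass/fail signal is what makes such synthesized instances usable as training data for an agent: trajectories can be filtered or rewarded by simply rerunning the recorded verification command.
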
Jason Weston (@jaseweston)'s Twitter Profile Photo

🚨 New paper 🚨
J1: Incentivizing Thinking in LLM-as-a-Judge via RL

- Converts judgement task into a verifiable one for both verifiable and non-verifiable prompts. Uses only synthetic pairwise data

- Optimizes thoughts, scores, and judgments using GRPO

- Outperforms all
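
The paper's code isn't quoted here, but the "verifiable" part is easy to picture: each synthetic pair comes with a known better response, so a sampled judgment can be scored exactly, and GRPO then only needs group-normalized rewards. A rough sketch under those assumptions (the verdict-parsing convention and function names are mine, not the paper's API):

# Rough sketch of a verifiable pairwise-judge reward plus GRPO-style
# group-normalized advantages. Parsing convention and names are assumptions.
from statistics import mean, pstdev

def verdict_reward(judgment_text: str, preferred: str) -> float:
    # Assume the judge ends its output with a line like "Verdict: A".
    verdict = judgment_text.strip().splitlines()[-1].lower()
    return 1.0 if verdict.endswith(preferred.lower()) else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    # Group-relative advantages: normalize rewards within one prompt's group.
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Four sampled judgments for one synthetic pair whose known-better answer is "A".
samples = ["...thoughts...\nVerdict: A", "...thoughts...\nVerdict: B",
           "...thoughts...\nVerdict: A", "...thoughts...\nVerdict: A"]
rewards = [verdict_reward(s, preferred="A") for s in samples]
print(grpo_advantages(rewards))   # judgments that picked A get positive advantage
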
Rafael Rafailov @ NeurIPS (@rm_rafailov)'s Twitter Profile Photo

When we first published our work on this 9 months ago, it was rejected for being impractical in realistic cases. Six months later it was rejected for lack of novelty. It’s the way academic publishing goes.

James Alcorn (@jamesalcorn94)'s Twitter Profile Photo

congrats Rafael Rafailov @ NeurIPS on your hard-earned acceptance to the USofA as alien of officially extraordinary ability. The alien piece comes as no surprise to your mates of course, but at least the general public now has fair warning and a fighting chance. To celebrate with a fitting

SynthLabs (@synth_labs)'s Twitter Profile Photo

Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training.

Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50%

🧵👇1/10
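
The thread has the details; one plausible reading of "inverse difficulty penalties" (my sketch, not necessarily the paper's exact objective) is to estimate each prompt's difficulty from its own group of rollouts and scale a per-token length penalty by the solve rate, so easy prompts get pushed toward short answers while hard prompts keep their token budget:

# Hedged sketch of an ALP-style shaped reward (my reading of the tweet, not
# the paper's formula): length is penalized in proportion to the prompt's
# empirical solve rate, i.e. the inverse of its difficulty.
def shaped_rewards(correct: list[bool], lengths: list[int], beta: float = 1e-3) -> list[float]:
    # correct/lengths are per-rollout outcomes for ONE prompt's group.
    solve_rate = sum(correct) / len(correct)   # in-batch difficulty estimate
    return [
        float(c) - beta * solve_rate * n       # inverse-difficulty length penalty
        for c, n in zip(correct, lengths)
    ]

# Easy prompt (3/4 solved): long answers lose a noticeable chunk of reward.
print(shaped_rewards([True, True, True, False], [900, 300, 1200, 800]))
# Hard prompt (1/4 solved): the same lengths are barely penalized.
print(shaped_rewards([True, False, False, False], [900, 300, 1200, 800]))

Because the penalty weight is computed from the same rollouts used for the policy update, no separate difficulty labels are needed, which matches the claim that the model ends up with an implicit difficulty estimator.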