Ben Pan (@ybenpan) Twitter Tweets • TwiCopy

Ben Pan

@ybenpan

ID: 1763777022202089472

02-03-2024 04:03:18

2 Tweets

20 Followers

163 Following

will brown

@willccbb

6 months ago

wow big day for multi-turn GRPO incredible writeup too

407 Likes

10 Replies

40 Reposts

Diagnosing RL runs was tricky. Around step 40, outputs started junking. By inspecting the traces, we found the model no longer began responses with “Okay,” — a sign of instability. This led us to a new metric: the “Not Okay Ratio” which helped predict junk in our runs.

40 Likes

2 Replies

4 Reposts
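
The "Not Okay Ratio" described in the post above is not defined there in detail. A minimal sketch, assuming it is simply the fraction of sampled responses in a training step that do not begin with "Okay," (the function name `not_okay_ratio`, the whitespace handling, and the sample batch are all hypothetical, not taken from the original writeup):

```python
def not_okay_ratio(responses, prefix="Okay,"):
    """Return the fraction of responses that do NOT start with `prefix`.

    Assumption: leading whitespace is ignored before checking the prefix.
    An empty batch returns 0.0 so the metric is always defined.
    """
    if not responses:
        return 0.0
    misses = sum(1 for r in responses if not r.lstrip().startswith(prefix))
    return misses / len(responses)


# Hypothetical batch of rollouts sampled at one training step.
batch = [
    "Okay, let me think about the kernel layout.",
    "Okay, first I need to read the inputs.",
    "The kernel the kernel the kernel",
    "   Okay, starting with the reduction.",
]

ratio = not_okay_ratio(batch)
print(f"Not Okay Ratio: {ratio:.2f}")
```

Logged once per step, a rising value of this ratio would be the early-warning signal the post describes; any alert threshold would have to be chosen empirically.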

Binyuan Hui

@huybery

6 months ago

Congrats, great work! Also happy to see it's trained on QwQ!

201 Likes

3 Replies

11 Reposts

vLLM

@vllm_project

6 months ago

Great work! We love how vLLM is used in the rollout process, offloading the engine to CPU and giving the GPU back to the kernel to be benchmarked! This is a small feature we implemented to make RLHF smoother with vLLM.