Ben Pan (@ybenpan)'s Twitter Profile
Ben Pan

@ybenpan

ID: 1763777022202089472

Joined 02-03-2024 04:03:18

2 Tweets

20 Followers

163 Following

Ben Pan (@ybenpan):

Diagnosing RL runs was tricky. Around step 40, outputs started turning into junk. By inspecting the traces, we found that the model had stopped beginning its responses with “Okay,”, which was a sign of instability. This led us to a new metric, the “Not Okay Ratio,” which helped predict junk outputs in our runs.
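The tweet does not give a formal definition of the “Not Okay Ratio.” Below is a minimal Python sketch that assumes it is the fraction of rollouts sampled at one RL step whose text does not begin with “Okay,” after leading whitespace is stripped. The function names `not_okay_ratio` and `first_alert_step` and the 0.5 alert threshold are hypothetical illustrations, not taken from the original.

```python
from typing import Mapping, Optional, Sequence


def not_okay_ratio(responses: Sequence[str], prefix: str = "Okay,") -> float:
    """Fraction of responses that do NOT start with `prefix`.

    Assumption: computed over the rollouts sampled at a single RL step,
    ignoring leading whitespace. Returns 0.0 for an empty batch.
    """
    if not responses:
        return 0.0
    misses = sum(1 for r in responses if not r.lstrip().startswith(prefix))
    return misses / len(responses)


def first_alert_step(
    ratios_by_step: Mapping[int, float], threshold: float = 0.5
) -> Optional[int]:
    """Hypothetical alert rule: return the earliest step whose ratio
    exceeds `threshold`, or None if no step does."""
    for step in sorted(ratios_by_step):
        if ratios_by_step[step] > threshold:
            return step
    return None


if __name__ == "__main__":
    # Toy batch for illustration only, not real run data.
    batch = ["Okay, let's solve this.", "Okay, first note", "The answer is 4", "!!@@##"]
    print(not_okay_ratio(batch))  # 2 of 4 responses lack the prefix -> 0.5
```

In a training loop, one would log this value once per step next to reward and loss. A rising ratio then gives an early signal before the outputs visibly degrade.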

vLLM (@vllm_project):

Great work! We love how vLLM is used in the rollout process, offloading the engine to the CPU and giving the GPU back to the kernel being benchmarked! This is a small feature we implemented to make RLHF smoother with vLLM.
