Kevin Frans (@kvfrans) Twitter Tweets • TwiCopy

Kevin Frans

@kvfrans

@berkeley_ai @reflection_ai prev mit, read my thoughts: kvfrans.com

ID: 1665897810

Link: http://kvfrans.com · Joined: 12-08-2013 20:09:16

440 Tweets

2.2K Followers

479 Following

Seohong Park

@seohong_park

6 months ago

We found a way to do RL *only* with BC policies. The idea is simple:
1. Train a BC policy π(a|s)
2. Train a conditional BC policy π(a|s, z)
3. Amplify(!) the difference between π(a|s, z) and π(a|s) using CFG
Here, z can be anything (e.g., goals for goal-conditioned RL). 🧵↓

339 likes · 5 replies · 41 reposts
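The three steps in Seohong Park's post above can be illustrated with a toy sketch. It assumes a discrete action space where each policy outputs logits, and it uses the standard classifier-free guidance (CFG) combination, guided = unconditional + w * (conditional - unconditional). The function names, the guidance weight `w`, and the example logits are hypothetical and not taken from the post; the actual method may use a different policy class.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def cfg_policy(logits_uncond, logits_cond, w):
    """Combine two BC policies with classifier-free guidance.

    logits_uncond: action logits of pi(a|s)      (hypothetical input)
    logits_cond:   action logits of pi(a|s, z)   (hypothetical input)
    w:             guidance weight; w=0 gives pi(a|s), w=1 gives pi(a|s, z),
                   w>1 amplifies the difference between the two.
    """
    guided = logits_uncond + w * (logits_cond - logits_uncond)
    return softmax(guided)

# Toy example: conditioning on z makes action 1 more likely.
logits_uncond = np.array([0.0, 0.0, 0.0])
logits_cond = np.array([0.0, 1.0, 0.0])

p_uncond = cfg_policy(logits_uncond, logits_cond, w=0.0)  # plain BC policy
p_cond = cfg_policy(logits_uncond, logits_cond, w=1.0)    # conditional BC policy
p_amp = cfg_policy(logits_uncond, logits_cond, w=3.0)     # amplified policy
```

With `w` above 1, the probability of the action favored by the conditional policy rises beyond what either BC policy assigns on its own, which is the "amplify" step in this toy setting.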

Kevin Frans

@kvfrans

6 months ago

Cool ideas from Yifei, who generally has great sense on building these self-improving systems

23 likes · 1 reply · 3 reposts

Kevin Frans

@kvfrans

6 months ago

I really liked this work because of the solid science. There are 17 pages of experiments in the appendix… We systematically tried to scale every axis we could think of (data, model size, compute) and over 1000+ trials found only one thing consistently mattered.

38 likes · 0 replies · 3 reposts

N8 Programs

@n8programs

6 months ago

Replicated in MLX on MNIST. S+ is an intriguing optimizer that excels at both memorizing the training data and generalizing well. Very intriguing and different from most other optimizers I've tested. Takes some time to get going but typically ends up doing slightly better than

63 likes · 4 replies · 6 reposts

Kevin Zakka

@kevin_zakka

5 months ago

We’re super thrilled to have received the Outstanding Demo Paper Award for MuJoCo Playground at RSS 2025! Huge thanks to everyone who came by our booth and participated, asked questions, and made the demo so much fun! Carlo Sferrazza Qiayuan Liao Arthur Allshire

323 likes · 27 replies · 21 reposts