Oleg Rybkin (@_oleh)'s Twitter Profile
Oleg Rybkin

@_oleh

🇺🇦 Postdoc @ Berkeley. Interested in RL at scale.

ID: 2306706864

Link: http://olehrybkin.com | Joined: 23-01-2014 14:37:12

282 Tweets

835 Followers

402 Following

fly51fly (@fly51fly)'s Twitter Profile Photo

[LG] Value-Based Deep RL Scales Predictably
O Rybkin, M Nauman, P Fu, C Snell... [UC Berkeley] (2025)
arxiv.org/abs/2502.04327
Paul Zhou (@zhiyuan_zhou_)'s Twitter Profile Photo

Can we make robot policy evaluation easier and less time-consuming? Introducing AutoEval, a system that *autonomously* evaluates generalist policies 24/7 and closely matches human results. We make 4 tasks 💫publicly available💫. Submit your policy at auto-eval.github.io! 🧵👇

Danijar Hafner (@danijarh)'s Twitter Profile Photo

Excited to share that DreamerV3 has been published in Nature!

Dreamer solves control tasks by imagining the future outcomes of its actions inside of a continuously learned world model 🌍

It's the first agent to find diamonds in Minecraft from scratch without human data! 💎

👇
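For readers unfamiliar with the approach, here is a minimal sketch of the imagination loop the tweet describes: encode the situation into a latent state, roll a learned world model forward under the policy, and score the policy by the imagined discounted return. This is illustrative pseudocode, not DreamerV3's implementation; the linear "models", dimensions, and function names below are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for Dreamer's learned components (in the real agent
# these are trained neural networks: an RSSM world model, an actor, a critic).
LATENT_DIM, ACTION_DIM, HORIZON, GAMMA = 8, 2, 15, 0.99
W_dyn = rng.normal(scale=0.1, size=(LATENT_DIM + ACTION_DIM, LATENT_DIM))
W_rew = rng.normal(scale=0.1, size=LATENT_DIM)
W_pi = rng.normal(scale=0.1, size=(LATENT_DIM, ACTION_DIM))

def world_model_step(z, a):
    """Predict the next latent state and the reward for taking action a in z."""
    z_next = np.tanh(np.concatenate([z, a]) @ W_dyn)
    reward = float(z @ W_rew)
    return z_next, reward

def actor(z):
    """Sample an action from a (noisy) policy acting purely on latent states."""
    return np.tanh(z @ W_pi) + 0.1 * rng.normal(size=ACTION_DIM)

def imagined_return(z0):
    """Roll the world model forward under the actor, without touching the real
    environment, and sum the discounted imagined rewards."""
    z, ret = z0, 0.0
    for t in range(HORIZON):
        a = actor(z)
        z, r = world_model_step(z, a)
        ret += GAMMA ** t * r
    return ret

# One imagined rollout from a random starting latent.
print(imagined_return(rng.normal(size=LATENT_DIM)))
```

In the actual agent the imagined returns are also bootstrapped with a learned critic and differentiated to update the actor; the sketch only shows the imagination step itself.
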
Chuning Zhu (@chuning_zhu)'s Twitter Profile Photo

Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)

Oleg Rybkin (@_oleh)'s Twitter Profile Photo

Check out a new paper by Amber Xie! We show that you can do robotic imitation learning well by using a diffusion model to plan future latent states instead of actions. This planning method is also more flexible, allowing you to use suboptimal and action-free data.

Aviral Kumar (@aviral_kumar2)'s Twitter Profile Photo

<a href="/_oleh/">Oleg Rybkin</a> will also present an oral talk on our recent work on building scaling laws for value-based RL. We find that value-based deep RL algorithms scale predictably.

Talk at Workshop on robot learning (WRL), April 27.  <a href="/sea_snell/">Charlie Snell</a> will then present the poster!
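"Scales predictably" is a claim about scaling laws: performance follows a smooth curve in the budget (data, compute, model size), so a fit on small runs extrapolates to larger ones. The sketch below fits such a curve on fabricated numbers; the functional form, variable names, and data are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Fabricated example: compute budget vs. remaining error of an agent.
compute = np.array([1e15, 3e15, 1e16, 3e16, 1e17, 3e17, 1e18])
error = 2.0 * compute ** -0.12  # pretend measurements that follow a power law

# A power law error = a * compute^(-b) is a straight line in log-log space,
# so an ordinary linear fit on the logs recovers the exponent and prefactor.
slope, intercept = np.polyfit(np.log(compute), np.log(error), deg=1)
a, b = np.exp(intercept), -slope
print(f"fit: error ≈ {a:.2f} * compute^(-{b:.3f})")

# "Predictable" means the fitted curve extrapolates: forecast performance at a
# larger compute budget before actually paying for the run.
print("predicted error at 1e19:", a * 1e19 ** (-b))
```
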
Arthur Allshire (@arthurallshire)'s Twitter Profile Photo

Our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ Hongsuk Benjamin Choi, Junyi Zhang, David McAllister).

Paul Zhou (@zhiyuan_zhou_)'s Twitter Profile Photo

This was fun, thanks for having me Chris Paxton and Michael Cho (Rbt/Acc)! See the podcast for a livestream of the robot in real time and me evaluating a policy live! Or check it out for yourself at auto-eval.github.io and evaluate your policy in the real world without breaking a sweat.

Seohong Park (@seohong_park)'s Twitter Profile Photo

We found a way to do RL *only* with BC policies.

The idea is simple:

1. Train a BC policy π(a|s)
2. Train a conditional BC policy π(a|s, z)
3. Amplify(!) the difference between π(a|s, z) and π(a|s) using CFG

Here, z can be anything (e.g., goals for goal-conditioned RL).

🧵↓
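
A minimal sketch of step 3, the "amplify" operation, written for a discrete action set so the probabilities are explicit. The guidance weight, the toy numbers, and the function name are illustrative assumptions; in practice this kind of classifier-free guidance is usually applied to the noise/score predictions of a diffusion or flow policy rather than to explicit probabilities.

```python
import numpy as np

def cfg_amplify(p_uncond, p_cond, w):
    """CFG-style sharpening: pi_w(a) ∝ p_uncond(a) * (p_cond(a)/p_uncond(a))**w.
    w = 1 recovers the conditional BC policy pi(a|s, z); w > 1 amplifies
    whatever the conditioning variable z changed relative to pi(a|s)."""
    logits = np.log(p_uncond) + w * (np.log(p_cond) - np.log(p_uncond))
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Toy 4-action example: the unconditional BC policy pi(a|s) is uniform and the
# z-conditioned BC policy pi(a|s, z) mildly prefers action 2.
p_uncond = np.array([0.25, 0.25, 0.25, 0.25])
p_cond = np.array([0.20, 0.20, 0.40, 0.20])

for w in (1.0, 3.0, 6.0):
    print(w, cfg_amplify(p_uncond, p_cond, w).round(3))
```

With larger w, the actions that the conditioning z made more likely come to dominate, which is the sense in which the difference between the two BC policies is "amplified" into goal-seeking behavior.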