Chongyi Zheng (@chongyiz1)'s Twitter Profile
Chongyi Zheng

@chongyiz1

PhD student @ Princeton working on RL.

ID: 1468793850139639809

Website: https://chongyi-zheng.github.io/ · Joined: 09-12-2021 04:05:23

48 Tweets

169 Followers

129 Following

Kevin Frans (@kvfrans):

Over the past year, I've been compiling some "alchemist's notes" on deep learning. Right now it covers basic optimization, architectures, and generative models.

Focus is on learnability -- each page has nice graphics and an end-to-end implementation.

notes.kvfrans.com
Younggyo Seo (@younggyoseo):

Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵

Kevin Frans (@kvfrans):

Stare at policy improvement and diffusion guidance, and you may notice a suspicious similarity...

We lay out an equivalence between the two, formalizing a simple technique (CFGRL) to improve performance across-the-board when training diffusion policies.

arxiv.org/abs/2505.23458
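
Background on the guidance side of this equivalence (standard classifier-free guidance, not a claim specific to the paper above): CFG combines an unconditional and a conditional noise prediction as

ε_w = ε(a_t | s) + w · (ε(a_t | s, c) − ε(a_t | s)),

which, at the score level, samples from a distribution proportional to π(a | s) · (π(a | s, c) / π(a | s))^w. In other words, the base policy is tilted toward the conditioned one, with the guidance weight w controlling how strongly; the paper formalizes when this tilt is a valid policy improvement step.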
Seohong Park (@seohong_park):

We found a way to do RL *only* with BC policies.

The idea is simple:

1. Train a BC policy π(a|s)
2. Train a conditional BC policy π(a|s, z)
3. Amplify(!) the difference between π(a|s, z) and π(a|s) using CFG

Here, z can be anything (e.g., goals for goal-conditioned RL).

🧵↓
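
A minimal sketch of step 3 at sampling time, assuming diffusion-style BC policies whose networks predict the injected noise. The network interfaces, action dimensionality, guidance weight, and noise schedule here are illustrative assumptions, not the paper's implementation:

```python
import torch

T = 100                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 2e-2, T)      # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_action(eps_uncond, eps_cond, s, z, act_dim, w=3.0):
    """eps_uncond(a_t, s, t): noise prediction of the BC policy pi(a|s).
    eps_cond(a_t, s, z, t): noise prediction of the conditional policy pi(a|s, z).
    w = 1 recovers plain conditional BC; w > 1 amplifies the difference."""
    a = torch.randn(s.shape[0], act_dim)               # start from pure noise
    for t in reversed(range(T)):
        e_u = eps_uncond(a, s, t)
        e_c = eps_cond(a, s, z, t)
        e = e_u + w * (e_c - e_u)                      # CFG: amplify the difference
        # standard DDPM reverse step using the guided noise estimate
        mean = (a - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * e) / torch.sqrt(alphas[t])
        noise = torch.randn_like(a) if t > 0 else torch.zeros_like(a)
        a = mean + torch.sqrt(betas[t]) * noise
    return a
```

With w = 1 this simply samples from π(a | s, z); w > 1 pushes actions further toward whatever the conditioning z selects for (e.g., reaching a goal).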
Seohong Park (@seohong_park):

Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: arxiv.org/abs/2506.04168 Thread ↓

John Zhou (@johnlyzhou):

Hierarchical methods for offline goal-conditioned RL (GCRL) can scale to very distant goals that stymie flat (non-hierarchical) policies — but are they really necessary? Paper: arxiv.org/abs/2505.14975 Project page: johnlyzhou.github.io/saw/ Code: github.com/johnlyzhou/saw Thread ↓

Kevin Frans (@kvfrans):

Very excited for this one. We took a cautiously experimental view on NN optimizers, aiming to find something that just works. 

SPlus matches Adam's performance using roughly 44% of the training steps across a range of objectives. Please try it out in your setting, or read below for how it works.
Seohong Park (@seohong_park):

New paper on unsupervised pre-training for RL! The idea is to learn a flow-based future prediction model for each "intention" in the dataset. We can then use these models to estimate values for fine-tuning.

Sergey Levine (@svlevine):

Unsupervised RL with intention-conditioned models provides a really interesting combination of predictive modeling and counterfactual learning (i.e., control). Getting such methods to work at scale has always been a challenge, but it's getting closer!

Ben Eysenbach (@ben_eysenbach):

What makes RL hard is the _time_ axis⏳, so let's pre-train RL policies to learn about _time_! Same intuition as successor representations 🧠, but made scalable with modern GenAI models 🚀. Excited to share new work led by Chongyi Zheng, together with Seohong Park and Sergey Levine!
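
To unpack the successor-representation intuition in this cluster of tweets (standard background, not a result from the thread): for a state-dependent reward r(s⁺), the discounted future-state distribution

d^π_γ(s⁺ | s) = (1 − γ) · Σ_{t≥0} γ^t · Pr(s_t = s⁺ | s_0 = s, π)

determines values directly, since V^π(s) = (1 / (1 − γ)) · E_{s⁺ ~ d^π_γ(· | s)}[r(s⁺)]. A generative model of future states, such as the intention-conditioned flow model described above, is therefore enough to estimate values for fine-tuning.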

Seohong Park (@seohong_park):

Q-learning is not yet scalable

seohong.me/blog/q-learnin…

I wrote a blog post about my thoughts on scalable RL algorithms.

To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
Qiyang Li (@qiyang_li):

Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N
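
For readers unfamiliar with the term, here is a minimal sketch of what action chunking looks like at rollout time, assuming a Gymnasium-style environment; this illustrates the concept only, not the paper's RL recipe:

```python
def rollout_chunked_policy(env, policy, episode_len=1000):
    """policy(obs) -> iterable of actions (one 'chunk').
    The whole chunk is executed open-loop before the policy is queried again.
    Assumes a Gymnasium-style env: reset() -> (obs, info), step() -> 5-tuple."""
    obs, _ = env.reset()
    total_reward, t = 0.0, 0
    while t < episode_len:
        chunk = policy(obs)                  # predict the next few actions at once
        for action in chunk:                 # execute them without re-planning
            obs, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
            t += 1
            if terminated or truncated or t >= episode_len:
                return total_reward
    return total_reward
```

The linked thread and project page describe how this chunked action space is combined with RL to better leverage prior data and improve exploration and online sample efficiency.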