Seohong Park (@seohong_park)'s Twitter Profile
Seohong Park

@seohong_park

Reinforcement learning | CS Ph.D. student @berkeley_ai

ID: 1486168076697894916

https://seohong.me/ · Joined 26-01-2022 02:44:21

349 Tweets

1.1K Followers

462 Following

Kevin Frans (@kvfrans)

I really liked this work because of the solid science. There are 17 pages of experiments in the appendix… We systematically tried to scale every axis we could think of (data, model size, compute) and, across 1000+ trials, found only one thing that consistently mattered.

Aviral Kumar (@aviral_kumar2)

Can offline RL methods do well on any problem, as we scale compute and data? In our new paper led by Seohong Park, we show that task horizon can fundamentally hinder scaling for offline RL, and how explicitly reducing task horizon can address this. arxiv.org/abs/2506.04168 🧵⬇️

Oleg Rybkin (@_oleh)

Really interesting result! Scaling value-based RL is hard and we are still missing much of the machinery to do it. Seohong Park shows that horizon is the critical issue.

Ben Eysenbach (@ben_eysenbach)

Do huge amounts of data give (offline) RL algorithms the capacity to perform long-horizon reasoning? A: No. Today's algorithms are bottlenecked by the task horizon, not dataset size. Seohong Park's new paper gives an algorithm that addresses horizon to boost performance.

Sergey Levine (@svlevine)

In this work, we define a notion of scalability for offline RL based on task complexity, and show that (perhaps surprisingly) it's *really* hard to make 1-step TD methods (i.e., TD(0)-style) "scale" even with huge datasets. But hierarchical methods *can* scale.
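A quick way to see the horizon effect (my own toy sketch, not the paper's method or its numbers): with 1-step TD(0)-style backups, a sparse reward at the end of a length-H chain needs on the order of H value-iteration sweeps to reach the start state, while backing up over subgoals spaced k steps apart, roughly what a high-level value function does, needs only about H/k. The NumPy snippet below counts the sweeps in both cases; the chain environment, H, and k are illustrative assumptions.

```python
# Toy illustration (not from the paper): how many synchronous value-iteration
# sweeps it takes for a sparse terminal reward to propagate back to the start
# of a horizon-H chain, using backups of different lengths.
import numpy as np

H = 100        # task horizon (length of the chain)
k = 10         # assumed spacing of high-level subgoals
gamma = 1.0    # undiscounted, to keep the count easy to read
reward = np.zeros(H)
reward[-1] = 1.0   # sparse reward only at the final state

def sweeps_to_propagate(step):
    """Return the number of sweeps until V[0] becomes nonzero when each
    backup looks `step` transitions ahead."""
    V = np.zeros(H)
    nxt = np.minimum(np.arange(H) + step, H - 1)   # index each state backs up from
    nonterminal = np.arange(H) < H - 1             # no bootstrap at the goal state
    for sweep in range(1, 10_000):
        V = reward + gamma * V[nxt] * nonterminal
        if V[0] > 0:
            return sweep
    return None

print("1-step TD(0)-style backups:", sweeps_to_propagate(1), "sweeps")
print(f"backups over subgoals spaced k={k} apart:", sweeps_to_propagate(k), "sweeps")
```

On this toy chain the script reports 100 sweeps for the 1-step backup versus 11 for the k=10 backup, which is the intuition behind the claim that reducing the effective task horizon, rather than adding more data, is what helps.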

John Zhou (@johnlyzhou)

Hierarchical methods for offline goal-conditioned RL (GCRL) can scale to very distant goals that stymie flat (non-hierarchical) policies — but are they really necessary? Paper: arxiv.org/abs/2505.14975 Project page: johnlyzhou.github.io/saw/ Code: github.com/johnlyzhou/saw Thread ↓

Kevin Frans (@kvfrans)

Very excited for this one. We took a cautiously experimental view on NN optimizers, aiming to find something that just works. SPlus matches Adam within ~44% of steps on a range of objectives. Please try it out in your setting, or read below for how it works.

Chongyi Zheng (@chongyiz1)

1/ How should RL agents prepare to solve new tasks? While prior methods often learn a model that predicts the immediate next observation, we build a model that predicts many steps into the future, conditioning on different user intentions: chongyi-zheng.github.io/infom.

Seohong Park (@seohong_park)

New paper on unsupervised pre-training for RL! The idea is to learn a flow-based future prediction model for each "intention" in the dataset. We can then use these models to estimate values for fine-tuning.
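To make that recipe concrete, here is a loose PyTorch sketch; all of the names (vel_net, flow_matching_loss, sample_future, value_estimate), the network sizes, the random tensors standing in for a dataset batch, and the way intentions enter as a latent z are my own illustrative assumptions, not the paper's actual architecture or code.

```python
# Loose sketch (my assumptions, not the paper's code): a flow-matching model that,
# conditioned on the current state s_t and an "intention" latent z, generates a
# future state; sampled futures can then be scored to get rough value estimates.
import torch
import torch.nn as nn

state_dim, intention_dim = 8, 4

# Velocity field v(x_tau, tau | s_t, z): moves a noise sample toward a plausible
# future state, conditioned on the current state and intention.
vel_net = nn.Sequential(
    nn.Linear(state_dim + 1 + state_dim + intention_dim, 256), nn.GELU(),
    nn.Linear(256, state_dim),
)
opt = torch.optim.Adam(vel_net.parameters(), lr=3e-4)

def flow_matching_loss(s_t, s_future, z):
    """Conditional flow-matching loss on (current state, future state, intention)."""
    noise = torch.randn_like(s_future)
    tau = torch.rand(s_future.shape[0], 1)
    x_tau = (1 - tau) * noise + tau * s_future      # linear interpolation path
    target_vel = s_future - noise                   # velocity along that path
    pred_vel = vel_net(torch.cat([x_tau, tau, s_t, z], dim=-1))
    return ((pred_vel - target_vel) ** 2).mean()

@torch.no_grad()
def sample_future(s_t, z, steps=20):
    """Euler-integrate the learned velocity field from noise to a predicted future state."""
    x = torch.randn_like(s_t)
    for i in range(steps):
        tau = torch.full((s_t.shape[0], 1), i / steps)
        x = x + vel_net(torch.cat([x, tau, s_t, z], dim=-1)) / steps
    return x

@torch.no_grad()
def value_estimate(s_t, z, reward_fn, n_samples=32):
    """Crude value proxy: average reward of sampled futures under intention z."""
    futures = [sample_future(s_t, z) for _ in range(n_samples)]
    return torch.stack([reward_fn(f) for f in futures]).mean(0)

# Toy usage on random tensors standing in for a dataset batch.
s_t, s_future = torch.randn(64, state_dim), torch.randn(64, state_dim)
z = torch.randn(64, intention_dim)
opt.zero_grad()
flow_matching_loss(s_t, s_future, z).backward()
opt.step()
print(value_estimate(s_t[:1], z[:1], reward_fn=lambda s: s.sum(-1)))
```

The point the tweet emphasizes is the modeling choice: the generative model predicts far-future states conditioned on an intention rather than single-step dynamics, and the value estimates used for fine-tuning come from these models; averaging a reward over sampled futures, as in the sketch, is just one possible instantiation.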

Tongzhou Wang (@ssnl_tz)

such a nice & clear articulation of the big question by Seohong Park! also thanks for mentioning Quasimetric RL. now I just need to show people this post instead of explaining why I am excited by QRL :)