Seohong Park (@seohong_park)'s Twitter Profile
Seohong Park

@seohong_park

Reinforcement learning | CS Ph.D. student @berkeley_ai

ID: 1486168076697894916

https://seohong.me/ · Joined 26-01-2022 02:44:21

349 Tweets

1.1K Followers

462 Following

Kevin Frans (@kvfrans)

I really liked this work because of the solid science. There are 17 pages of experiments in the appendix… We systematically tried to scale every axis we could think of (data, model size, compute) and, across 1000+ trials, found only one thing that consistently mattered.

Aviral Kumar (@aviral_kumar2)

Can offline RL methods do well on any problem, as we scale compute and data? In our new paper led by Seohong Park, we show that task horizon can fundamentally hinder scaling for offline RL, and how explicitly reducing task horizon can address this. arxiv.org/abs/2506.04168 🧵⬇️

Oleg Rybkin (@_oleh)

Really interesting result! Scaling value-based RL is hard and we are still missing much of the machinery to do it. Seohong Park shows that horizon is the critical issue.

Ben Eysenbach (@ben_eysenbach)

Do huge amounts of data give (offline) RL algorithms the capacity to perform long-horizon reasoning? A: No. Today's algorithms are bottlenecked by the task horizon, not dataset size. Seohong Park's new paper gives an algorithm that addresses horizon to boost performance.

Sergey Levine (@svlevine)

In this work, we define a notion of scalability for offline RL based on task complexity, and show that (perhaps surprisingly) it's *really* hard to make 1-step TD methods (i.e., TD(0)-style) "scale" even with huge datasets. But hierarchical methods *can* scale.
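A quick way to see the horizon effect (my own toy sketch, not the paper's method or its numbers): with 1-step TD(0)-style backups, a sparse reward at the end of a length-H chain needs on the order of H value-iteration sweeps to reach the start state, while backing up over subgoals spaced k steps apart, roughly what a high-level value function does, needs only about H/k. The NumPy snippet below counts the sweeps in both cases; the chain environment, H, and k are illustrative assumptions.

```python
# Toy illustration (not from the paper): how many synchronous value-iteration
# sweeps it takes for a sparse terminal reward to propagate back to the start
# of a horizon-H chain, using backups of different lengths.
import numpy as np

H = 100        # task horizon (length of the chain)
k = 10         # assumed spacing of high-level subgoals
gamma = 1.0    # undiscounted, to keep the count easy to read
reward = np.zeros(H)
reward[-1] = 1.0   # sparse reward only at the final state

def sweeps_to_propagate(step):
    """Return the number of sweeps until V[0] becomes nonzero when each
    backup looks `step` transitions ahead."""
    V = np.zeros(H)
    nxt = np.minimum(np.arange(H) + step, H - 1)   # index each state backs up from
    nonterminal = np.arange(H) < H - 1             # no bootstrap at the goal state
    for sweep in range(1, 10_000):
        V = reward + gamma * V[nxt] * nonterminal
        if V[0] > 0:
            return sweep
    return None

print("1-step TD(0)-style backups:", sweeps_to_propagate(1), "sweeps")
print(f"backups over subgoals spaced k={k} apart:", sweeps_to_propagate(k), "sweeps")
```

On this toy chain the script reports 100 sweeps for the 1-step backup versus 11 for the k=10 backup, which is the intuition behind the claim that reducing the effective task horizon, rather than adding more data, is what helps.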

John Zhou (@johnlyzhou)

Hierarchical methods for offline goal-conditioned RL (GCRL) can scale to very distant goals that stymie flat (non-hierarchical) policies — but are they really necessary? Paper: arxiv.org/abs/2505.14975 Project page: johnlyzhou.github.io/saw/ Code: github.com/johnlyzhou/saw Thread ↓

Kevin Frans (@kvfrans)

Very excited for this one. We took a cautiously experimental view on NN optimizers, aiming to find something that just works. SPlus matches Adam within ~44% of steps on a range of objectives. Please try it out in your setting, or read below for how it works.

Chongyi Zheng (@chongyiz1)

1/ How should RL agents prepare to solve new tasks? While prior methods often learn a model that predicts the immediate next observation, we build a model that predicts many steps into the future, conditioning on different user intentions: chongyi-zheng.github.io/infom.

Seohong Park (@seohong_park)

New paper on unsupervised pre-training for RL! The idea is to learn a flow-based future prediction model for each "intention" in the dataset. We can then use these models to estimate values for fine-tuning.
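To make that recipe concrete, here is a loose PyTorch sketch; all of the names (vel_net, flow_matching_loss, sample_future, value_estimate), the network sizes, the random tensors standing in for a dataset batch, and the way intentions enter as a latent z are my own illustrative assumptions, not the paper's actual architecture or code.

```python
# Loose sketch (my assumptions, not the paper's code): a flow-matching model that,
# conditioned on the current state s_t and an "intention" latent z, generates a
# future state; sampled futures can then be scored to get rough value estimates.
import torch
import torch.nn as nn

state_dim, intention_dim = 8, 4

# Velocity field v(x_tau, tau | s_t, z): moves a noise sample toward a plausible
# future state, conditioned on the current state and intention.
vel_net = nn.Sequential(
    nn.Linear(state_dim + 1 + state_dim + intention_dim, 256), nn.GELU(),
    nn.Linear(256, state_dim),
)
opt = torch.optim.Adam(vel_net.parameters(), lr=3e-4)

def flow_matching_loss(s_t, s_future, z):
    """Conditional flow-matching loss on (current state, future state, intention)."""
    noise = torch.randn_like(s_future)
    tau = torch.rand(s_future.shape[0], 1)
    x_tau = (1 - tau) * noise + tau * s_future      # linear interpolation path
    target_vel = s_future - noise                   # velocity along that path
    pred_vel = vel_net(torch.cat([x_tau, tau, s_t, z], dim=-1))
    return ((pred_vel - target_vel) ** 2).mean()

@torch.no_grad()
def sample_future(s_t, z, steps=20):
    """Euler-integrate the learned velocity field from noise to a predicted future state."""
    x = torch.randn_like(s_t)
    for i in range(steps):
        tau = torch.full((s_t.shape[0], 1), i / steps)
        x = x + vel_net(torch.cat([x, tau, s_t, z], dim=-1)) / steps
    return x

@torch.no_grad()
def value_estimate(s_t, z, reward_fn, n_samples=32):
    """Crude value proxy: average reward of sampled futures under intention z."""
    futures = [sample_future(s_t, z) for _ in range(n_samples)]
    return torch.stack([reward_fn(f) for f in futures]).mean(0)

# Toy usage on random tensors standing in for a dataset batch.
s_t, s_future = torch.randn(64, state_dim), torch.randn(64, state_dim)
z = torch.randn(64, intention_dim)
opt.zero_grad()
flow_matching_loss(s_t, s_future, z).backward()
opt.step()
print(value_estimate(s_t[:1], z[:1], reward_fn=lambda s: s.sum(-1)))
```

The point the tweet emphasizes is the modeling choice: the generative model predicts far-future states conditioned on an intention rather than single-step dynamics, and the value estimates used for fine-tuning come from these models; averaging a reward over sampled futures, as in the sketch, is just one possible instantiation.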

Tongzhou Wang (@ssnl_tz)

such a nice & clear articulation of the big question by Seohong Park! also thanks for mentioning Quasimetric RL. now I just need to show people this post instead of explaining why I am excited by QRL :)