Minhak Song (@minhaksong)'s Twitter Profile
Minhak Song

@minhaksong

Undergrad at KAIST; Currently Intern at @uwcse; Interested in DL/RL/LLM Theory, Optimization

ID: 1603765192134774784

Link: https://songminhak.github.io/ · Joined: 16-12-2022 14:53:32

7 Tweets

51 Followers

99 Following

Simon Shaolei Du (@simonshaoleidu)'s Twitter Profile Photo

PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or reward. Paper: arxiv.org/abs/2505.19770. Led by Ruizhe Shi and Minhak Song

Minhak Song (@minhaksong)'s Twitter Profile Photo

RLHF vs DPO under reward and/or policy model misspecification—when does each method succeed? Our new paper provides a fine-grained theoretical comparison. 📄 arxiv.org/abs/2505.19770
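
For readers less familiar with DPO, here is a minimal sketch of the standard DPO objective that the comparison is about (PyTorch-style; the beta value and the pre-computed log-probability inputs are illustrative assumptions, not details taken from the paper):

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        # Implicit rewards: log-prob ratios of the policy against a frozen reference model.
        chosen_margin = policy_chosen_logp - ref_chosen_logp
        rejected_margin = policy_rejected_logp - ref_rejected_logp
        # DPO pushes the chosen response's implicit reward above the rejected one's.
        return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

Whether optimizing this directly (DPO) or fitting an explicit reward model and running RL (RLHF/PPO) works better is exactly the question the paper studies under model misspecification.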

Konstantin Mishchenko (@konstmish)'s Twitter Profile Photo

Schedule-Free methods, which forgo cosine/linear schedulers by averaging iterates and computing gradients at interpolated points, yield smoother training curves. It has been unclear why they work so well; this paper explains the phenomenon through the river-valley loss landscape.

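As a rough sketch of the averaging-plus-interpolation recipe described above (this follows the general shape of Schedule-Free SGD; the exact weighting and hyperparameters of the original method may differ, so treat it as an assumption-laden illustration):

    import torch

    def schedule_free_sgd(params, grad_fn, lr=0.1, beta=0.9, steps=1000):
        # z: base SGD iterate, x: running average (the point you evaluate/deploy),
        # y: interpolation of x and z where gradients are actually computed.
        z = [p.clone() for p in params]
        x = [p.clone() for p in params]
        for t in range(1, steps + 1):
            y = [(1 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
            grads = grad_fn(y)                      # gradient at the interpolated point
            z = [zi - lr * gi for zi, gi in zip(z, grads)]
            c = 1.0 / t                             # uniform online average of the z iterates
            x = [(1 - c) * xi + c * zi for xi, zi in zip(x, z)]
        return x  # no cosine/linear schedule anywhere: the learning rate stays constant
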
Jeremy Cohen (@deepcohen)'s Twitter Profile Photo

Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With Alex Damian, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
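
For context, the edge-of-stability effect is typically diagnosed by tracking the top Hessian eigenvalue (the "sharpness") during full-batch training and comparing it to the classical stability threshold 2/η for gradient descent. A hedged PyTorch sketch of that measurement (the power-iteration settings are illustrative, not taken from the paper; params are assumed to require gradients):

    import torch

    def sharpness(loss_fn, params, iters=20):
        # Power iteration with Hessian-vector products; no explicit Hessian is formed.
        v = [torch.randn_like(p) for p in params]
        for _ in range(iters):
            grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
            hv = torch.autograd.grad(grads, params, grad_outputs=v)
            norm = torch.sqrt(sum((h * h).sum() for h in hv))
            v = [h / norm for h in hv]
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(grads, params, grad_outputs=v)
        # Rayleigh quotient v^T H v with ||v|| = 1 approximates the largest eigenvalue.
        return sum((h * vi).sum() for h, vi in zip(hv, v)).item()

    # At the edge of stability, this sharpness hovers around 2 / learning_rate
    # rather than staying safely below it, as classical theory would suggest.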