Minhak Song (@minhaksong)'s Twitter Profile
Minhak Song

@minhaksong

Undergrad at KAIST; Currently Intern at @uwcse; Interested in DL/RL/LLM Theory, Optimization

ID: 1603765192134774784

Link: https://songminhak.github.io/ · Joined: 16-12-2022 14:53:32

7 Tweets

51 Followers

99 Following

Simon Shaolei Du (@simonshaoleidu)'s Twitter Profile Photo

PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or reward. Paper: arxiv.org/abs/2505.19770. Led by Ruizhe Shi and Minhak Song

Minhak Song (@minhaksong)'s Twitter Profile Photo

RLHF vs DPO under reward and/or policy model misspecification—when does each method succeed? Our new paper provides a fine-grained theoretical comparison. 📄 arxiv.org/abs/2505.19770
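
For readers less familiar with DPO, here is a minimal sketch of the standard DPO objective that the comparison is about (PyTorch-style; the beta value and the pre-computed log-probability inputs are illustrative assumptions, not details taken from the paper):

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        # Implicit rewards: log-prob ratios of the policy against a frozen reference model.
        chosen_margin = policy_chosen_logp - ref_chosen_logp
        rejected_margin = policy_rejected_logp - ref_rejected_logp
        # DPO pushes the chosen response's implicit reward above the rejected one's.
        return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

Whether optimizing this directly (DPO) or fitting an explicit reward model and running RL (RLHF/PPO) works better is exactly the question the paper studies under model misspecification.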

Konstantin Mishchenko (@konstmish)'s Twitter Profile Photo

Schedule-Free methods, which forgo cosine/linear schedulers by averaging iterates and computing gradients at interpolated points, yield smoother training curves. It has been unclear why they work so well; this paper explains the phenomenon through the river-valley loss landscape.

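As a rough sketch of the averaging-plus-interpolation recipe described above (this follows the general shape of Schedule-Free SGD; the exact weighting and hyperparameters of the original method may differ, so treat it as an assumption-laden illustration):

    import torch

    def schedule_free_sgd(params, grad_fn, lr=0.1, beta=0.9, steps=1000):
        # z: base SGD iterate, x: running average (the point you evaluate/deploy),
        # y: interpolation of x and z where gradients are actually computed.
        z = [p.clone() for p in params]
        x = [p.clone() for p in params]
        for t in range(1, steps + 1):
            y = [(1 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
            grads = grad_fn(y)                      # gradient at the interpolated point
            z = [zi - lr * gi for zi, gi in zip(z, grads)]
            c = 1.0 / t                             # uniform online average of the z iterates
            x = [(1 - c) * xi + c * zi for xi, zi in zip(x, z)]
        return x  # no cosine/linear schedule anywhere: the learning rate stays constant
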
Jeremy Cohen (@deepcohen)'s Twitter Profile Photo

Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With Alex Damian, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
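
For context, the edge-of-stability effect is typically diagnosed by tracking the top Hessian eigenvalue (the "sharpness") during full-batch training and comparing it to the classical stability threshold 2/η for gradient descent. A hedged PyTorch sketch of that measurement (the power-iteration settings are illustrative, not taken from the paper; params are assumed to require gradients):

    import torch

    def sharpness(loss_fn, params, iters=20):
        # Power iteration with Hessian-vector products; no explicit Hessian is formed.
        v = [torch.randn_like(p) for p in params]
        for _ in range(iters):
            grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
            hv = torch.autograd.grad(grads, params, grad_outputs=v)
            norm = torch.sqrt(sum((h * h).sum() for h in hv))
            v = [h / norm for h in hv]
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(grads, params, grad_outputs=v)
        # Rayleigh quotient v^T H v with ||v|| = 1 approximates the largest eigenvalue.
        return sum((h * vi).sum() for h, vi in zip(hv, v)).item()

    # At the edge of stability, this sharpness hovers around 2 / learning_rate
    # rather than staying safely below it, as classical theory would suggest.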