Chanwoo Park (@chanwoopark20)'s Twitter Profile
Chanwoo Park

@chanwoopark20

Games, Multi-agent (gen) AI | @speedrun SR003 | @mit EECS Ph.D. Candidate

ID: 1457347791723069440

Link: https://chanwoo-park-official.github.io/ | Joined: 07-11-2021 14:04:11

702 Tweets

1.1K Followers

1.1K Following

Chanwoo Park (@chanwoopark20):

The LMSYS Chat dataset hasn’t been updated in over a year, yet recent models consistently top the leaderboard early on. Are rankings being manipulated? Analyzing model selection trends over time could reveal the right metric for fair LLM evaluation... Wanna check the recent

Chanwoo Park (@chanwoopark20):

I even believe that RL (or regret training) is needed before SFT, or even before the pretraining phase. If you've worked on RL with LLMs, you'll know what I mean. Super nice to read this paper!

Chanwoo Park (@chanwoopark20):

One of the biggest cultural differences between East Asians and Americans is the concept of "face" (面子, 체면)—a nuanced idea that encompasses social reputation, honor, and maintaining harmony in interpersonal relationships, with no direct English equivalent. This distinction

Vishal Pandey (@its_vayishu):

I interviewed for an ML research internship at Meta (FAIR) a few years back. Don’t remember every detail now, but a few questions stuck with me. Questions are below.

Chanwoo Park (@chanwoopark20):

My mother uses ChatGPT and trades ETH, no surprise there! I really hope South Korea can emerge as a global leader across key tech domains like robotics, AI, crypto, etc. On that note, Korea also offers strong government-backed investment support for startups, which is a great

Zae Myung Kim (@zaemyung):

🚨 New Paper Alert! 🚨

How can we align language models without drowning in prompt engineering or falling into reward hacking traps?

We introduce Meta Policy Optimization (MPO), a new reinforcement learning framework that evolves its own reward model rubrics through meta-level
Chanwoo Park (@chanwoopark20):

That is the reason you need an evolving reward function. huggingface.co/papers/2504.20… Check out this paper: it provides some answers about curriculum learning, evolving rewards, and reward hacking using evaluative thinking. Zae Myung Kim

Stella Li (@stellalisy):

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
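The three reward schemes compared in the thread can be sketched as simple functions. This is a minimal illustration, not the paper's code; the function names and the exact-match check are assumptions for the sake of the example.

```python
import random

def ground_truth_reward(answer: str, gold: str) -> float:
    """Standard RLVR: reward 1 only when the answer matches the gold label."""
    return 1.0 if answer.strip() == gold.strip() else 0.0

def incorrect_reward(answer: str, gold: str) -> float:
    """Spurious variant: reward 1 only when the answer is WRONG."""
    return 1.0 - ground_truth_reward(answer, gold)

def random_reward(answer: str, gold: str, rng: random.Random) -> float:
    """Spurious variant: a coin flip, independent of correctness."""
    return float(rng.random() < 0.5)
```

The surprising claim is that even the random and incorrect variants, plugged into an RLVR training loop, improved MATH-500 accuracy nearly as much as the ground-truth reward.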