Khanh Nguyen (on job market) (@khanhxuannguyen)'s Twitter Profile
Khanh Nguyen (on job market)

@khanhxuannguyen

Postdoc at CHAI Berkeley with Prof. Stuart Russell, Prev. Postdoc at Princeton NLP, PhD @umdcs, Human-AI Communication, Interactive Learning, NLP.

ID: 2829879914

Link: http://machineslearner.com · Joined: 24-09-2014 13:18:31

1.1K Tweets

1.1K Followers

468 Following

Dylan Hadfield-Menell (@dhadfieldmenell)'s Twitter Profile Photo

Gary Marcus recently posted a thread responding to my critique of his public statements on AI. Though he didn't name me, it's clear it was directed at me. I want to clarify my position and explain why this isn't about us; it's about trust in expertise and why it matters for AI…

Khanh Nguyen (on job market) (@khanhxuannguyen)'s Twitter Profile Photo

Two reasons NOT to get over-hyped about test-time compute scaling: 1. The fact that we need to train on the test distribution just shows that vanilla neural nets don't generalize well. Behaviorist approaches to training neural nets can't teach them to think. The mental search…

Christopher Manning (@chrmanning)'s Twitter Profile Photo

Re: “Every major breakthrough in AI has been American”: America does itself no favors when it overestimates its specialness. Yes, the center of the AI industry is the US (California!), but many of the breakthroughs of (neural, gradient-based) AI happened elsewhere: • LSTMs, …

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

More Qwen. I'm increasingly comfortable saying these papers seem to be a discovery of some sort about Qwen models, not necessarily about reasoning.

Ben Plaut (@benplaut)'s Twitter Profile Photo

(1/5) New paper! Despite concerns about AI catastrophe, there isn’t much work on learning while provably avoiding catastrophe. In fact, nearly all of learning theory assumes all errors are reversible. Stuart Russell, Hanlin Zhu and I fill this gap: arxiv.org/pdf/2402.08062

leloy! (@leloykun)'s Twitter Profile Photo

I'm not sure if someone has already pointed this out, but Dr. GRPO still has a bias that is more pronounced the smaller the group size is.

To make it unbiased, simply multiply Dr. GRPO's A_i by the correction term N/(N-1). With this, you'll get LOOP (Leave-One-Out Proximal Policy Optimization)…
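
To read the claim concretely, here is a minimal numpy sketch (the function names and toy rewards are mine, not from the thread): Dr. GRPO centers each reward on the group mean, and scaling that advantage by N/(N-1) is algebraically identical to using a leave-one-out baseline, i.e. comparing each rollout against the mean of the other N-1 rollouts.

```python
import numpy as np

def dr_grpo_advantage(rewards: np.ndarray) -> np.ndarray:
    """Dr. GRPO-style advantage: center on the group mean (no std division)."""
    return rewards - rewards.mean()

def loo_advantage(rewards: np.ndarray) -> np.ndarray:
    """Leave-one-out baseline: compare each reward to the mean of the other N-1."""
    n = len(rewards)
    loo_mean = (rewards.sum() - rewards) / (n - 1)
    return rewards - loo_mean

rewards = np.array([1.0, 0.0, 0.0, 1.0, 1.0])  # toy group of N=5 rollouts
n = len(rewards)

# The claimed correction: scale Dr. GRPO's A_i by N/(N-1) ...
corrected = dr_grpo_advantage(rewards) * n / (n - 1)

# ... and it matches the leave-one-out advantage exactly.
assert np.allclose(corrected, loo_advantage(rewards))
```
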
Hanze Dong @ ICLR 2025 (@hendrydong)'s Twitter Profile Photo

🤖 What makes GRPO work? Rejection Sampling → Reinforce → GRPO
- RS is underrated
- Key of GRPO: implicitly remove prompts without correct answer
- Reinforce + Filtering > GRPO (better KL)
💻 github.com/RLHFlow/Minima…
📄 arxiv.org/abs/2504.11343
👀 RAFT was invited to ICLR25! Come & Chat ☕️
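
The "implicitly remove prompts without correct answer" point falls straight out of the advantage formula. A small sketch of my own (assuming binary correctness rewards and the usual group-normalized advantage; not code from the linked repo): when every rollout for a prompt gets reward 0, all advantages are zero, so that prompt contributes no gradient, exactly as if it had been filtered out.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Prompt with no correct rollout: all rewards 0 -> all advantages 0,
# so it adds nothing to the policy gradient (implicit filtering).
# The same vanishing happens when every rollout is correct.
print(grpo_advantages(np.zeros(4)))                      # [0. 0. 0. 0.]

# Prompt with mixed outcomes: nonzero advantages -> it actually trains the policy.
print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))   # ~[ 1. -1. -1.  1.]
```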

Khanh Nguyen (on job market) (@khanhxuannguyen)'s Twitter Profile Photo

You might have heard that "LLMs are overconfident" and "LLMs know what they know", but these claims have only been verified on a small set of models. We conduct rigorous experiments to confirm these claims on Q&A tasks for a diverse set of models. Yes, even GPT-4o is terribly…
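
For context on what "overconfident" means operationally, here is a minimal sketch of my own (not the paper's code): one standard way to quantify it on Q&A is expected calibration error, which compares a model's stated confidence with its empirical accuracy within confidence bins.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """ECE: bin answers by stated confidence, then average |confidence - accuracy|
    over bins, weighted by the fraction of samples in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Toy Q&A run: the model claims ~90% confidence but is right only 60% of the time.
conf = np.array([0.90, 0.95, 0.85, 0.90, 0.92])
hits = np.array([1, 0, 1, 0, 1])
print(expected_calibration_error(conf, hits))  # large value = overconfidence
```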

Alex Zhang (@a1zhang)'s Twitter Profile Photo

Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇
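
The protocol is easy to picture; a toy sketch of the agent loop (the Emulator and vlm_choose_key stubs below are hypothetical stand-ins, not VideoGameBench's actual API): the model receives only the raw screenshot and must emit a key press, with no access to internal game state.

```python
from dataclasses import dataclass

@dataclass
class Emulator:
    """Hypothetical stand-in for a Game Boy / MS-DOS emulator wrapper."""
    frame: int = 0

    def screenshot(self) -> bytes:
        return b"raw pixels"  # the only observation the model ever gets

    def press(self, key: str) -> None:
        self.frame += 1       # advance the game by one input

def vlm_choose_key(screen: bytes) -> str:
    """Placeholder for a GPT/Claude/Gemini call that maps pixels to a key press."""
    return "A"

emu = Emulator()
for _ in range(10):  # pixels in, key presses out, just like a human player
    emu.press(vlm_choose_key(emu.screenshot()))
```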

Chuong M. Huynh (@ryanhuynh1108)'s Twitter Profile Photo

CVPR-bound! ✈️ I'll be presenting CoLLM on Friday, 6/13 (Morning, #364) and looking for my next challenge as a full-time Scientist/Engineer. If you're hiring or just want to chat about exciting research, find me there! My work: hmchuong.github.io #CVPR2025 #JobHunt