Yuchen Zhuang (@yuchen_zhuang)'s Twitter Profile
Yuchen Zhuang

@yuchen_zhuang

Ph.D. Candidate @MLatGT | Ex-Intern @AdobeResearch @Amazon | LLMs | LLMs for Reasoning and Planning | LLM Agent | Data-Centric AI

ID: 973203991231434752

Link: http://night-chen.github.io | Joined: 12-03-2018 14:28:07

94 Tweets

305 Followers

265 Following

Andrej Karpathy (@karpathy)

# RLHF is just barely RL

Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely
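
To make the pipeline concrete, here is a deliberately tiny sketch of the RLHF step: a policy samples a completion, a reward model scores it, and a REINFORCE-style update raises the log-probability of rewarded samples. `ToyPolicy` and `toy_reward` are hypothetical stand-ins, not anything from the original post.

```python
# Illustrative toy sketch of the RLHF stage: sample, score with a reward model,
# then do a REINFORCE-style policy-gradient update. Everything here is a toy.
import torch
import torch.nn as nn

class ToyPolicy(nn.Module):
    """Tiny stand-in for an SFT-initialized LLM policy over a small vocabulary."""
    def __init__(self, vocab_size=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        return self.head(self.embed(tokens))  # per-position logits

def toy_reward(completion):
    """Stand-in for a learned reward model: simply prefers token id 3."""
    return (completion == 3).float().mean()

policy = ToyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
prompt = torch.tensor([1, 2])                      # fixed toy "prompt"

for step in range(100):
    logits = policy(prompt)                        # (2, vocab_size)
    dist = torch.distributions.Categorical(logits=logits)
    completion = dist.sample()                     # sampled "response" tokens
    reward = toy_reward(completion)                # scalar human-feedback proxy
    loss = -(dist.log_prob(completion).sum() * reward)
    opt.zero_grad(); loss.backward(); opt.step()
```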
Jim Fan (@drjimfan)

This is the most gut-wrenching blog I've read, because it's so real and so close to heart. The author is no longer with us. I'm in tears. AI is not supposed to be 200B weights of stress and pain. It used to be a place of coffee-infused eureka moments, of exciting late-night arxiv
Yuanqi Du (@yuanqid)

MolLEO has been accepted at ICLR 2026! We have made so much progress showing that LLMs really have tons of knowledge about science, and it's not just retrieval. LLMs easily beat SOTA molecule optimization methods with an evolutionary process!

OpenAI Developers (@openaidevs)

We're launching new tools to help developers build reliable and powerful AI agents. 🤖🔧

Timestamps:
01:54 Web search
02:41 File search
03:22 Computer use
04:07 Responses API
10:17 Agents SDK
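
For reference, a minimal sketch of calling the Responses API with the hosted web-search tool via OpenAI's Python SDK; the model name and the tool type string are assumptions that may differ by SDK version.

```python
# Minimal sketch: Responses API with the hosted web-search tool.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",                          # assumed model name
    tools=[{"type": "web_search_preview"}],  # hosted web-search tool
    input="Summarize this week's LLM-agent news in three bullet points.",
)

print(response.output_text)  # convenience accessor for the generated text
```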

Rob Tang (@xiangrutang)

🧠 Excited to share our latest work: "MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning"! We've curated a challenging hard subset from existing medical QA datasets. We select questions where fewer than 50% of the LLMs (incl. GPT-4o,
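
The selection rule stated above (keep questions that fewer than 50% of the evaluated LLMs answer correctly) can be sketched as follows; the data layout (question id mapped to per-model correctness flags) is assumed for illustration and is not the MedAgentsBench code.

```python
# Hedged sketch of the hard-subset rule: keep questions answered correctly by
# fewer than 50% of the evaluated LLMs.
from typing import Dict, List

def select_hard_subset(results: Dict[str, Dict[str, bool]],
                       threshold: float = 0.5) -> List[str]:
    hard = []
    for qid, per_model in results.items():
        accuracy = sum(per_model.values()) / len(per_model)
        if accuracy < threshold:   # fewer than half of the models got it right
            hard.append(qid)
    return hard

# Toy usage
toy = {
    "q1": {"gpt-4o": True, "model_b": False, "model_c": False},  # 33% -> hard
    "q2": {"gpt-4o": True, "model_b": True,  "model_c": False},  # 67% -> easy
}
print(select_hard_subset(toy))  # ['q1']
```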
All Hands AI (@allhands_ai)

Today, we're excited to make two big announcements!

- OpenHands LM: The strongest 32B coding agent model, resolving 37.4% of issues on SWE-bench Verified 📈
- OpenHands Cloud: SOTA open-source coding agents from your computer, phone, github, with $50 in free credits 🙌☁️
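
A hedged sketch of trying the model with Hugging Face transformers; the repo id below is an assumption based on the announcement, so verify the exact name on the All Hands AI Hugging Face page before running.

```python
# Hedged sketch: load the OpenHands LM with transformers and generate once.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "all-hands/openhands-lm-32b-v0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write a failing test that reproduces the bug described in the issue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```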
OpenAI (@openai)

We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part of our Preparedness Framework.

Agents must replicate top ICML 2024 papers, including understanding the paper, writing code, and executing experiments.
Aran Komatsuzaki (@arankomatsuzaki)

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

A Gym-style framework for systematically training, evaluating, and improving agents in iterative ML engineering workflows
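
As a rough illustration of the Gym-style reset/step loop such a framework exposes, here is a toy stand-in environment; MLE-Dojo's real observations and actions (code edits, execution feedback, leaderboard scores) will differ, so everything below is assumed for illustration.

```python
# Toy stand-in environment mimicking the Gym reset/step interface.
import random

class ToyMLEEnv:
    """Minimal environment with a single scalar 'validation score' state."""
    def reset(self):
        self.score = 0.5                           # starting validation score
        return {"val_score": self.score}

    def step(self, action: str):
        # Pretend actions are code edits; helpful edits nudge the score up.
        delta = 0.05 if "tune" in action else -0.01
        self.score = min(1.0, self.score + delta + random.uniform(-0.01, 0.01))
        obs = {"val_score": self.score}
        reward = self.score
        done = self.score >= 0.9
        return obs, reward, done, {}

env = ToyMLEEnv()
obs = env.reset()
for t in range(20):
    action = "tune the learning rate"              # an LLM agent would decide this
    obs, reward, done, info = env.step(action)
    if done:
        break
print(obs)
```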
Marktechpost AI Research News ⚡ (@marktechpost)

Georgia Tech and Stanford Researchers Introduce MLE-Dojo: A Gym-Style Framework Designed for Training, Evaluating, and Benchmarking Autonomous Machine Learning Engineering (MLE) Agents

Researchers from Georgia Institute of Technology and Stanford University have introduced
Google DeepMind (@googledeepmind)

We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
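
As a toy illustration of the idea (not Gemini Diffusion's actual algorithm), the sketch below starts from noise at every position and refines the whole sequence over a few denoising passes instead of decoding left to right.

```python
# Toy illustration of step-by-step refinement from noise over a full sequence.
import random

vocab = ["the", "model", "refines", "noise", "step", "cat", "blue", "run"]
target = ["the", "model", "refines", "noise", "step", "by", "step"]  # stand-in "data"

def denoise_step(seq, strength):
    # With probability `strength`, snap each position toward the target;
    # otherwise keep the current (still noisy) guess.
    return [t if random.random() < strength else s for s, t in zip(seq, target)]

seq = [random.choice(vocab) for _ in target]       # pure noise at step 0
for step in range(1, 6):
    seq = denoise_step(seq, strength=step / 5)     # refinement gets stronger
    print(f"step {step}: {' '.join(seq)}")
```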

Wenqi Shi (@wenqishi0106)

🤔 How can we systematically enhance LLMs for complex medical coding tasks?

🚀 Introducing MedAgentGym, an interactive gym-style platform designed specifically for training LLM agents in coding-based medical reasoning! 🧬💻

🎯 Comprehensive Code-based Medical Reasoning
MedAI Group (@medaistanford)

This Monday, Wenqi Shi from UT Southwestern Medical Center will be joining us to talk about their work on training LLM agents for code-based medical reasoning at scale. Catch it at 1-2pm PT this Monday on Zoom! Subscribe to mailman.stanford.edu/mailman/listin… #ML #AI #medicine #healthcare
Rob Tang (@xiangrutang)

MedAgentGym is the first publicly available training environment designed to improve the ability of LLMs to use code for medical reasoning tasks. It includes 72,413 instances from 12 benchmarks. Tasks are run in isolated, executable environments that provide interactive feedback.
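
A minimal sketch of that execute-and-feed-back pattern, assuming a plain subprocess sandbox; this is illustrative only and is not the actual MedAgentGym implementation.

```python
# Run a candidate script in an isolated scratch directory and return its
# stdout/stderr as feedback the agent can use on the next turn.
import subprocess, sys, tempfile
from pathlib import Path

def execute_with_feedback(candidate_code: str, timeout_s: int = 60) -> dict:
    workdir = Path(tempfile.mkdtemp(prefix="medagent_"))
    script = workdir / "solution.py"
    script.write_text(candidate_code)
    try:
        proc = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir, capture_output=True, text=True, timeout=timeout_s,
        )
        return {"stdout": proc.stdout, "stderr": proc.stderr,
                "returncode": proc.returncode}
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "timed out", "returncode": -1}

print(execute_with_feedback("print(2 + 2)"))
```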

Deqing Fu (@deqingfu)

Presenting Zebra-CoT: A large-scale dataset to teach models intrinsic multimodal reasoning: interleaving text and natively-generated images like a zebra's stripes. It moves beyond the limitations of external tool-based visual CoT.

🔗arxiv.org/abs/2507.16746
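
For intuition, here is a hedged sketch of what an interleaved text/image reasoning record could look like; the field names are assumptions for illustration, not the actual Zebra-CoT schema (see the arXiv link for the real format).

```python
# Assumed record structure for an interleaved text/image reasoning trace.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class TextStep:
    text: str                # a textual reasoning step

@dataclass
class ImageStep:
    image_path: str          # an intermediate sketch/diagram generated natively

@dataclass
class ZebraTrace:
    question: str
    steps: List[Union[TextStep, ImageStep]]   # interleaved, like zebra stripes
    answer: str

example = ZebraTrace(
    question="Which gear turns clockwise?",
    steps=[
        TextStep("Label the gears left to right as A, B, C."),
        ImageStep("gears_labeled.png"),
        TextStep("A drives B counter-clockwise, so B drives C clockwise."),
    ],
    answer="C",
)
```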
Yangsibo Huang (@yangsibohuang)

Gemini 2.5 Deep Think is available to Ultra users! It achieves SOTA on HLE (no tools), LiveCodeBench, and math/proofs. Time to give it a try and let us know your feedback! We’ve also made the IMO gold model available to mathematicians and other domain experts :)👩‍🍳