Yuchen Zhuang (@yuchen_zhuang)'s Twitter Profile
Yuchen Zhuang

@yuchen_zhuang

Ph.D. Candidate @MLatGT | Ex-Intern @AdobeResearch @Amazon | LLMs | LLMs for Reasoning and Planning | LLM Agent | Data-Centric AI

ID: 973203991231434752

Link: http://night-chen.github.io | Joined: 12-03-2018 14:28:07

94 Tweets

305 Followers

265 Following

Andrej Karpathy (@karpathy)

# RLHF is just barely RL

Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely
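
To make the pipeline concrete, here is a deliberately tiny sketch of the RLHF step: a policy samples a completion, a reward model scores it, and a REINFORCE-style update raises the log-probability of rewarded samples. `ToyPolicy` and `toy_reward` are hypothetical stand-ins, not anything from the original post.

```python
# Illustrative toy sketch of the RLHF stage: sample, score with a reward model,
# then do a REINFORCE-style policy-gradient update. Everything here is a toy.
import torch
import torch.nn as nn

class ToyPolicy(nn.Module):
    """Tiny stand-in for an SFT-initialized LLM policy over a small vocabulary."""
    def __init__(self, vocab_size=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        return self.head(self.embed(tokens))  # per-position logits

def toy_reward(completion):
    """Stand-in for a learned reward model: simply prefers token id 3."""
    return (completion == 3).float().mean()

policy = ToyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
prompt = torch.tensor([1, 2])                      # fixed toy "prompt"

for step in range(100):
    logits = policy(prompt)                        # (2, vocab_size)
    dist = torch.distributions.Categorical(logits=logits)
    completion = dist.sample()                     # sampled "response" tokens
    reward = toy_reward(completion)                # scalar human-feedback proxy
    loss = -(dist.log_prob(completion).sum() * reward)
    opt.zero_grad(); loss.backward(); opt.step()
```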
Jim Fan (@drjimfan)

This is the most gut-wrenching blog I've read, because it's so real and so close to heart. The author is no longer with us. I'm in tears. AI is not supposed to be 200B weights of stress and pain. It used to be a place of coffee-infused eureka moments, of exciting late-night arxiv
Yuanqi Du (@yuanqid)

MolLEO has been accepted at ICLR 2026! We have made so much progress showing that LLMs really have tons of knowledge about science, and it's not just retrieval. LLMs easily beat SOTA molecule optimization methods with an evolutionary process!

OpenAI Developers (@openaidevs)

We're launching new tools to help developers build reliable and powerful AI agents. 🤖🔧

Timestamps:
01:54 Web search
02:41 File search
03:22 Computer use
04:07 Responses API
10:17 Agents SDK
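
For reference, a minimal sketch of calling the Responses API with the hosted web-search tool via OpenAI's Python SDK; the model name and the tool type string are assumptions that may differ by SDK version.

```python
# Minimal sketch: Responses API with the hosted web-search tool.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",                          # assumed model name
    tools=[{"type": "web_search_preview"}],  # hosted web-search tool
    input="Summarize this week's LLM-agent news in three bullet points.",
)

print(response.output_text)  # convenience accessor for the generated text
```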

Rob Tang (@xiangrutang)

🧠 Excited to share our latest work: "MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning"! We've curated a challenging hard subset from existing medical QA datasets. We select questions where fewer than 50% of the LLMs (incl. GPT-4o,
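
The selection rule stated above (keep questions that fewer than 50% of the evaluated LLMs answer correctly) can be sketched as follows; the data layout (question id mapped to per-model correctness flags) is assumed for illustration and is not the MedAgentsBench code.

```python
# Hedged sketch of the hard-subset rule: keep questions answered correctly by
# fewer than 50% of the evaluated LLMs.
from typing import Dict, List

def select_hard_subset(results: Dict[str, Dict[str, bool]],
                       threshold: float = 0.5) -> List[str]:
    hard = []
    for qid, per_model in results.items():
        accuracy = sum(per_model.values()) / len(per_model)
        if accuracy < threshold:   # fewer than half of the models got it right
            hard.append(qid)
    return hard

# Toy usage
toy = {
    "q1": {"gpt-4o": True, "model_b": False, "model_c": False},  # 33% -> hard
    "q2": {"gpt-4o": True, "model_b": True,  "model_c": False},  # 67% -> easy
}
print(select_hard_subset(toy))  # ['q1']
```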
All Hands AI (@allhands_ai)

Today, we're excited to make two big announcements!

- OpenHands LM: The strongest 32B coding agent model, resolving 37.4% of issues on SWE-bench Verified 📈
- OpenHands Cloud: SOTA open-source coding agents from your computer, phone, github, with $50 in free credits 🙌☁️
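
A hedged sketch of trying the model with Hugging Face transformers; the repo id below is an assumption based on the announcement, so verify the exact name on the All Hands AI Hugging Face page before running.

```python
# Hedged sketch: load the OpenHands LM with transformers and generate once.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "all-hands/openhands-lm-32b-v0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write a failing test that reproduces the bug described in the issue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```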
OpenAI (@openai)

We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part of our Preparedness Framework.

Agents must replicate top ICML 2024 papers, including understanding the paper, writing code, and executing experiments.
Aran Komatsuzaki (@arankomatsuzaki)

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

A Gym-style framework for systematically training, evaluating, and improving agents in iterative ML engineering workflows
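
As a rough illustration of the Gym-style reset/step loop such a framework exposes, here is a toy stand-in environment; MLE-Dojo's real observations and actions (code edits, execution feedback, leaderboard scores) will differ, so everything below is assumed for illustration.

```python
# Toy stand-in environment mimicking the Gym reset/step interface.
import random

class ToyMLEEnv:
    """Minimal environment with a single scalar 'validation score' state."""
    def reset(self):
        self.score = 0.5                           # starting validation score
        return {"val_score": self.score}

    def step(self, action: str):
        # Pretend actions are code edits; helpful edits nudge the score up.
        delta = 0.05 if "tune" in action else -0.01
        self.score = min(1.0, self.score + delta + random.uniform(-0.01, 0.01))
        obs = {"val_score": self.score}
        reward = self.score
        done = self.score >= 0.9
        return obs, reward, done, {}

env = ToyMLEEnv()
obs = env.reset()
for t in range(20):
    action = "tune the learning rate"              # an LLM agent would decide this
    obs, reward, done, info = env.step(action)
    if done:
        break
print(obs)
```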
Marktechpost AI Research News ⚡ (@marktechpost)

Georgia Tech and Stanford Researchers Introduce MLE-Dojo: A Gym-Style Framework Designed for Training, Evaluating, and Benchmarking Autonomous Machine Learning Engineering (MLE) Agents

Researchers from Georgia Institute of Technology and Stanford University have introduced
Google DeepMind (@googledeepmind)

We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
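
As a toy illustration of the idea (not Gemini Diffusion's actual algorithm), the sketch below starts from noise at every position and refines the whole sequence over a few denoising passes instead of decoding left to right.

```python
# Toy illustration of step-by-step refinement from noise over a full sequence.
import random

vocab = ["the", "model", "refines", "noise", "step", "cat", "blue", "run"]
target = ["the", "model", "refines", "noise", "step", "by", "step"]  # stand-in "data"

def denoise_step(seq, strength):
    # With probability `strength`, snap each position toward the target;
    # otherwise keep the current (still noisy) guess.
    return [t if random.random() < strength else s for s, t in zip(seq, target)]

seq = [random.choice(vocab) for _ in target]       # pure noise at step 0
for step in range(1, 6):
    seq = denoise_step(seq, strength=step / 5)     # refinement gets stronger
    print(f"step {step}: {' '.join(seq)}")
```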

Wenqi Shi (@wenqishi0106)

🤔 How can we systematically enhance LLMs for complex medical coding tasks?

🚀 Introducing MedAgentGym, an interactive gym-style platform designed specifically for training LLM agents in coding-based medical reasoning! 🧬💻

🎯 Comprehensive Code-based Medical Reasoning
MedAI Group (@medaistanford)

This Monday, Wenqi Shi from UT Southwestern Medical Center will be joining us to talk about their work on training LLM agents for code-based medical reasoning at scale. Catch it at 1-2pm PT this Monday on Zoom! Subscribe to mailman.stanford.edu/mailman/listin… #ML #AI #medicine #healthcare
Rob Tang (@xiangrutang)

MedAgentGym is the first publicly available training environment designed to improve the ability of LLMs to use code for medical reasoning tasks. It includes 72,413 instances from 12 benchmarks. Tasks are run in isolated, executable environments that provide interactive feedback.
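
A minimal sketch of that execute-and-feed-back pattern, assuming a plain subprocess sandbox; this is illustrative only and is not the actual MedAgentGym implementation.

```python
# Run a candidate script in an isolated scratch directory and return its
# stdout/stderr as feedback the agent can use on the next turn.
import subprocess, sys, tempfile
from pathlib import Path

def execute_with_feedback(candidate_code: str, timeout_s: int = 60) -> dict:
    workdir = Path(tempfile.mkdtemp(prefix="medagent_"))
    script = workdir / "solution.py"
    script.write_text(candidate_code)
    try:
        proc = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir, capture_output=True, text=True, timeout=timeout_s,
        )
        return {"stdout": proc.stdout, "stderr": proc.stderr,
                "returncode": proc.returncode}
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "timed out", "returncode": -1}

print(execute_with_feedback("print(2 + 2)"))
```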

Deqing Fu (@deqingfu)

Presenting Zebra-CoT: A large-scale dataset to teach models intrinsic multimodal reasoning: interleaving text and natively-generated images like a zebra's stripes. It moves beyond the limitations of external tool-based visual CoT.

🔗arxiv.org/abs/2507.16746
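
For intuition, here is a hedged sketch of what an interleaved text/image reasoning record could look like; the field names are assumptions for illustration, not the actual Zebra-CoT schema (see the arXiv link for the real format).

```python
# Assumed record structure for an interleaved text/image reasoning trace.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class TextStep:
    text: str                # a textual reasoning step

@dataclass
class ImageStep:
    image_path: str          # an intermediate sketch/diagram generated natively

@dataclass
class ZebraTrace:
    question: str
    steps: List[Union[TextStep, ImageStep]]   # interleaved, like zebra stripes
    answer: str

example = ZebraTrace(
    question="Which gear turns clockwise?",
    steps=[
        TextStep("Label the gears left to right as A, B, C."),
        ImageStep("gears_labeled.png"),
        TextStep("A drives B counter-clockwise, so B drives C clockwise."),
    ],
    answer="C",
)
```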
Yangsibo Huang (@yangsibohuang)

Gemini 2.5 Deep Think is available to Ultra users! It achieves SOTA on HLE (no tools), LiveCodeBench, and math/proofs. Time to give it a try and let us know your feedback! We’ve also made the IMO gold model available to mathematicians and other domain experts :)👩‍🍳