Banghua Zhu (@banghuaz) 's Twitter Profile
Banghua Zhu

@banghuaz

Incoming Assistant Professor @UW; Cofounder @NexusflowX. Post-training, evaluation, and agentic application of LLMs. Prior @Berkeley_EECS @Google @Microsoft

ID: 1028112376162205696

Website: https://people.eecs.berkeley.edu/~banghua/ | Joined: 11-08-2018 02:54:26

357 Tweets

2.2K Followers

906 Following

will brown (@willccbb) 's Twitter Profile Photo

i’m much more inclined to say that the RL *system* inside OpenAI is AGI rather than any fixed model checkpoint which comes out of it

Igor Gitman (@igtmn) 's Twitter Profile Photo

Here is an interesting antipattern in the code generated by any of the frontier LLMs. Unless instructed otherwise, they try to avoid failing with an error even when failing is clearly the right behavior. Here is one example.

Prompt: Write a script to concatenate "input" and "output"
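The generated code from the thread isn't reproduced here, but the pattern being described looks roughly like the sketch below (assuming "input" and "output" are file names, which is one reading of the prompt; all function names are hypothetical): the model wraps everything in error handling so the script never fails, even when a missing input file should clearly be a hard error.

from pathlib import Path

def concat_silently(paths=("input", "output"), dest="combined"):
    # Antipattern: never fails. Missing files are silently skipped, so a typo or a
    # wrong working directory yields an empty or partial result with exit code 0.
    parts = []
    for p in paths:
        try:
            parts.append(Path(p).read_text())
        except OSError:
            parts.append("")  # swallow the error and keep going
    Path(dest).write_text("".join(parts))

def concat_fail_fast(paths=("input", "output"), dest="combined"):
    # Arguably the right behavior: let a missing file raise FileNotFoundError
    # immediately instead of producing a silent no-op.
    Path(dest).write_text("".join(Path(p).read_text() for p in paths))

if __name__ == "__main__":
    concat_fail_fast()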
Banghua Zhu (@banghuaz) 's Twitter Profile Photo

This is a very vivid and interesting example of reward hacking in RL, even with verifiable rewards (RLVR). In RLVR for coding, you always get reward 0 if the code execution gives an error, and reward 1 if the code passes all test cases. Frontier models which have gone through heavy
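For concreteness, here is a minimal sketch of the binary verifiable reward described above (the function and harness are hypothetical, not taken from any particular training stack): any execution error scores 0, so a policy that swallows errors and returns plausible-looking output is never penalized more than one that crashes loudly.

import subprocess, tempfile, textwrap

def rlvr_coding_reward(candidate_code: str, tests: str, timeout: float = 5.0) -> float:
    # Reward 1.0 only if the candidate plus its test cases run to completion without
    # error; 0.0 otherwise. Crashing loudly and being silently wrong score the same.
    program = textwrap.dedent(candidate_code) + "\n" + textwrap.dedent(tests)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0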

Dmitry Rybin (@dmitryrybin1) 's Twitter Profile Photo

RL+LLM researchers actively use LLM distribution Entropy to measure training dynamics. This number is misleading.

John von Neumann and Lev Landau gave us the correct answer 100 years ago while studying mixed quantum states in Hilbert spaces.

Usual Entropy treats all tokens as
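The post is truncated above, but the contrast it sets up can be sketched as follows. This is one reading, with a common construction assumed: build a density matrix rho = sum_i p_i e_i e_i^T from unit-norm token embeddings e_i and next-token probabilities p_i, then compare the usual Shannon entropy (which treats every token as orthogonal) with the von Neumann entropy S = -Tr(rho log rho), which collapses near-duplicate tokens.

import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def von_neumann_entropy(p, embeddings):
    # p: (V,) next-token probabilities; embeddings: (V, d) token embeddings.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    rho = (np.asarray(p, dtype=float)[:, None, None] * np.einsum("id,ie->ide", e, e)).sum(axis=0)
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log(evals)).sum())

# Two tokens with identical embeddings: Shannon entropy reports log(2) ~ 0.69,
# while the von Neumann entropy is ~0, since the two "states" are indistinguishable.
p = [0.5, 0.5]
emb = np.array([[1.0, 0.0], [1.0, 0.0]])
print(shannon_entropy(p), von_neumann_entropy(p, emb))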
Oleksii Kuchaiev (@kuchaev) 's Twitter Profile Photo

Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and a drop-in replacement for V1. Open-weight, code and data on HF huggingface.co/nvidia/Llama-3…

Banghua Zhu (@banghuaz) 's Twitter Profile Photo

With just post-training, we’ve boosted the pruned Llama 3.3 model to outperform Qwen 3. Excited to see how much further large-scale post-training can take us!

Artificial Analysis (@artificialanlys) 's Twitter Profile Photo

NVIDIA has released the latest member of its Nemotron language model family, Llama Nemotron Super (49B) v1.5, reaching a score of 64 on the Artificial Analysis Intelligence Index.

The model is an evolution of Super 49B v1 from earlier this year, with advances from post-training
LMSYS Org (@lmsysorg) 's Twitter Profile Photo

🚨Big News! We collaborated with NVIDIA to release a DeepSeek R1 inference container optimized for large scale deployment on GB200 NVL72, the world’s most advanced data center–scale accelerated computing platform. This docker container runs a single copy of the model across 56

Oleksii Kuchaiev (@kuchaev) 's Twitter Profile Photo

Everything about Llama-Nemotron-Super-V1.5 post-training is now open:
Synthetic data: huggingface.co/datasets/nvidi…
Human data: huggingface.co/datasets/nvidi…
Reward models (trained on HS3 data): huggingface.co/collections/nv…
RL toolkit: github.com/NVIDIA-NeMo/RL

Banghua Zhu (@banghuaz) 's Twitter Profile Photo

That's exactly why I'm excited about the unique position of the post-training team at NVIDIA. We’re not just releasing open-weight models — we fully open source the data, code, and technical details. Small team, moving fast. The competition is fierce, and Chinese open model

martin_casado (@martin_casado) 's Twitter Profile Photo

It's just remarkable how many US startups are being built on Chinese OSS AI models. I'd say the majority of those building custom models via post-training. The US needs to step up, make it a national priority, and back it with a huge investment.

Yi Wu (@jxwuyi) 's Twitter Profile Photo

🔍We introduce ASearcher, a search agent trained by end2end RL
Large-scale (up to 128 turns) RL with AReaL unlocks Long-Horizon Agentic Search (+20.8/+46.7% on GAIA/xBench)
💻Data, Code & Model: github.com/inclusionAI/AS…
📄Paper: arxiv.org/abs/2508.07976v
#Agent #OpenSource #LLM #AGI
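As a rough illustration of what a long-horizon search-agent rollout of this kind looks like (all names below are hypothetical placeholders, not taken from the ASearcher or AReaL codebases): the agent alternates model turns and search-tool calls until it answers or exhausts the turn budget, and the whole trajectory is what the end-to-end RL trainer scores.

MAX_TURNS = 128  # the post mentions large-scale RL with up to 128 turns

def rollout(question, policy, search_tool, parse_action):
    # policy: maps the conversation so far to the next assistant message.
    # parse_action: extracts either a search query or a final answer from that message.
    trajectory = [{"role": "user", "content": question}]
    for _ in range(MAX_TURNS):
        reply = policy(trajectory)
        trajectory.append({"role": "assistant", "content": reply})
        action = parse_action(reply)  # e.g. {"type": "search", "query": ...} or {"type": "answer", "content": ...}
        if action["type"] == "answer":
            return trajectory, action["content"]
        results = search_tool(action["query"])  # external retrieval / web search
        trajectory.append({"role": "tool", "content": results})
    return trajectory, None  # turn budget exhausted without a final answer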
Shuchao Bi (@shuchaobi) 's Twitter Profile Photo

I gave this talk at Harvard in June, similar to the talk at Columbia during our east coast trip. I had a lot of FUD about deep learning, and I shared my personal journey of resolving those doubts and slowly getting AGI-pilled over the last decade. Advancing the Frontier of Silicon

Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

Introducing Gemma 3 270M 🔥

🤏A tiny model! Just 270 million parameters
🧠 Very strong instruction following
🤖 Fine-tune in just a few minutes, with a large vocabulary to serve as a high-quality foundation

developers.googleblog.com/en/introducing…
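For anyone who wants to try it, a minimal sketch using Hugging Face transformers; the repo id "google/gemma-3-270m" is an assumption based on the announcement (the blog link above is truncated), and a recent transformers release plus an accepted model license are assumed.

from transformers import pipeline

# Assumed repo id; swap in the official one from the announcement if it differs.
generator = pipeline("text-generation", model="google/gemma-3-270m")
out = generator("In one sentence, what is an instruction-tuned language model?", max_new_tokens=64)
print(out[0]["generated_text"])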