Banghua Zhu (@banghuaz) 's Twitter Profile
Banghua Zhu

@banghuaz

Incoming Assistant Professor @UW; Cofounder @NexusflowX. Post-training, evaluation, and agentic application of LLMs. Prior @Berkeley_EECS @Google @Microsoft

ID: 1028112376162205696

Website: https://people.eecs.berkeley.edu/~banghua/ | Joined: 11-08-2018 02:54:26

357 Tweets

2.2K Followers

906 Following

will brown (@willccbb) 's Twitter Profile Photo

i’m much more inclined to say that the RL *system* inside OpenAI is AGI rather than any fixed model checkpoint which comes out of it

Igor Gitman (@igtmn) 's Twitter Profile Photo

Here is an interesting antipattern in the code generated by any of the frontier LLMs. Unless instructed otherwise, they try to avoid failing with an error even when failing is clearly the right behavior. Here is one example.

Prompt: Write a script to concatenate "input" and "output"
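The generated code from the thread isn't reproduced here, but the pattern being described looks roughly like the sketch below (assuming "input" and "output" are file names, which is one reading of the prompt; all function names are hypothetical): the model wraps everything in error handling so the script never fails, even when a missing input file should clearly be a hard error.

from pathlib import Path

def concat_silently(paths=("input", "output"), dest="combined"):
    # Antipattern: never fails. Missing files are silently skipped, so a typo or a
    # wrong working directory yields an empty or partial result with exit code 0.
    parts = []
    for p in paths:
        try:
            parts.append(Path(p).read_text())
        except OSError:
            parts.append("")  # swallow the error and keep going
    Path(dest).write_text("".join(parts))

def concat_fail_fast(paths=("input", "output"), dest="combined"):
    # Arguably the right behavior: let a missing file raise FileNotFoundError
    # immediately instead of producing a silent no-op.
    Path(dest).write_text("".join(Path(p).read_text() for p in paths))

if __name__ == "__main__":
    concat_fail_fast()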
Banghua Zhu (@banghuaz) 's Twitter Profile Photo

This is a very vivid and interesting example of reward hacking in RL, even with verifiable rewards (RLVR). In RLVR for coding, you always get reward 0 if the code execution gives an error, and reward 1 if the code passes all test cases. Frontier models which have gone through heavy
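For concreteness, here is a minimal sketch of the binary verifiable reward described above (the function and harness are hypothetical, not taken from any particular training stack): any execution error scores 0, so a policy that swallows errors and returns plausible-looking output is never penalized more than one that crashes loudly.

import subprocess, tempfile, textwrap

def rlvr_coding_reward(candidate_code: str, tests: str, timeout: float = 5.0) -> float:
    # Reward 1.0 only if the candidate plus its test cases run to completion without
    # error; 0.0 otherwise. Crashing loudly and being silently wrong score the same.
    program = textwrap.dedent(candidate_code) + "\n" + textwrap.dedent(tests)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0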

Dmitry Rybin (@dmitryrybin1) 's Twitter Profile Photo

RL+LLM researchers actively use LLM distribution Entropy to measure training dynamics. This number is misleading.

John von Neumann and Lev Landau gave us the correct answer 100 years ago while studying mixed quantum states in Hilbert spaces.

Usual Entropy treats all tokens as
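The post is truncated above, but the contrast it sets up can be sketched as follows. This is one reading, with a common construction assumed: build a density matrix rho = sum_i p_i e_i e_i^T from unit-norm token embeddings e_i and next-token probabilities p_i, then compare the usual Shannon entropy (which treats every token as orthogonal) with the von Neumann entropy S = -Tr(rho log rho), which collapses near-duplicate tokens.

import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def von_neumann_entropy(p, embeddings):
    # p: (V,) next-token probabilities; embeddings: (V, d) token embeddings.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    rho = (np.asarray(p, dtype=float)[:, None, None] * np.einsum("id,ie->ide", e, e)).sum(axis=0)
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log(evals)).sum())

# Two tokens with identical embeddings: Shannon entropy reports log(2) ~ 0.69,
# while the von Neumann entropy is ~0, since the two "states" are indistinguishable.
p = [0.5, 0.5]
emb = np.array([[1.0, 0.0], [1.0, 0.0]])
print(shannon_entropy(p), von_neumann_entropy(p, emb))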
Oleksii Kuchaiev (@kuchaev) 's Twitter Profile Photo

Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and a drop-in replacement for V1. Open-weight, code and data on HF huggingface.co/nvidia/Llama-3…

Banghua Zhu (@banghuaz) 's Twitter Profile Photo

With just post-training, we’ve boosted the pruned Llama 3.3 model to outperform Qwen 3. Excited to see how much further large-scale post-training can take us!

Artificial Analysis (@artificialanlys) 's Twitter Profile Photo

NVIDIA has released the latest member of its Nemotron language model family, Llama Nemotron Super (49B) v1.5, reaching a score of 64 on the Artificial Analysis Intelligence Index.

The model is an evolution of Super 49B v1 from earlier this year, with advances from post-training
LMSYS Org (@lmsysorg) 's Twitter Profile Photo

🚨Big News! We collaborated with NVIDIA to release a DeepSeek R1 inference container optimized for large scale deployment on GB200 NVL72, the world’s most advanced data center–scale accelerated computing platform. This docker container runs a single copy of the model across 56

Oleksii Kuchaiev (@kuchaev) 's Twitter Profile Photo

Everything about Llama-Nemotron-Super-V1.5 post-training is now open:
Synthetic data: huggingface.co/datasets/nvidi…
Human data: huggingface.co/datasets/nvidi…
Reward models (trained on HS3 data): huggingface.co/collections/nv…
RL toolkit: github.com/NVIDIA-NeMo/RL

Banghua Zhu (@banghuaz) 's Twitter Profile Photo

That's exactly why I'm excited about the unique position of the post-training team at NVIDIA. We’re not just releasing open-weight models — we fully open source the data, code, and technical details. Small team, moving fast. The competition is fierce, and Chinese open model

martin_casado (@martin_casado) 's Twitter Profile Photo

It's just remarkable how many US startups are being built on Chinese OSS AI models. I'd say the majority of those building custom models via post-training. The US needs to step up, make it a national priority, and back it with a huge investment.

Yi Wu (@jxwuyi) 's Twitter Profile Photo

🔍We introduce ASearcher, a search agent trained by end2end RL
Large-scale (up to 128 turns) RL with AReaL unlocks Long-Horizon Agentic Search (+20.8/+46.7% on GAIA/xBench)
💻Data, Code & Model: github.com/inclusionAI/AS…
📄Paper: arxiv.org/abs/2508.07976v
#Agent #OpenSource #LLM #AGI
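As a rough illustration of what a long-horizon search-agent rollout of this kind looks like (all names below are hypothetical placeholders, not taken from the ASearcher or AReaL codebases): the agent alternates model turns and search-tool calls until it answers or exhausts the turn budget, and the whole trajectory is what the end-to-end RL trainer scores.

MAX_TURNS = 128  # the post mentions large-scale RL with up to 128 turns

def rollout(question, policy, search_tool, parse_action):
    # policy: maps the conversation so far to the next assistant message.
    # parse_action: extracts either a search query or a final answer from that message.
    trajectory = [{"role": "user", "content": question}]
    for _ in range(MAX_TURNS):
        reply = policy(trajectory)
        trajectory.append({"role": "assistant", "content": reply})
        action = parse_action(reply)  # e.g. {"type": "search", "query": ...} or {"type": "answer", "content": ...}
        if action["type"] == "answer":
            return trajectory, action["content"]
        results = search_tool(action["query"])  # external retrieval / web search
        trajectory.append({"role": "tool", "content": results})
    return trajectory, None  # turn budget exhausted without a final answer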
Shuchao Bi (@shuchaobi) 's Twitter Profile Photo

I gave this talk at Harvard in June, similar to the talk at Columbia during our east coast trip. I had a lot of FUD about deep learning, and I shared my personal journey of resolving those doubts and slowly getting AGI-pilled over the last decade. Advancing the Frontier of Silicon

Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

Introducing Gemma 3 270M 🔥

🤏A tiny model! Just 270 million parameters
🧠 Very strong instruction following
🤖 Fine-tune in just a few minutes, with a large vocabulary to serve as a high-quality foundation

developers.googleblog.com/en/introducing…
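For anyone who wants to try it, a minimal sketch using Hugging Face transformers; the repo id "google/gemma-3-270m" is an assumption based on the announcement (the blog link above is truncated), and a recent transformers release plus an accepted model license are assumed.

from transformers import pipeline

# Assumed repo id; swap in the official one from the announcement if it differs.
generator = pipeline("text-generation", model="google/gemma-3-270m")
out = generator("In one sentence, what is an instruction-tuned language model?", max_new_tokens=64)
print(out[0]["generated_text"])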