Zhiding Yu (@zhidingyu)'s Twitter Profile
Zhiding Yu

@zhidingyu

Working to make machines understand the world like human beings.
Words are my own.

ID: 1283364577581756418

Link: https://chrisding.github.io/ · Joined: 15-07-2020 11:35:44

146 Tweets

7.7K Followers

494 Following

Min-Hung (Steve) Chen (@cmhungsteven)'s Twitter Profile Photo

The 4th Workshop on Transformers for Vision (T4V) at CVPR 2025 is soliciting self-nominations for reviewers.
If you're interested, please fill out this form:
forms.gle/cJKkywCyFAboct…

More information can be found on our website: sites.google.com/view/t4v-cvpr2…
Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

Cool paper from NVIDIA

Prior methods for training LLMs for tool use rely on imitation or distilled reasoning, limiting generalization.

Nemotron-Research-Tool-N1 uses rule-based reinforcement learning.

It trains models with binary rewards evaluating only tool call structure
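As a rough illustration of that structure-only reward idea, here is a minimal sketch in Python. The <tool_call> tag format, the required fields, and the function name are assumptions for illustration, not the paper's actual schema; the point is that the reward is binary and checks only whether the tool call is well-formed, not whether it is semantically correct.

```python
import json
import re

# Minimal sketch of a rule-based binary reward in the spirit described
# above: score 1.0 only when the output contains a parseable tool call,
# ignoring whether the call's *content* is actually right.
# The <tool_call> wrapper and required keys are hypothetical.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def structural_reward(model_output: str) -> float:
    """Return 1.0 if the output contains a well-formed tool call, else 0.0."""
    match = TOOL_CALL_RE.search(model_output)
    if match is None:
        return 0.0
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return 0.0
    # Structure-only checks: required keys present with the right types.
    if not isinstance(call.get("name"), str):
        return 0.0
    if not isinstance(call.get("arguments"), dict):
        return 0.0
    return 1.0

print(structural_reward('<tool_call>{"name": "search", "arguments": {"q": "weather"}}</tool_call>'))  # 1.0
print(structural_reward("I think the answer is 42."))  # 0.0
```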
Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

Check out this super cool work done by our intern Shaokun Zhang - RL + tool use is the future of LLM agents! Before joining NVIDIA, Shaokun was a contributor to the famous multi-agent workflow framework #AutoGen. Now, the age of agent learning is moving beyond workflow control!

Shizhe Diao (@shizhediao)'s Twitter Profile Photo

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough!

Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering
Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

Document and Enterprise Intelligence is arguably one of the most important applications of VLMs and cloud services. NVIDIA VLM technologies help build commercial-grade models that excel in this area. The Eagle VLM Team, together with other colleagues at NVIDIA, is proud to be

Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

I did not notice this until just now. Thank you Andi Marafioti for the recommendation! Very glad that even though Eagle 2 is not our latest work, people still find it very useful.

Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

And today we have just open-sourced the Eagle 2.5 model
huggingface.co/nvidia/Eagle2.…
You are welcome to download it and give it a try!
We will also open-source the fine-tuning code for Eagle 2/2.5 soon at github.com/NVlabs/Eagle. Stay tuned.
Fu-En (Fred) Yang (@fuenyang1)'s Twitter Profile Photo

🤖 How can we teach embodied agents to think before they act?

🚀 Introducing ThinkAct — a hierarchical Reasoning VLA framework with an MLLM for complex, slow reasoning and an action expert for fast, grounded execution.
Slow think, fast act. 🧠⚡🤲
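As a toy sketch of that slow-think / fast-act split: a slow reasoner is queried only every few steps to produce a plan, while a fast action expert conditions on the latest plan at every step. Reasoner, ActionExpert, and the replan interval below are hypothetical stand-ins, not ThinkAct's actual interfaces.

```python
# Toy sketch of a hierarchical slow/fast control loop (all names hypothetical).

class Reasoner:
    def plan(self, observation):
        # Stand-in for an MLLM's slow, deliberate reasoning pass.
        return {"subgoal": "reach the red block", "obs": observation}

class ActionExpert:
    def act(self, observation, plan):
        # Stand-in for a fast, grounded low-level policy.
        return f"move_toward({plan['subgoal']!r})"

def run_episode(env_observations, replan_every=10):
    reasoner, expert = Reasoner(), ActionExpert()
    plan, actions = None, []
    for step, obs in enumerate(env_observations):
        if step % replan_every == 0:           # slow loop: think occasionally
            plan = reasoner.plan(obs)
        actions.append(expert.act(obs, plan))  # fast loop: act every step
    return actions

print(run_episode([f"frame_{i}" for i in range(3)], replan_every=2))
```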
Shizhe Diao (@shizhediao)'s Twitter Profile Photo

New tech report out! 🚀

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training

An expanded version of our ProRL paper — now with more training insights and experimental details.

Read it here 👉 arxiv.org/abs/2507.12507

Cihang Xie (@cihangxie)'s Twitter Profile Photo

🚀 Excited to share GPT-Image-Edit-1.5M — our new large-scale, high-quality, fully open image editing dataset for the research community! (1/n)
Wonmin Byeon (@wonmin_byeon)'s Twitter Profile Photo

🚀 New paper: STORM — Efficient VLM for Long Video Understanding

STORM cuts compute costs by up to 8× and reduces decoding latency by 2.4–2.9×, while achieving state-of-the-art performance.

Details + paper link in the thread ↓
JingyuanLiu (@jingyuanliu123)'s Twitter Profile Photo

I was lucky to work in both Chinese and US LLM labs, and I've been thinking about this for a while. The current values of pretraining are indeed different. US labs be like:
- lots of GPUs and much larger-FLOP runs
- treating stability more seriously, and not tolerating spikes

Robert Youssef (@rryssf_)'s Twitter Profile Photo

Holy shit... Tencent researchers just killed fine-tuning AND reinforcement learning in one shot 😳

They call it Training-Free GRPO (Group Relative Policy Optimization).

Instead of updating weights, the model literally learns from 'its own experiences' like an evolving memory
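A heavily simplified sketch of that "learn without weight updates" idea: sample a group of rollouts per task, distill a lesson from comparing the best and worst, and keep the lessons in a growing text memory that is prepended to future prompts. Every function name below is a hypothetical stand-in; this illustrates the concept, not the actual Training-Free GRPO implementation.

```python
# Toy sketch: an evolving text memory replaces gradient updates.

def sample_rollouts(prompt, n=4):
    # Stand-in for n stochastic generations from a frozen LLM.
    return [f"rollout {i} for: {prompt}" for i in range(n)]

def score(rollout):
    # Stand-in for a task-level reward (e.g., answer correctness).
    return len(rollout) % 2  # dummy score for the sketch

def extract_lesson(best, worst):
    # Stand-in for asking the LLM to summarize why `best` beat `worst`.
    return f"prefer approaches like: {best[:30]}..."

memory = []  # the "evolving memory" that stands in for weight updates

for task in ["task A", "task B"]:
    prompt = "\n".join(memory) + "\n" + task  # condition on past lessons
    group = sorted(sample_rollouts(prompt), key=score, reverse=True)
    memory.append(extract_lesson(group[0], group[-1]))

print(memory)
```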
Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

Here's a cool piece of work from the Hugging Face team. The research echoes, once again, that "Data is the Key" to producing frontier-level VLMs. In particular, the diversity of the data matters. Happy to see more community efforts in open-sourcing VLM data. Check out more details👇

Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

Very useful work! I'd like to try it as part of the offline data generation pipeline for spatial intelligence. It has become a general trend to lift things from 2D -> 3D, build a scene graph, and generate QAs in a scalable manner. Some relevant works from NVIDIA:
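As a rough illustration of that 2D -> 3D -> scene graph -> QA pattern, here is a toy sketch. The object format, the spatial rule, and the question templates are all assumptions for illustration, not any specific NVIDIA pipeline.

```python
# Toy sketch: objects lifted to 3D -> relation edges -> templated QA pairs.

objects = [  # e.g., detections lifted to 3D centroids (x, y, z in meters)
    {"name": "chair", "pos": (0.0, 0.0, 0.0)},
    {"name": "table", "pos": (1.5, 0.0, 0.0)},
]

def build_scene_graph(objs):
    """Derive left_of edges from x-coordinates (a deliberately toy rule)."""
    return [
        (a["name"], "left_of", b["name"])
        for a in objs for b in objs
        if a is not b and a["pos"][0] < b["pos"][0]
    ]

def generate_qas(edges):
    """Turn each relation edge into a templated question/answer pair."""
    return [
        {"q": f"What is to the left of the {obj}?", "a": subj}
        for subj, rel, obj in edges if rel == "left_of"
    ]

print(generate_qas(build_scene_graph(objects)))
```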