Zhiding Yu (@zhidingyu)'s Twitter Profile
Zhiding Yu

@zhidingyu

Working to make machines understand the world like human beings.
Words are my own.

ID: 1283364577581756418

Link: https://chrisding.github.io/ · Joined: 15-07-2020 11:35:44

146 Tweets

7.7K Followers

494 Following

Min-Hung (Steve) Chen (@cmhungsteven)'s Twitter Profile Photo

The 4th Workshop on Transformers for Vision (T4V) at CVPR 2025 is soliciting self-nominations for reviewers.
If you're interested, please fill out this form:
forms.gle/cJKkywCyFAboct…

More information can be found on our website: sites.google.com/view/t4v-cvpr2…
Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

Cool paper from NVIDIA

Prior methods for training LLMs for tool use rely on imitation or distilled reasoning, limiting generalization.

Nemotron-Research-Tool-N1 uses rule-based reinforcement learning.

It trains models with binary rewards evaluating only tool call structure
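As a rough illustration of that structure-only reward idea, here is a minimal sketch in Python. The <tool_call> tag format, the required fields, and the function name are assumptions for illustration, not the paper's actual schema; the point is that the reward is binary and checks only whether the tool call is well-formed, not whether it is semantically correct.

```python
import json
import re

# Minimal sketch of a rule-based binary reward in the spirit described
# above: score 1.0 only when the output contains a parseable tool call,
# ignoring whether the call's *content* is actually right.
# The <tool_call> wrapper and required keys are hypothetical.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def structural_reward(model_output: str) -> float:
    """Return 1.0 if the output contains a well-formed tool call, else 0.0."""
    match = TOOL_CALL_RE.search(model_output)
    if match is None:
        return 0.0
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return 0.0
    # Structure-only checks: required keys present with the right types.
    if not isinstance(call.get("name"), str):
        return 0.0
    if not isinstance(call.get("arguments"), dict):
        return 0.0
    return 1.0

print(structural_reward('<tool_call>{"name": "search", "arguments": {"q": "weather"}}</tool_call>'))  # 1.0
print(structural_reward("I think the answer is 42."))  # 0.0
```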
Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

Check out this super cool work done by our intern Shaokun Zhang - RL + tool use is the future of LLM agents! Before joining NVIDIA, Shaokun was a contributor to the famous multi-agent workflow framework #AutoGen. Now, the age of agent learning is moving beyond workflow control!

Shizhe Diao (@shizhediao)'s Twitter Profile Photo

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough!

Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering
Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

Document and Enterprise Intelligence is arguably one of the most important applications of VLMs and cloud services. NVIDIA VLM technologies help build commercial-grade models that excel in this area. The Eagle VLM Team, together with other colleagues at NVIDIA, is proud to be

Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

I did not notice this until just now. Thank you Andi Marafioti for the recommendation! Very glad that even though Eagle 2 is not our latest work, people still find it very useful.

Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

And today we have just open-sourced the Eagle 2.5 model
huggingface.co/nvidia/Eagle2.…
You are welcome to download it and give it a try!
We will also open-source the fine-tuning code for Eagle 2/2.5 soon at github.com/NVlabs/Eagle. Stay tuned.
Fu-En (Fred) Yang (@fuenyang1)'s Twitter Profile Photo

🤖 How can we teach embodied agents to think before they act?

🚀 Introducing ThinkAct — a hierarchical Reasoning VLA framework with an MLLM for complex, slow reasoning and an action expert for fast, grounded execution.
Slow think, fast act. 🧠⚡🤲
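As a toy sketch of that slow-think / fast-act split: a slow reasoner is queried only every few steps to produce a plan, while a fast action expert conditions on the latest plan at every step. Reasoner, ActionExpert, and the replan interval below are hypothetical stand-ins, not ThinkAct's actual interfaces.

```python
# Toy sketch of a hierarchical slow/fast control loop (all names hypothetical).

class Reasoner:
    def plan(self, observation):
        # Stand-in for an MLLM's slow, deliberate reasoning pass.
        return {"subgoal": "reach the red block", "obs": observation}

class ActionExpert:
    def act(self, observation, plan):
        # Stand-in for a fast, grounded low-level policy.
        return f"move_toward({plan['subgoal']!r})"

def run_episode(env_observations, replan_every=10):
    reasoner, expert = Reasoner(), ActionExpert()
    plan, actions = None, []
    for step, obs in enumerate(env_observations):
        if step % replan_every == 0:           # slow loop: think occasionally
            plan = reasoner.plan(obs)
        actions.append(expert.act(obs, plan))  # fast loop: act every step
    return actions

print(run_episode([f"frame_{i}" for i in range(3)], replan_every=2))
```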
Shizhe Diao (@shizhediao)'s Twitter Profile Photo

New tech report out! 🚀

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training

An expanded version of our ProRL paper — now with more training insights and experimental details.

Read it here 👉 arxiv.org/abs/2507.12507

Cihang Xie (@cihangxie)'s Twitter Profile Photo

🚀 Excited to share GPT-Image-Edit-1.5M — our new large-scale, high-quality, fully open image editing dataset for the research community! (1/n)
Wonmin Byeon (@wonmin_byeon)'s Twitter Profile Photo

🚀 New paper: STORM — Efficient VLM for Long Video Understanding

STORM cuts compute costs by up to 8× and reduces decoding latency by 2.4–2.9×, while achieving state-of-the-art performance.

Details + paper link in the thread ↓
JingyuanLiu (@jingyuanliu123)'s Twitter Profile Photo

I was lucky to work in both Chinese and US LLM labs, and I've been thinking about this for a while. The current values of pretraining are indeed different. US labs be like:
- lots of GPUs and much larger-FLOP runs
- treating stability more seriously, and not tolerating spikes

Robert Youssef (@rryssf_)'s Twitter Profile Photo

Holy shit... Tencent researchers just killed fine-tuning AND reinforcement learning in one shot 😳

They call it Training-Free GRPO (Group Relative Policy Optimization).

Instead of updating weights, the model literally learns from 'its own experiences' like an evolving memory
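A heavily simplified sketch of that "learn without weight updates" idea: sample a group of rollouts per task, distill a lesson from comparing the best and worst, and keep the lessons in a growing text memory that is prepended to future prompts. Every function name below is a hypothetical stand-in; this illustrates the concept, not the actual Training-Free GRPO implementation.

```python
# Toy sketch: an evolving text memory replaces gradient updates.

def sample_rollouts(prompt, n=4):
    # Stand-in for n stochastic generations from a frozen LLM.
    return [f"rollout {i} for: {prompt}" for i in range(n)]

def score(rollout):
    # Stand-in for a task-level reward (e.g., answer correctness).
    return len(rollout) % 2  # dummy score for the sketch

def extract_lesson(best, worst):
    # Stand-in for asking the LLM to summarize why `best` beat `worst`.
    return f"prefer approaches like: {best[:30]}..."

memory = []  # the "evolving memory" that stands in for weight updates

for task in ["task A", "task B"]:
    prompt = "\n".join(memory) + "\n" + task  # condition on past lessons
    group = sorted(sample_rollouts(prompt), key=score, reverse=True)
    memory.append(extract_lesson(group[0], group[-1]))

print(memory)
```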
Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

Here's a cool piece of work from the Hugging Face team. The research echoes, once again, that "Data is the Key" to producing frontier-level VLMs. In particular, the diversity of the data matters. Happy to see more community efforts in open-sourcing VLM data. Check out more details👇

Zhiding Yu (@zhidingyu)'s Twitter Profile Photo

Very useful work! I'd like to try it as part of the offline data generation pipeline for spatial intelligence. It has become a general trend to lift things from 2D -> 3D, build a scene graph, and generate QAs in a scalable manner. Some relevant works from NVIDIA:
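As a rough illustration of that 2D -> 3D -> scene graph -> QA pattern, here is a toy sketch. The object format, the spatial rule, and the question templates are all assumptions for illustration, not any specific NVIDIA pipeline.

```python
# Toy sketch: objects lifted to 3D -> relation edges -> templated QA pairs.

objects = [  # e.g., detections lifted to 3D centroids (x, y, z in meters)
    {"name": "chair", "pos": (0.0, 0.0, 0.0)},
    {"name": "table", "pos": (1.5, 0.0, 0.0)},
]

def build_scene_graph(objs):
    """Derive left_of edges from x-coordinates (a deliberately toy rule)."""
    return [
        (a["name"], "left_of", b["name"])
        for a in objs for b in objs
        if a is not b and a["pos"][0] < b["pos"][0]
    ]

def generate_qas(edges):
    """Turn each relation edge into a templated question/answer pair."""
    return [
        {"q": f"What is to the left of the {obj}?", "a": subj}
        for subj, rel, obj in edges if rel == "left_of"
    ]

print(generate_qas(build_scene_graph(objects)))
```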