Yoshi Suhara (@suhara)'s Twitter Profile
Yoshi Suhara

@suhara

Building Small & Large Language Models @nvidia

ID: 7062832

Website: https://yoshi-suhara.com · Joined: 25-06-2007 05:20:01

163 Tweets

322 Followers

282 Following

Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

🔥 Vision encoder upgrade: RADIOv2.5 = DFN_CLIP + DINOv2 + SAM + SigLIP + ToMe + multi-res training + teacher loss balancing + smart augmentations, CVPR2025.

Current foundation models have too many limitations: i) tailored for a single task, ii) not flexible on resolution (like
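For context on the "teacher loss balancing" mentioned above, here is a minimal sketch of multi-teacher feature distillation with per-teacher weights. The cosine loss and static weighting scheme are illustrative assumptions, not the RADIOv2.5 implementation.

```python
# Illustrative multi-teacher distillation with per-teacher loss balancing.
# The loss form and weights are assumptions, not the RADIO recipe.
import torch
import torch.nn.functional as F

def balanced_multi_teacher_loss(student_feats, teacher_feats, weights):
    """student_feats / teacher_feats: dicts mapping a teacher name
    (e.g. 'dfn_clip', 'dinov2', 'sam', 'siglip') to (B, N, D) features,
    with student features already projected into each teacher's dim.
    weights: per-teacher scalars that balance loss magnitudes."""
    total = torch.zeros(())
    for name, t in teacher_feats.items():
        s = student_feats[name]
        # 1 - cosine similarity per token, averaged over batch and tokens
        loss = 1.0 - F.cosine_similarity(s, t.detach(), dim=-1).mean()
        total = total + weights[name] * loss
    return total
```

In practice the balancing weights matter because teachers such as SAM and CLIP produce features with very different magnitudes, so an unweighted sum lets one teacher dominate.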
Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

We are excited to release Llama-Nemotron-Ultra! This is a reasoning ON/OFF, dense 253B model with open weights and post-training data. huggingface.co/nvidia/Llama-3… We started with Llama-405B, compressed it via NAS pruning, then applied reasoning-focused post-training: SFT + RL in FP8.
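As a sketch of what the reasoning ON/OFF switch looks like in practice: the Llama-Nemotron model cards describe toggling reasoning through the system prompt. The model ID below is a placeholder for the truncated link above, and the exact prompt strings should be verified against the actual card.

```python
# Sketch: toggling reasoning mode via the system prompt.
# MODEL_ID is a placeholder for the (truncated) Hugging Face link above;
# the "detailed thinking on/off" strings follow the model-card convention
# but are an assumption here — check the card before relying on them.
from transformers import pipeline

MODEL_ID = "nvidia/<llama-nemotron-ultra-checkpoint>"  # placeholder

# a 253B model needs multi-GPU or quantized serving; this is illustrative
generate = pipeline("text-generation", model=MODEL_ID,
                    device_map="auto", torch_dtype="auto")

def ask(question: str, reasoning: bool = True) -> str:
    messages = [
        {"role": "system",
         "content": "detailed thinking on" if reasoning else "detailed thinking off"},
        {"role": "user", "content": question},
    ]
    out = generate(messages, max_new_tokens=1024)
    # chat-format input returns the full conversation; take the reply
    return out[0]["generated_text"][-1]["content"]
```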
Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

We just updated the Llama-Nemotron post-training dataset with an additional 2.2M math and 500K code reasoning examples used in Llama-Nemotron-Ultra training: huggingface.co/datasets/nvidi…
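A quick sketch of inspecting a few of the new examples with the 🤗 `datasets` library; the dataset ID is a placeholder, since the link above is truncated.

```python
# Sketch: peeking at the updated post-training data in streaming mode.
# DATASET_ID is a placeholder for the truncated link in the tweet above.
from datasets import load_dataset

DATASET_ID = "nvidia/<llama-nemotron-post-training-dataset>"  # placeholder

ds = load_dataset(DATASET_ID, split="train", streaming=True)
for example in ds.take(3):  # inspect a few math/code reasoning examples
    print(example)
```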

Mostofa Patwary (@mapatwary)'s Twitter Profile Photo

Nemotron-H base models (8B/47B/56B): a family of hybrid Mamba-Transformer LLMs, now available on HuggingFace: huggingface.co/nvidia/Nemotro… huggingface.co/nvidia/Nemotro… huggingface.co/nvidia/Nemotro… Technical report: arxiv.org/abs/2504.03624 Blog: research.nvidia.com/labs/adlr/nemo…

Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

New efficient Hybrid LLMs from @NVIDIA: Nemotron-H! Introducing a family of models combining Mamba-2, Self-Attention & FFNs for 8B, 47B and 56B sizes.

• The 47B model is 3x faster and 1.5x smaller, yet on par with Qwen-72B and Llama-70B
• The hybrid 8B model is 1.8x faster than comparable transformers
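To make the layer mix above concrete, here is a runnable toy sketch of a hybrid stack that interleaves mostly SSM-style mixers with occasional self-attention, placing an FFN after each mixer. The layer pattern, sizes, and the causal-conv stand-in for Mamba-2 are all illustrative assumptions, not the Nemotron-H architecture.

```python
# Toy hybrid stack: mostly SSM-style mixers, sparse self-attention, FFN
# after every mixer. All choices here are illustrative assumptions.
import torch
import torch.nn as nn

class FFN(nn.Module):
    def __init__(self, d, mult=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, mult * d), nn.GELU(),
                                 nn.Linear(mult * d, d))
    def forward(self, x):
        return x + self.net(x)

class SSMStandIn(nn.Module):
    """Stand-in for a Mamba-2 block: a causal depthwise conv, just to keep
    the sketch runnable. The real block is a selective state-space layer
    with constant memory in sequence length."""
    def __init__(self, d, k=4):
        super().__init__()
        self.conv = nn.Conv1d(d, d, k, groups=d, padding=k - 1)
    def forward(self, x):                       # x: (B, T, d)
        y = self.conv(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + y

class AttnBlock(nn.Module):
    def __init__(self, d, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
    def forward(self, x):
        y, _ = self.attn(x, x, x, need_weights=False)
        return x + y

class HybridStack(nn.Module):
    """Every attn_every-th mixer is self-attention; the rest are SSM-style."""
    def __init__(self, d=512, n_layers=12, attn_every=6):
        super().__init__()
        blocks = []
        for i in range(n_layers):
            mixer = AttnBlock(d) if (i + 1) % attn_every == 0 else SSMStandIn(d)
            blocks += [mixer, FFN(d)]
        self.blocks = nn.Sequential(*blocks)
    def forward(self, x):
        return self.blocks(x)

if __name__ == "__main__":
    x = torch.randn(2, 128, 512)       # (batch, tokens, dim)
    print(HybridStack()(x).shape)      # torch.Size([2, 128, 512])
```

The speedups quoted in the tweet come from this kind of layout: SSM layers avoid the quadratic attention cost and the growing KV cache, while the few remaining attention layers preserve global token mixing.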
Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

🚀 Can Hybrid LLMs be compressed and distilled to reach SOTA?
📉 We compress Nemotron-H-8B down to 4B using just 400B tokens - achieving a model that’s 2.2x faster and +2.6% better than Phi-4.

📄 Details: arxiv.org/pdf/2504.11409

🧠 Hybrid SSM compression isn't easy:
➤ Minitron
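For the distillation side of this recipe, here is a generic logit-distillation loss of the kind typically used in Minitron-style compression. The temperature, mixing weight, and loss form are illustrative; see the linked paper for the actual recipe.

```python
# Sketch: distilling the 8B teacher's logits into the pruned 4B student.
# Hyperparameters and the KD/CE mix are assumptions, not the paper's setup.
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Logits are flattened to (batch * seq_len, vocab); labels to (batch * seq_len,)."""
    # soft targets: KL between temperature-scaled distributions
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # hard targets: ordinary next-token cross-entropy
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```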
Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Nvidia presents Llama-Nemotron: Efficient Reasoning Models

- An open family of models w/ exceptional reasoning capabilities and inference efficiency

- Discusses the training procedure, incl. NAS from Llama 3 for accelerated inference, knowledge distillation, and continued
Shaokun Zhang (@shaokunzhang1)'s Twitter Profile Photo

Tool-using LLMs can learn to reason—without reasoning traces.

🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation.

📄 Paper: arxiv.org/pdf/2505.00024
💻
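A rule-based RL setup like this replaces learned reward models with verifiable checks on the output. Here is a sketch in that spirit: a reward that scores format validity and tool-call correctness. The tag names, JSON schema, and partial-credit scheme are assumptions for illustration, not the Tool-N1 reward.

```python
# Sketch: a rule-based reward for tool-calling RL.
# Tags, schema, and scoring are assumptions; the paper's rules may differ.
import json
import re

def tool_call_reward(completion: str, expected_tool: str, expected_args: dict) -> float:
    # format rule: reasoning inside <think>...</think>, then a
    # <tool_call>...</tool_call> block containing a JSON object
    m = re.search(r"<think>.*?</think>\s*<tool_call>(.*?)</tool_call>",
                  completion, re.DOTALL)
    if not m:
        return 0.0  # malformed output earns nothing
    try:
        call = json.loads(m.group(1))
    except json.JSONDecodeError:
        return 0.0
    # correctness rule: right tool name and exactly matching arguments
    if call.get("name") == expected_tool and call.get("arguments") == expected_args:
        return 1.0
    return 0.1  # partial credit for well-formed but wrong calls (assumption)
```

Because the reward is computed purely from rules, no reasoning traces or distillation targets are needed; the model discovers its own reasoning inside the <think> block.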
Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

NeMo RL is now open source! It replaces NeMo-Aligner and is the toolkit we use to post-train the next generations of our models. Give it a try: github.com/NVIDIA/NeMo-RL

Yoshi Suhara (@suhara)'s Twitter Profile Photo

Llama Nemotron Nano 4B matches the previous Nano 8B accuracy with only half the size! 🤗 huggingface.co/nvidia/Llama-3…

Yoshi Suhara (@suhara)'s Twitter Profile Photo

A new video game benchmark for LLM agents, designed across various game titles! Happy to be part of this wonderful collaboration with Dongmin Park and the amazing team at KRAFTON AI!