Yoshi Suhara (@suhara)'s Twitter Profile
Yoshi Suhara

@suhara

Building Small & Large Language Models @nvidia

ID: 7062832

Website: https://yoshi-suhara.com · Joined: 25-06-2007 05:20:01

163 Tweets

322 Followers

282 Following

Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

🔥 Vision encoder upgrade: RADIOv2.5 = DFN_CLIP + DINOv2 + SAM + SigLIP + ToMe + multi-res training + teacher loss balancing + smart augmentations, CVPR2025.

Current foundation models have too many limitations: i) tailored for a single task, ii) not flexible on resolution (like
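For context on the "teacher loss balancing" mentioned above, here is a minimal sketch of multi-teacher feature distillation with per-teacher weights. The cosine loss and static weighting scheme are illustrative assumptions, not the RADIOv2.5 implementation.

```python
# Illustrative multi-teacher distillation with per-teacher loss balancing.
# The loss form and weights are assumptions, not the RADIO recipe.
import torch
import torch.nn.functional as F

def balanced_multi_teacher_loss(student_feats, teacher_feats, weights):
    """student_feats / teacher_feats: dicts mapping a teacher name
    (e.g. 'dfn_clip', 'dinov2', 'sam', 'siglip') to (B, N, D) features,
    with student features already projected into each teacher's dim.
    weights: per-teacher scalars that balance loss magnitudes."""
    total = torch.zeros(())
    for name, t in teacher_feats.items():
        s = student_feats[name]
        # 1 - cosine similarity per token, averaged over batch and tokens
        loss = 1.0 - F.cosine_similarity(s, t.detach(), dim=-1).mean()
        total = total + weights[name] * loss
    return total
```

In practice the balancing weights matter because teachers such as SAM and CLIP produce features with very different magnitudes, so an unweighted sum lets one teacher dominate.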
Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

We are excited to release Llama-Nemotron-Ultra! This is a reasoning ON/OFF, dense 253B model with open weights and post-training data. huggingface.co/nvidia/Llama-3… We started with Llama-405B, compressed it via NAS pruning, then applied reasoning-focused post-training: SFT + RL in FP8.
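As a sketch of what the reasoning ON/OFF switch looks like in practice: the Llama-Nemotron model cards describe toggling reasoning through the system prompt. The model ID below is a placeholder for the truncated link above, and the exact prompt strings should be verified against the actual card.

```python
# Sketch: toggling reasoning mode via the system prompt.
# MODEL_ID is a placeholder for the (truncated) Hugging Face link above;
# the "detailed thinking on/off" strings follow the model-card convention
# but are an assumption here — check the card before relying on them.
from transformers import pipeline

MODEL_ID = "nvidia/<llama-nemotron-ultra-checkpoint>"  # placeholder

# a 253B model needs multi-GPU or quantized serving; this is illustrative
generate = pipeline("text-generation", model=MODEL_ID,
                    device_map="auto", torch_dtype="auto")

def ask(question: str, reasoning: bool = True) -> str:
    messages = [
        {"role": "system",
         "content": "detailed thinking on" if reasoning else "detailed thinking off"},
        {"role": "user", "content": question},
    ]
    out = generate(messages, max_new_tokens=1024)
    # chat-format input returns the full conversation; take the reply
    return out[0]["generated_text"][-1]["content"]
```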
Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

We just updated the Llama-Nemotron post-training dataset with an additional 2.2M math and 500K code reasoning examples used in Llama-Nemotron-Ultra training: huggingface.co/datasets/nvidi…
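A quick sketch of inspecting a few of the new examples with the 🤗 `datasets` library; the dataset ID is a placeholder, since the link above is truncated.

```python
# Sketch: peeking at the updated post-training data in streaming mode.
# DATASET_ID is a placeholder for the truncated link in the tweet above.
from datasets import load_dataset

DATASET_ID = "nvidia/<llama-nemotron-post-training-dataset>"  # placeholder

ds = load_dataset(DATASET_ID, split="train", streaming=True)
for example in ds.take(3):  # inspect a few math/code reasoning examples
    print(example)
```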

Mostofa Patwary (@mapatwary)'s Twitter Profile Photo

Nemotron-H base models (8B/47B/56B): a family of hybrid Mamba-Transformer LLMs, now available on HuggingFace: huggingface.co/nvidia/Nemotro… huggingface.co/nvidia/Nemotro… huggingface.co/nvidia/Nemotro… Technical report: arxiv.org/abs/2504.03624 Blog: research.nvidia.com/labs/adlr/nemo…

Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

New efficient Hybrid LLMs from @NVIDIA: Nemotron-H! Introducing a family of models combining Mamba-2, Self-Attention & FFNs for 8B, 47B and 56B sizes.

• The 47B model is 3x faster and 1.5x smaller, yet on par with Qwen-72B and Llama-70B
• The hybrid 8B model is 1.8x faster than comparable transformers
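To make the layer mix above concrete, here is a runnable toy sketch of a hybrid stack that interleaves mostly SSM-style mixers with occasional self-attention, placing an FFN after each mixer. The layer pattern, sizes, and the causal-conv stand-in for Mamba-2 are all illustrative assumptions, not the Nemotron-H architecture.

```python
# Toy hybrid stack: mostly SSM-style mixers, sparse self-attention, FFN
# after every mixer. All choices here are illustrative assumptions.
import torch
import torch.nn as nn

class FFN(nn.Module):
    def __init__(self, d, mult=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, mult * d), nn.GELU(),
                                 nn.Linear(mult * d, d))
    def forward(self, x):
        return x + self.net(x)

class SSMStandIn(nn.Module):
    """Stand-in for a Mamba-2 block: a causal depthwise conv, just to keep
    the sketch runnable. The real block is a selective state-space layer
    with constant memory in sequence length."""
    def __init__(self, d, k=4):
        super().__init__()
        self.conv = nn.Conv1d(d, d, k, groups=d, padding=k - 1)
    def forward(self, x):                       # x: (B, T, d)
        y = self.conv(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + y

class AttnBlock(nn.Module):
    def __init__(self, d, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
    def forward(self, x):
        y, _ = self.attn(x, x, x, need_weights=False)
        return x + y

class HybridStack(nn.Module):
    """Every attn_every-th mixer is self-attention; the rest are SSM-style."""
    def __init__(self, d=512, n_layers=12, attn_every=6):
        super().__init__()
        blocks = []
        for i in range(n_layers):
            mixer = AttnBlock(d) if (i + 1) % attn_every == 0 else SSMStandIn(d)
            blocks += [mixer, FFN(d)]
        self.blocks = nn.Sequential(*blocks)
    def forward(self, x):
        return self.blocks(x)

if __name__ == "__main__":
    x = torch.randn(2, 128, 512)       # (batch, tokens, dim)
    print(HybridStack()(x).shape)      # torch.Size([2, 128, 512])
```

The speedups quoted in the tweet come from this kind of layout: SSM layers avoid the quadratic attention cost and the growing KV cache, while the few remaining attention layers preserve global token mixing.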
Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

🚀 Can Hybrid LLMs be compressed and distilled to reach SOTA?
📉 We compress Nemotron-H-8B down to 4B using just 400B tokens - achieving a model that’s 2.2x faster and +2.6% better than Phi-4.

📄 Details: arxiv.org/pdf/2504.11409

🧠 Hybrid SSM compression isn't easy:
➤ Minitron
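For the distillation side of this recipe, here is a generic logit-distillation loss of the kind typically used in Minitron-style compression. The temperature, mixing weight, and loss form are illustrative; see the linked paper for the actual recipe.

```python
# Sketch: distilling the 8B teacher's logits into the pruned 4B student.
# Hyperparameters and the KD/CE mix are assumptions, not the paper's setup.
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Logits are flattened to (batch * seq_len, vocab); labels to (batch * seq_len,)."""
    # soft targets: KL between temperature-scaled distributions
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # hard targets: ordinary next-token cross-entropy
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```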
Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

Nvidia presents Llama-Nemotron: Efficient Reasoning Models

- An open family of models w/ exceptional reasoning capabilities and inference efficiency

- Discusses the training procedure, incl. NAS from Llama 3 for accelerated inference, knowledge distillation, and continued
Shaokun Zhang (@shaokunzhang1)'s Twitter Profile Photo

Tool-using LLMs can learn to reason—without reasoning traces.

🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation.

📄 Paper: arxiv.org/pdf/2505.00024
💻
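A rule-based RL setup like this replaces learned reward models with verifiable checks on the output. Here is a sketch in that spirit: a reward that scores format validity and tool-call correctness. The tag names, JSON schema, and partial-credit scheme are assumptions for illustration, not the Tool-N1 reward.

```python
# Sketch: a rule-based reward for tool-calling RL.
# Tags, schema, and scoring are assumptions; the paper's rules may differ.
import json
import re

def tool_call_reward(completion: str, expected_tool: str, expected_args: dict) -> float:
    # format rule: reasoning inside <think>...</think>, then a
    # <tool_call>...</tool_call> block containing a JSON object
    m = re.search(r"<think>.*?</think>\s*<tool_call>(.*?)</tool_call>",
                  completion, re.DOTALL)
    if not m:
        return 0.0  # malformed output earns nothing
    try:
        call = json.loads(m.group(1))
    except json.JSONDecodeError:
        return 0.0
    # correctness rule: right tool name and exactly matching arguments
    if call.get("name") == expected_tool and call.get("arguments") == expected_args:
        return 1.0
    return 0.1  # partial credit for well-formed but wrong calls (assumption)
```

Because the reward is computed purely from rules, no reasoning traces or distillation targets are needed; the model discovers its own reasoning inside the <think> block.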
Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

NeMo RL is now open source! It replaces NeMo-Aligner and is the toolkit we use to post-train the next generations of our models. Give it a try: github.com/NVIDIA/NeMo-RL

Yoshi Suhara (@suhara)'s Twitter Profile Photo

Llama Nemotron Nano 4B matches the previous Nano 8B accuracy with only half the size! 🤗 huggingface.co/nvidia/Llama-3…

Yoshi Suhara (@suhara)'s Twitter Profile Photo

A new video game benchmark for LLM agents, designed across various game titles! Happy to be part of this wonderful collaboration with Dongmin Park and the amazing team at KRAFTON AI!