Nouamane Tazi (@nouamanetazi) 's Twitter Profile
Nouamane Tazi

@nouamanetazi

ML Research Engineer @huggingface 🤗. Scale it 'til you make it

🇵🇸🕊

ID: 570270035

Joined: 03-05-2012 20:40:05

300 Tweets

2.2K Followers

1.1K Following

Nathan (@nathanhabib1011) 's Twitter Profile Photo

🔥 Evaluating LLMs? You need Lighteval — the fastest, most flexible toolkit for benchmarking models, built by Hugging Face.

Now with:
✅ Plug & play custom model inference (evaluate any backend)
📈 Tasks like AIME, GPQA:diamond, SimpleQA, and hundreds more

Details below 🧵👇

younes (@younesbelkada) 's Twitter Profile Photo

Announcing - Falcon-Edge – a series of powerful, universal and fine-tunable Bitnet models for everyone!

We also release a Python fine-tuning toolkit library - `onebitllms`- specialized for Bitnet models.

Announcement blogpost: falcon-lm.github.io/blog/falcon-ed…
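
For anyone who wants to try a checkpoint right away, here is a minimal sketch of loading a Falcon-Edge model through `transformers`. The repo id shown is an assumption, and BitNet support may require a recent `transformers` release; see the announcement blogpost and `onebitllms` for the supported fine-tuning flow.

```python
# Minimal sketch: load a Falcon-Edge (BitNet) checkpoint with transformers.
# The repo id below is an assumption -- check the Falcon-Edge collection on the Hub
# for the exact model names; BitNet support may need a recent transformers version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Instruct"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Explain BitNet models in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```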
Nouamane Tazi (@nouamanetazi) 's Twitter Profile Photo

This is the *real* impact imo. Always a pleasure to hear such feedback. 🤗 I'm also very excited about scaling RL workloads...

Loubna Ben Allal (@loubnabenallal1) 's Twitter Profile Photo

Introducing SmolLM3: a strong, smol reasoner!

> SoTA 3B model
> dual mode reasoning (think/no_think)
> long context, up to 128k
> multilingual: en, fr, es, de, it, pt
> fully open source (data, code, recipes)

huggingface.co/blog/smollm3
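
A minimal sketch of what the dual reasoning mode could look like from `transformers`. The repo id and the `enable_thinking` chat-template flag are assumptions here; the model card documents the exact mechanism for toggling think/no_think (e.g. a `/think` or `/no_think` system flag).

```python
# Minimal sketch: toggling SmolLM3's think/no_think mode via the chat template.
# The repo id and the enable_thinking flag are assumptions -- check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # assumed flag name for the no_think mode
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=128)[0], skip_special_tokens=True))
```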
elie (@eliebakouch) 's Twitter Profile Photo

Super excited to share SmolLM3, a new strong 3B model.

SmolLM3 is fully open: we share the recipe, the dataset, the training codebase and much more!

> Trained on 11T tokens on 384 H100s for 220k GPU hours
> Supports long context up to 128k thanks to NoPE and intra-document masking (sketched below)
>
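
Since intra-document masking comes up in the recipe, here is a small generic sketch (an illustration, not the actual training code) of how such a mask can be built: each token only attends to earlier tokens from the same packed document.

```python
# Generic sketch of intra-document (block-causal) masking for packed sequences.
# Illustration only, not the nanotron/SmolLM3 implementation.
import torch

def intra_document_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    """doc_ids: (seq_len,) integer id of the document each token came from."""
    seq_len = doc_ids.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc  # True where attention is allowed

# Two packed documents of lengths 3 and 2:
mask = intra_document_causal_mask(torch.tensor([0, 0, 0, 1, 1]))
print(mask.int())
```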
elie (@eliebakouch) 's Twitter Profile Photo

We've just released 100+ intermediate checkpoints and our training logs from the SmolLM3-3B training.

We hope this can be useful to researchers working on mech interp, training dynamics, RL and other topics :)

Training logs:
-> Usual training loss (the gaps in the loss are due
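
If the intermediate checkpoints are published as revisions of a Hub repo, loading one from `transformers` could look like the sketch below; both the repo id and the revision name are hypothetical, so check the released checkpoint collection for the actual naming scheme.

```python
# Sketch: loading an intermediate SmolLM3-3B checkpoint by revision.
# The repo id and the revision/branch name below are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HuggingFaceTB/SmolLM3-3B-checkpoints"  # hypothetical
revision = "stage1-step-100000"                   # hypothetical branch name

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision, torch_dtype="auto")
```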
Leandro von Werra (@lvwerra) 's Twitter Profile Photo

Excited to share the preview of the ultra-scale book! 

Over the past few months we worked with a graphic designer to bring the blogpost into a beautiful book format. The preview is available to all Pro users!

Stay tuned for the release of the physical book!

hf.co/nanotron
Julien Chaumond (@julien_c) 's Twitter Profile Photo

The Ultra-Scale Playbook (the large-scale LLM training guide from the Hugging Face science team) is out now! 🔥

It is a 246-page, very nicely designed PDF that walks you through how to train your own DeepSeek-V3 model using:
• 5D parallelism, 
• ZeRO (see the memory sketch after this list),
• fast kernels,
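
As a taste of the back-of-envelope math the playbook walks through, here is a rough sketch of per-GPU memory for model states under the ZeRO stages, assuming mixed-precision Adam (the usual 2 + 2 + 12 bytes per parameter). The numbers are illustrative, not taken from the book.

```python
# Rough ZeRO memory sketch (illustrative): mixed-precision training keeps
# bf16 params (2 B) + bf16 grads (2 B) per parameter plus fp32 Adam states
# (master copy + momentum + variance = 12 B). ZeRO shards more of these
# across data-parallel ranks as the stage increases.
def zero_memory_gb(n_params: float, n_gpus: int, stage: int) -> float:
    params, grads, optim = 2.0, 2.0, 12.0      # bytes per parameter
    if stage >= 1:
        optim /= n_gpus                        # ZeRO-1: shard optimizer states
    if stage >= 2:
        grads /= n_gpus                        # ZeRO-2: also shard gradients
    if stage >= 3:
        params /= n_gpus                       # ZeRO-3: also shard parameters
    return n_params * (params + grads + optim) / 1e9

for stage in (0, 1, 2, 3):
    gb = zero_memory_gb(7e9, 64, stage)
    print(f"ZeRO-{stage}: ~{gb:.1f} GB of model states per GPU (7B model, 64 GPUs)")
```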
Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

Long-form AI reading is back and we’ve just dropped the ultimate summer read.

Inspired by the likes of Stripe Press, we’re proud to announce the first book from HF Press: a carefully crafted, book-length PDF edition of the Ultra-Scale Playbook.

Over 200 dense pages to learn the
clem 🤗 (@clementdelangue) 's Twitter Profile Photo

Every tech company can and should train their own DeepSeek R1, Llama or GPT-5, just like every tech company writes their own code (and AI is no more than software 2.0).

This is why we're releasing the Ultra-Scale Playbook. 200 pages to master:
- 5D parallelism (DP, TP, PP, EP,
Mohamed (@mekkcyber) 's Twitter Profile Photo

The new GPT-OSS models are Mixture of Experts (MoEs), with 20B and 120B parameters.

Since expert weights make up ~90% of the model, OpenAI decided to quantize them to 4 bits during post-training using the MXFP4 standard. 

Quantizing these to MXFP4 enables the larger model to
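
A rough back-of-envelope of why that matters, assuming MXFP4 costs about 4.25 bits per expert weight (4-bit elements plus one shared 8-bit scale per 32-value block) and the remaining weights stay in bf16; the 90/10 parameter split and block size are assumptions taken from the description above.

```python
# Back-of-envelope weight-memory estimate for a 120B GPT-OSS-style MoE (illustrative).
# Assumptions: ~90% of weights are expert weights, MXFP4 ~= 4.25 bits/weight
# (4-bit values + an 8-bit scale shared across 32-element blocks), rest in bf16.
total_params = 120e9
expert_params = 0.9 * total_params
other_params = total_params - expert_params

bf16_bits = 16
mxfp4_bits = 4 + 8 / 32

full_bf16_gb = total_params * bf16_bits / 8 / 1e9
mxfp4_gb = (expert_params * mxfp4_bits + other_params * bf16_bits) / 8 / 1e9

print(f"all-bf16 weights:      ~{full_bf16_gb:.0f} GB")
print(f"MXFP4 experts + bf16:  ~{mxfp4_gb:.0f} GB")
```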
Lewis Tunstall (@_lewtun) 's Twitter Profile Photo

A cool side effect of the gpt-oss release is that now you can load huge MoEs in seconds. For those old enough to remember, this used to take >5 minutes, so literally a 100x improvement 🤯
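
For reference, loading one of the released checkpoints in `transformers` is just the usual call; a minimal sketch below, assuming a recent `transformers` release with gpt-oss / MXFP4 support installed.

```python
# Minimal sketch: load a GPT-OSS MoE checkpoint with transformers.
# Assumes a recent transformers release with gpt-oss / MXFP4 support.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Say hello in one word."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=16)[0]))
```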