Nouamane Tazi (@nouamanetazi) 's Twitter Profile
Nouamane Tazi

@nouamanetazi

ML Research Engineer @huggingface 🤗. Scale it 'til you make it

🇵🇸🕊

ID: 570270035

Joined: 03-05-2012 20:40:05

300 Tweets

2.2K Followers

1.1K Following

Nathan (@nathanhabib1011) 's Twitter Profile Photo

🔥 Evaluating LLMs? You need Lighteval — the fastest, most flexible toolkit for benchmarking models, built by Hugging Face.

Now with:
✅ Plug & play custom model inference (evaluate any backend)
📈 Tasks like AIME, GPQA:diamond, SimpleQA, and hundreds more

Details below 🧵👇

younes (@younesbelkada) 's Twitter Profile Photo

Announcing - Falcon-Edge – a series of powerful, universal and fine-tunable Bitnet models for everyone!

We also release a Python fine-tuning toolkit library - `onebitllms`- specialized for Bitnet models.

Announcement blogpost: falcon-lm.github.io/blog/falcon-ed…
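
For anyone who wants to try a checkpoint right away, here is a minimal sketch of loading a Falcon-Edge model through `transformers`. The repo id shown is an assumption, and BitNet support may require a recent `transformers` release; see the announcement blogpost and `onebitllms` for the supported fine-tuning flow.

```python
# Minimal sketch: load a Falcon-Edge (BitNet) checkpoint with transformers.
# The repo id below is an assumption -- check the Falcon-Edge collection on the Hub
# for the exact model names; BitNet support may need a recent transformers version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-1B-Instruct"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Explain BitNet models in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```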
Nouamane Tazi (@nouamanetazi) 's Twitter Profile Photo

This is the *real* impact imo. Always a pleasure to hear such feedback. 🤗 I'm also very excited about scaling RL workloads...

Loubna Ben Allal (@loubnabenallal1) 's Twitter Profile Photo

Introducing SmolLM3: a strong, smol reasoner!

> SoTA 3B model
> dual mode reasoning (think/no_think)
> long context, up to 128k
> multilingual: en, fr, es, de, it, pt
> fully open source (data, code, recipes)

huggingface.co/blog/smollm3
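
A minimal sketch of what the dual reasoning mode could look like from `transformers`. The repo id and the `enable_thinking` chat-template flag are assumptions here; the model card documents the exact mechanism for toggling think/no_think (e.g. a `/think` or `/no_think` system flag).

```python
# Minimal sketch: toggling SmolLM3's think/no_think mode via the chat template.
# The repo id and the enable_thinking flag are assumptions -- check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # assumed flag name for the no_think mode
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=128)[0], skip_special_tokens=True))
```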
elie (@eliebakouch) 's Twitter Profile Photo

Super excited to share SmolLM3, a new strong 3B model.

SmolLM3 is fully open: we share the recipe, the dataset, the training codebase and much more!

> Trained on 11T tokens on 384 H100s for 220k GPU hours
> Supports long context up to 128k thanks to NoPE and intra-document masking (sketched below)
>
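
Since intra-document masking comes up in the recipe, here is a small generic sketch (an illustration, not the actual training code) of how such a mask can be built: each token only attends to earlier tokens from the same packed document.

```python
# Generic sketch of intra-document (block-causal) masking for packed sequences.
# Illustration only, not the nanotron/SmolLM3 implementation.
import torch

def intra_document_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    """doc_ids: (seq_len,) integer id of the document each token came from."""
    seq_len = doc_ids.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc  # True where attention is allowed

# Two packed documents of lengths 3 and 2:
mask = intra_document_causal_mask(torch.tensor([0, 0, 0, 1, 1]))
print(mask.int())
```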
elie (@eliebakouch) 's Twitter Profile Photo

We've just released 100+ intermediate checkpoints and our training logs from the SmolLM3-3B training.

We hope this can be useful to researchers working on mech interp, training dynamics, RL and other topics :)

Training logs:
-> Usual training loss (the gaps in the loss are due
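
If the intermediate checkpoints are published as revisions of a Hub repo, loading one from `transformers` could look like the sketch below; both the repo id and the revision name are hypothetical, so check the released checkpoint collection for the actual naming scheme.

```python
# Sketch: loading an intermediate SmolLM3-3B checkpoint by revision.
# The repo id and the revision/branch name below are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HuggingFaceTB/SmolLM3-3B-checkpoints"  # hypothetical
revision = "stage1-step-100000"                   # hypothetical branch name

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision, torch_dtype="auto")
```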
Leandro von Werra (@lvwerra) 's Twitter Profile Photo

Excited to share the preview of the ultra-scale book! 

Over the past few months we worked with a graphic designer to bring the blogpost into a beautiful book format. The preview is available to all Pro users!

Stay tuned for the release of the physical book!

hf.co/nanotron
Julien Chaumond (@julien_c) 's Twitter Profile Photo

The Ultra-Scale Playbook (the large-scale LLM training guide from the Hugging Face science team) is out now! 🔥

It is a 246-page, very nicely designed PDF that walks you through how to train your own DeepSeek-V3 model using:
• 5D parallelism, 
• ZeRO (see the memory sketch after this list),
• fast kernels,
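
As a taste of the back-of-envelope math the playbook walks through, here is a rough sketch of per-GPU memory for model states under the ZeRO stages, assuming mixed-precision Adam (the usual 2 + 2 + 12 bytes per parameter). The numbers are illustrative, not taken from the book.

```python
# Rough ZeRO memory sketch (illustrative): mixed-precision training keeps
# bf16 params (2 B) + bf16 grads (2 B) per parameter plus fp32 Adam states
# (master copy + momentum + variance = 12 B). ZeRO shards more of these
# across data-parallel ranks as the stage increases.
def zero_memory_gb(n_params: float, n_gpus: int, stage: int) -> float:
    params, grads, optim = 2.0, 2.0, 12.0      # bytes per parameter
    if stage >= 1:
        optim /= n_gpus                        # ZeRO-1: shard optimizer states
    if stage >= 2:
        grads /= n_gpus                        # ZeRO-2: also shard gradients
    if stage >= 3:
        params /= n_gpus                       # ZeRO-3: also shard parameters
    return n_params * (params + grads + optim) / 1e9

for stage in (0, 1, 2, 3):
    gb = zero_memory_gb(7e9, 64, stage)
    print(f"ZeRO-{stage}: ~{gb:.1f} GB of model states per GPU (7B model, 64 GPUs)")
```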
Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

Long-form AI reading is back and we’ve just dropped the ultimate summer read.

Inspired by the likes of Stripe Press, we’re proud to announce the first book from HF Press: a carefully crafted, book-length PDF edition of the Ultra-Scale Playbook.

Over 200 dense pages to learn the
clem 🤗 (@clementdelangue) 's Twitter Profile Photo

Every tech company can and should train their own DeepSeek R1, Llama or GPT-5, just like every tech company writes their own code (and AI is no more than software 2.0).

This is why we're releasing the Ultra-Scale Playbook. 200 pages to master:
- 5D parallelism (DP, TP, PP, EP,
Mohamed (@mekkcyber) 's Twitter Profile Photo

The new GPT-OSS models are Mixture of Experts (MoEs), with 20B and 120B parameters.

Since expert weights make up ~90% of the model, OpenAI decided to quantize them to 4 bits during post-training using the MXFP4 standard. 

Quantizing these to MXFP4 enables the larger model to
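
A rough back-of-envelope of why that matters, assuming MXFP4 costs about 4.25 bits per expert weight (4-bit elements plus one shared 8-bit scale per 32-value block) and the remaining weights stay in bf16; the 90/10 parameter split and block size are assumptions taken from the description above.

```python
# Back-of-envelope weight-memory estimate for a 120B GPT-OSS-style MoE (illustrative).
# Assumptions: ~90% of weights are expert weights, MXFP4 ~= 4.25 bits/weight
# (4-bit values + an 8-bit scale shared across 32-element blocks), rest in bf16.
total_params = 120e9
expert_params = 0.9 * total_params
other_params = total_params - expert_params

bf16_bits = 16
mxfp4_bits = 4 + 8 / 32

full_bf16_gb = total_params * bf16_bits / 8 / 1e9
mxfp4_gb = (expert_params * mxfp4_bits + other_params * bf16_bits) / 8 / 1e9

print(f"all-bf16 weights:      ~{full_bf16_gb:.0f} GB")
print(f"MXFP4 experts + bf16:  ~{mxfp4_gb:.0f} GB")
```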
Lewis Tunstall (@_lewtun) 's Twitter Profile Photo

A cool side effect of the gpt-oss release is that now you can load huge MoEs in seconds. For those old enough to remember, this used to take >5 minutes, so literally a 100x improvement 🤯
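
For reference, loading one of the released checkpoints in `transformers` is just the usual call; a minimal sketch below, assuming a recent `transformers` release with gpt-oss / MXFP4 support installed.

```python
# Minimal sketch: load a GPT-OSS MoE checkpoint with transformers.
# Assumes a recent transformers release with gpt-oss / MXFP4 support.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Say hello in one word."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=16)[0]))
```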