Quentin Gallouédec (@qgallouedec)'s Twitter Profile
Quentin Gallouédec

@qgallouedec

PhD - Research engineer @huggingface 🤗
TRL maintainer
📦➡️🦋 bsky.app/profile/qgallo…

ID: 1127913981526540288

Joined: 13-05-2019 12:30:23

495 Tweets

2.2K Followers

553 Following

Quentin Gallouédec (@qgallouedec)

You shouldn't do RL on small models. Distilling from large models works better. And you can now do it even when tokenizers don't match.
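
A minimal sketch of what this looks like in practice, assuming TRL's GKDTrainer (generalized knowledge distillation) and the simple case where student and teacher share a tokenizer; the model and dataset names below are placeholders, and the cross-tokenizer case mentioned in the tweet may go through a different trainer or extra options:

```python
# Hedged sketch: distill a small student from a larger teacher with TRL's GKDTrainer.
# Model and dataset names are placeholder choices, not recommendations.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

student_id = "Qwen/Qwen2.5-0.5B-Instruct"  # small student (placeholder)
teacher_id = "Qwen/Qwen2.5-7B-Instruct"    # larger teacher (placeholder)

tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id)

# Any chat dataset with a "messages" column should work here (placeholder name).
dataset = load_dataset("your-username/chat-prompts", split="train")

args = GKDConfig(
    output_dir="student-distilled",
    lmbda=0.5,  # fraction of on-policy (student-generated) completions
    beta=0.5,   # interpolates between forward and reverse KL
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```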

Quentin Gallouédec (@qgallouedec)

Questions! 🧐 LayerNorm always upcasts inputs to fp32 for stability (hardcoded). But the final multiplication by the weights is done in the original dtype. 1. Why? 2. Sometimes this multiplication is done in fp32 instead. When and why?
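
For reference, a sketch of the pattern being asked about, written in the style of the normalization forwards found in many transformers model files (not copied from any specific implementation; bias omitted for brevity):

```python
# Hedged sketch of the two dtype-handling variants the tweet contrasts.
import torch
import torch.nn as nn

class Norm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        x = x.to(torch.float32)  # hardcoded upcast for numerical stability
        mean = x.mean(-1, keepdim=True)
        var = (x - mean).pow(2).mean(-1, keepdim=True)
        x = (x - mean) * torch.rsqrt(var + self.eps)
        # Variant 1 (what the tweet describes): cast back first, then multiply
        # by the weight in the original dtype.
        out = self.weight * x.to(input_dtype)
        # Variant 2 (seen in some implementations): multiply in fp32, then cast back.
        # out = (self.weight.to(torch.float32) * x).to(input_dtype)
        return out
```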

Lewis Tunstall (@_lewtun)

In the Smol Training Playbook, I tried to survey the state of popular post-training frameworks.

Let me know if I missed any and I'll add them to the list!

steven (@tu7uruu)

Here is a tutorial on training LLaSA (LLaMA-based TTS) using GRPO to improve prosody, rhythm, and expressiveness in synthesized speech with TRL!
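
A rough sketch of the training loop such a tutorial would build on, assuming TRL's GRPOTrainer with a custom reward function. `prosody_score` is a hypothetical stand-in for whatever audio-quality metric the tutorial uses, and the model/dataset names are placeholders rather than the tutorial's exact choices:

```python
# Hedged sketch: GRPO fine-tuning with TRL, rewarding prosody of synthesized speech.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def prosody_score(completion) -> float:
    """Hypothetical placeholder: decode speech tokens to audio and rate prosody."""
    raise NotImplementedError("plug in your own prosody/expressiveness evaluator")

def prosody_reward(completions, **kwargs):
    # One scalar reward per sampled completion.
    return [prosody_score(completion) for completion in completions]

# Dataset with a "prompt" column containing the text to synthesize (placeholder name).
dataset = load_dataset("your-username/tts-prompts", split="train")

trainer = GRPOTrainer(
    model="your-username/llasa-style-tts-model",  # placeholder for a LLaMA-based TTS model
    reward_funcs=prosody_reward,
    args=GRPOConfig(output_dir="llasa-grpo", num_generations=8),
    train_dataset=dataset,
)
trainer.train()
```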

Muyu He (@hemuyu0327)

On-policy distillation is powerful, but Thinking Machines's tinker only supports distilling from a teacher model within the same family, making it impossible for Qwen to learn from DeepSeek, gpt-oss, etc.

For the first time, we enabled model-agnostic distillations natively using …
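
To make the core idea concrete, here is a sketch of one on-policy distillation step for the simple case where student and teacher share a tokenizer. The model-agnostic, cross-family case this tweet announces additionally requires aligning mismatched tokenizations, which the sketch deliberately leaves out (prompt and padding masking are also omitted for brevity):

```python
# Hedged sketch: on-policy distillation = sample from the student, score with the teacher,
# minimize the per-token reverse KL(student || teacher) on the student's own samples.
import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, tokenizer, prompts, max_new_tokens=128):
    # 1) Sample completions from the *student* (that's what makes it on-policy).
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        sequences = student.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens)

    # 2) Score the same token sequences with both models.
    student_logits = student(sequences).logits[:, :-1]
    with torch.no_grad():
        teacher_logits = teacher(sequences).logits[:, :-1]

    # 3) Per-token reverse KL(student || teacher), averaged over positions.
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    teacher_logprobs = F.log_softmax(teacher_logits, dim=-1)
    kl = (student_logprobs.exp() * (student_logprobs - teacher_logprobs)).sum(-1)
    return kl.mean()
```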

Benny (Yufei) Chen (@the_bunny_chen)

Reinforcement Learning for agents has been held back by a lack of standard infrastructure. Production agents don't live in clean "gyms"—they live in messy, async environments.

Today we're open-sourcing Eval Protocol: a framework to run RL directly on your production agents. Day …