Gabriel Synnaeve (@syhw)'s Twitter Profile
Gabriel Synnaeve

@syhw

Nerd & Dad
syhw.bsky.social

ID: 79440047

Link: https://syhw.github.io/ · Joined: 03-10-2009 11:33:30

8.8K Tweets

15.15K Followers

1.1K Following

Thomas Wolf (@thom_wolf)'s Twitter Profile Photo

Finally took time to go over Dario's essay on DeepSeek and export controls, and to be honest it was quite painful to read. And I say this as a great admirer of Anthropic and a big user of Claude. The first half of the essay reads like a lengthy attempt to justify that closed-source…

Kunhao Zheng @ ICLR 2025 (@kunhaoz)'s Twitter Profile Photo

We release a paper with a beautiful theoretical guarantee that the loss gradient in the RLHF framework is aligned with the underlying true return!

It comes with a simple tweak to the sampling scheme in practice, and can be plugged into any online setting. It's "Online Off-Policy"!
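I haven't reproduced the paper's construction here, but the standard mechanism behind keeping off-policy gradients aligned with the on-policy objective is importance weighting: reweight samples drawn from a stale behavior policy by the probability ratio under the current policy. A toy two-arm bandit sketch under that textbook formulation (not the paper's algorithm):

```python
import math
import random

random.seed(0)

rewards = [1.0, 0.0]     # arm 0 is the rewarded arm
logits = [0.0, 0.0]      # current (target) softmax policy parameters
behavior = [0.5, 0.5]    # stale sampling distribution (off-policy)

def probs(lg):
    """Softmax over the logits."""
    z = [math.exp(v) for v in lg]
    s = sum(z)
    return [v / s for v in z]

lr = 0.1
for _ in range(2000):
    a = 0 if random.random() < behavior[0] else 1   # sample from behavior
    p = probs(logits)
    w = p[a] / behavior[a]                          # importance weight
    # Importance-weighted REINFORCE: w * reward * grad log pi(a)
    for i in range(2):
        grad_logp = (1.0 if i == a else 0.0) - p[i]
        logits[i] += lr * w * rewards[a] * grad_logp

print(probs(logits)[0])  # probability mass shifts toward the rewarded arm
```

The weight `w` is what keeps the expected gradient equal to the on-policy one even though actions come from `behavior`; the tweet's claim, as I read it, is a guarantee of this alignment kind under their sampling tweak.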
Gabriel Synnaeve (@syhw)'s Twitter Profile Photo

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution arxiv.org/abs/2502.18449 by Yuxiang Wei, Sida Wang, and the whole team! Get started with your favorite model here: github.com/facebookresear…

Pierre Chambon (@pierrechambon6)'s Twitter Profile Photo

Does your LLM truly comprehend the complexity of the code it generates? 🥰
 
Introducing our new non-saturated (for at least the coming week? 😉) benchmark:
 
✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity?
 
Check out the details below! 👇
Pierre Chambon (@pierrechambon6)'s Twitter Profile Photo

📸 Quick snapshot of our results! 🏅
 
BigO(Bench) evaluates high-level reasoning skills in coding, revealing that top-scoring models on Code Contests often struggle when required to both write and reason about their code.
 
Extra-Space Complexity seems particularly challenging!
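For a concrete sense of what checking "controlled time complexity" involves (my own toy illustration, not the BigO(Bench) harness): one can estimate a function's empirical growth exponent by counting its operations at increasing input sizes and fitting the slope on a log-log scale.

```python
import math

def fit_growth_exponent(sizes, costs):
    """Least-squares slope of log(cost) vs log(size): cost ~ size**slope."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def bubble_sort_ops(n):
    """Count comparisons made by bubble sort on a reversed list of length n."""
    data = list(range(n, 0, -1))
    ops = 0
    for i in range(len(data)):
        for j in range(len(data) - 1 - i):
            ops += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return ops

sizes = [64, 128, 256, 512]
costs = [bubble_sort_ops(n) for n in sizes]
print(round(fit_growth_exponent(sizes, costs), 2))  # close to 2.0: quadratic
```

Counting operations instead of wall-clock time keeps the estimate deterministic; a real harness would also have to control for constant factors and memory (the "Extra-Space Complexity" the tweet calls out as hard).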
Ori Yoran (@oriyoran)'s Twitter Profile Photo

New #ICLR2024 paper! The KoLMogorov Test: can CodeLMs compress data by code generation? The optimal compression for a sequence is the shortest program that generates it. Empirically, LMs struggle even on simple sequences, but can be trained to outperform current methods! 🧵1/7
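The premise can be made concrete: a sequence is well compressed when a program much shorter than the sequence regenerates it exactly. A toy illustration of that scoring idea (mine, not the paper's evaluation code), using the generating program's source length as the code length:

```python
# Toy Kolmogorov-style compression: the "code" for a sequence is a short
# program that regenerates it; the compression ratio compares program
# length to the length of the raw serialized data.
sequence = [i * i for i in range(1000)]  # 0, 1, 4, 9, ...

program = "[i * i for i in range(1000)]"
regenerated = eval(program)  # the program reproduces the data exactly
assert regenerated == sequence

raw_length = len(repr(sequence))   # characters to store the data verbatim
code_length = len(program)         # characters to store the program instead
print(code_length, raw_length, round(code_length / raw_length, 4))
```

The paper's test, as the tweet describes it, asks whether a code LM can find such short generating programs on its own; here the ratio is tiny because the sequence is trivially structured.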

Ahmad Al-Dahle (@ahmad_al_dahle)'s Twitter Profile Photo

Introducing our first set of Llama 4 models!

We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
Kunhao Zheng @ ICLR 2025 (@kunhaoz)'s Twitter Profile Photo

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨

That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing.

You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.

🧵 How?
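For context on what pass@k measures: the standard unbiased estimator (popularized by the HumanEval/Codex evaluation setup) computes, given n samples of which c pass, the probability that at least one of k drawn samples passes. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from n samples with c correct.

    P(at least one of k drawn samples is correct)
      = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 3 correct, pass@1 is just the success rate:
print(round(pass_at_k(10, 3, 1), 4))   # 0.3
# Drawing all 10 samples guarantees a correct one:
print(pass_at_k(10, 3, 10))            # 1.0
```

The tweet's point follows from the objective: standard RL on scalar reward maximizes expected single-sample success, i.e. pass@1; improving pass@k for k > 1 requires putting pass@k itself (a function of k samples) into the training objective.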
Michael Hassid (@michaelhassid)'s Twitter Profile Photo

The longer reasoning LLM thinks - the more likely to be correct, right?

Apparently not.

Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”.

Link: arxiv.org/abs/2505.17813

1/n
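As I read the title, the headline recipe amounts to sampling several reasoning chains and preferring shorter ones at selection time. A toy sketch of that selection rule (the candidate chains here are hypothetical stand-ins for model samples, not the paper's actual method or data):

```python
def pick_shortest(chains):
    """Among candidate (reasoning_chain, answer) pairs, return the answer
    attached to the shortest chain -- the "don't overthink" selection rule."""
    chain, answer = min(chains, key=lambda ca: len(ca[0]))
    return answer

# Hypothetical sampled chains for "2 + 2 * 3"; lengths and answers vary.
candidates = [
    ("2+2=4, 4*3=12", "12"),   # wrong order of operations
    ("2*3=6, 6+2=8", "8"),     # shortest chain, correct answer
    ("first multiply 2 by 3 giving 6, then add 2 to obtain 8", "8"),
]
print(pick_shortest(candidates))
```

The paper's claim, per the tweet, is that this kind of length preference helps rather than hurts accuracy, contrary to the intuition that longer thinking is better.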
Jakob Foerster (@j_foerst)'s Twitter Profile Photo

Hello World: My team at FAIR / AI at Meta (AI Research Agent) is looking to hire contractors across software engineering and ML. If you are interested and based in the UK, please fill in the following short EoI form: docs.google.com/forms/d/e/1FAI…

Gabriel Synnaeve (@syhw)'s Twitter Profile Photo

The code *is* the chain-of-thought for programming. So successive iterations over the code *are* the reasoning tokens for programming.
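One concrete way to read that claim: a coding agent's reasoning is the revise-and-run loop over candidate programs. A toy loop below makes the analogy literal; the successive drafts (a hypothetical stand-in for model generations) play the role of reasoning steps, and a test harness decides when to stop.

```python
def passes_tests(src):
    """Toy spec: the code must define add(a, b) returning a + b."""
    env = {}
    try:
        exec(src, env)
        return env["add"](2, 3) == 5 and env["add"](-1, 1) == 0
    except Exception:
        return False

# Successive drafts of the same function: iterations over the code
# as "reasoning tokens".
drafts = [
    "def add(a, b): return a - b",   # first attempt: wrong operator
    "def add(a): return a + a",      # second attempt: wrong arity
    "def add(a, b): return a + b",   # third attempt: correct
]

for step, code in enumerate(drafts, 1):
    if passes_tests(code):
        print(f"accepted draft {step}")
        break
```

Each rejected draft carries information forward exactly the way a discarded line of chain-of-thought does; the accepted draft is the conclusion.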