Gabriel Synnaeve (@syhw)'s Twitter Profile
Gabriel Synnaeve

@syhw

Nerd & Dad
syhw.bsky.social

ID: 79440047

Link: https://syhw.github.io/ · Joined: 03-10-2009 11:33:30

8.8K Tweets

15.15K Followers

1.1K Following

Thomas Wolf (@thom_wolf)'s Twitter Profile Photo

Finally took time to go over Dario's essay on DeepSeek and export controls, and to be honest it was quite painful to read. And I say this as a great admirer of Anthropic and a big user of Claude. The first half of the essay reads like a lengthy attempt to justify that closed-source…

Kunhao Zheng @ ICLR 2025 (@kunhaoz)'s Twitter Profile Photo

We release a paper with a beautiful theoretical guarantee that the loss gradient in the RLHF framework is aligned with the underlying true return!

It comes with a simple tweak to the sampling scheme in practice, and can be plugged into any online setting. It's "Online Off-Policy"!
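I haven't reproduced the paper's construction here, but the standard mechanism behind keeping off-policy gradients aligned with the on-policy objective is importance weighting: reweight samples drawn from a stale behavior policy by the probability ratio under the current policy. A toy two-arm bandit sketch under that textbook formulation (not the paper's algorithm):

```python
import math
import random

random.seed(0)

rewards = [1.0, 0.0]     # arm 0 is the rewarded arm
logits = [0.0, 0.0]      # current (target) softmax policy parameters
behavior = [0.5, 0.5]    # stale sampling distribution (off-policy)

def probs(lg):
    """Softmax over the logits."""
    z = [math.exp(v) for v in lg]
    s = sum(z)
    return [v / s for v in z]

lr = 0.1
for _ in range(2000):
    a = 0 if random.random() < behavior[0] else 1   # sample from behavior
    p = probs(logits)
    w = p[a] / behavior[a]                          # importance weight
    # Importance-weighted REINFORCE: w * reward * grad log pi(a)
    for i in range(2):
        grad_logp = (1.0 if i == a else 0.0) - p[i]
        logits[i] += lr * w * rewards[a] * grad_logp

print(probs(logits)[0])  # probability mass shifts toward the rewarded arm
```

The weight `w` is what keeps the expected gradient equal to the on-policy one even though actions come from `behavior`; the tweet's claim, as I read it, is a guarantee of this alignment kind under their sampling tweak.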
Gabriel Synnaeve (@syhw)'s Twitter Profile Photo

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution arxiv.org/abs/2502.18449 by Yuxiang Wei, Sida Wang, and the whole team! Get started with your favorite model here: github.com/facebookresear…

Pierre Chambon (@pierrechambon6)'s Twitter Profile Photo

Does your LLM truly comprehend the complexity of the code it generates? 🥰
 
Introducing our new non-saturated (for at least the coming week? 😉) benchmark:
 
✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity?
 
Check out the details below! 👇
Pierre Chambon (@pierrechambon6)'s Twitter Profile Photo

📸 Quick snapshot of our results! 🏅
 
BigO(Bench) evaluates high-level reasoning skills in coding, revealing that top-scoring models on Code Contests often struggle when required to both write and reason about their code.
 
Extra-Space Complexity seems particularly challenging!
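For a concrete sense of what checking "controlled time complexity" involves (my own toy illustration, not the BigO(Bench) harness): one can estimate a function's empirical growth exponent by counting its operations at increasing input sizes and fitting the slope on a log-log scale.

```python
import math

def fit_growth_exponent(sizes, costs):
    """Least-squares slope of log(cost) vs log(size): cost ~ size**slope."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def bubble_sort_ops(n):
    """Count comparisons made by bubble sort on a reversed list of length n."""
    data = list(range(n, 0, -1))
    ops = 0
    for i in range(len(data)):
        for j in range(len(data) - 1 - i):
            ops += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return ops

sizes = [64, 128, 256, 512]
costs = [bubble_sort_ops(n) for n in sizes]
print(round(fit_growth_exponent(sizes, costs), 2))  # close to 2.0: quadratic
```

Counting operations instead of wall-clock time keeps the estimate deterministic; a real harness would also have to control for constant factors and memory (the "Extra-Space Complexity" the tweet calls out as hard).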
Ori Yoran (@oriyoran)'s Twitter Profile Photo

New #ICLR2024 paper! The KoLMogorov Test: can CodeLMs compress data by code generation? The optimal compression for a sequence is the shortest program that generates it. Empirically, LMs struggle even on simple sequences, but can be trained to outperform current methods! 🧵1/7
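The premise can be made concrete: a sequence is well compressed when a program much shorter than the sequence regenerates it exactly. A toy illustration of that scoring idea (mine, not the paper's evaluation code), using the generating program's source length as the code length:

```python
# Toy Kolmogorov-style compression: the "code" for a sequence is a short
# program that regenerates it; the compression ratio compares program
# length to the length of the raw serialized data.
sequence = [i * i for i in range(1000)]  # 0, 1, 4, 9, ...

program = "[i * i for i in range(1000)]"
regenerated = eval(program)  # the program reproduces the data exactly
assert regenerated == sequence

raw_length = len(repr(sequence))   # characters to store the data verbatim
code_length = len(program)         # characters to store the program instead
print(code_length, raw_length, round(code_length / raw_length, 4))
```

The paper's test, as the tweet describes it, asks whether a code LM can find such short generating programs on its own; here the ratio is tiny because the sequence is trivially structured.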

Ahmad Al-Dahle (@ahmad_al_dahle)'s Twitter Profile Photo

Introducing our first set of Llama 4 models!

We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
Kunhao Zheng @ ICLR 2025 (@kunhaoz)'s Twitter Profile Photo

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨

That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing.

You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.

🧵 How?
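For context on what pass@k measures: the standard unbiased estimator (popularized by the HumanEval/Codex evaluation setup) computes, given n samples of which c pass, the probability that at least one of k drawn samples passes. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from n samples with c correct.

    P(at least one of k drawn samples is correct)
      = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples and 3 correct, pass@1 is just the success rate:
print(round(pass_at_k(10, 3, 1), 4))   # 0.3
# Drawing all 10 samples guarantees a correct one:
print(pass_at_k(10, 3, 10))            # 1.0
```

The tweet's point follows from the objective: standard RL on scalar reward maximizes expected single-sample success, i.e. pass@1; improving pass@k for k > 1 requires putting pass@k itself (a function of k samples) into the training objective.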
Michael Hassid (@michaelhassid)'s Twitter Profile Photo

The longer reasoning LLM thinks - the more likely to be correct, right?

Apparently not.

Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”.

Link: arxiv.org/abs/2505.17813

1/n
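As I read the title, the headline recipe amounts to sampling several reasoning chains and preferring shorter ones at selection time. A toy sketch of that selection rule (the candidate chains here are hypothetical stand-ins for model samples, not the paper's actual method or data):

```python
def pick_shortest(chains):
    """Among candidate (reasoning_chain, answer) pairs, return the answer
    attached to the shortest chain -- the "don't overthink" selection rule."""
    chain, answer = min(chains, key=lambda ca: len(ca[0]))
    return answer

# Hypothetical sampled chains for "2 + 2 * 3"; lengths and answers vary.
candidates = [
    ("2+2=4, 4*3=12", "12"),   # wrong order of operations
    ("2*3=6, 6+2=8", "8"),     # shortest chain, correct answer
    ("first multiply 2 by 3 giving 6, then add 2 to obtain 8", "8"),
]
print(pick_shortest(candidates))
```

The paper's claim, per the tweet, is that this kind of length preference helps rather than hurts accuracy, contrary to the intuition that longer thinking is better.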
Jakob Foerster (@j_foerst)'s Twitter Profile Photo

Hello World: My team at FAIR / AI at Meta (AI Research Agent) is looking to hire contractors across software engineering and ML. If you are interested and based in the UK, please fill in the following short EoI form: docs.google.com/forms/d/e/1FAI…

Gabriel Synnaeve (@syhw)'s Twitter Profile Photo

The code *is* the chain-of-thought for programming. So successive iterations over the code *are* the reasoning tokens for programming.
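One concrete way to read that claim: a coding agent's reasoning is the revise-and-run loop over candidate programs. A toy loop below makes the analogy literal; the successive drafts (a hypothetical stand-in for model generations) play the role of reasoning steps, and a test harness decides when to stop.

```python
def passes_tests(src):
    """Toy spec: the code must define add(a, b) returning a + b."""
    env = {}
    try:
        exec(src, env)
        return env["add"](2, 3) == 5 and env["add"](-1, 1) == 0
    except Exception:
        return False

# Successive drafts of the same function: iterations over the code
# as "reasoning tokens".
drafts = [
    "def add(a, b): return a - b",   # first attempt: wrong operator
    "def add(a): return a + a",      # second attempt: wrong arity
    "def add(a, b): return a + b",   # third attempt: correct
]

for step, code in enumerate(drafts, 1):
    if passes_tests(code):
        print(f"accepted draft {step}")
        break
```

Each rejected draft carries information forward exactly the way a discarded line of chain-of-thought does; the accepted draft is the conclusion.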