Fabian Gloeckle (@fabiangloeckle) 's Twitter Profile
Fabian Gloeckle

@fabiangloeckle

PhD student at @AIatMeta and @EcoledesPonts with @syhw and @Amaury_Hayat, co-supervised by @wtgowers. Machine learning for mathematics and programming.

ID: 1694458386635534336

Joined: 23-08-2023 21:15:42

52 Tweets

519 Followers

220 Following

Tom Sander @NeurIPS (@rednastom) 's Twitter Profile Photo

You didn’t believe in Differentially Private training for foundation models? We achieved the same performance as a non-private MAE trained on the same dataset, but with rigorous DP. Code is released: github.com/facebookresear…. Presenting tomorrow at ICML, 11:30AM poster, #2313
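
To make the recipe concrete, here is a minimal sketch of DP-SGD (per-sample gradient clipping plus Gaussian noise), the standard mechanism behind differentially private training; the manual per-sample loop and the hyperparameters below are illustrative assumptions, not the released MAE configuration.

```python
import torch
from torch import nn

# Minimal DP-SGD step: clip each per-sample gradient, then add Gaussian noise.
# clip_norm and noise_multiplier are placeholders, not the paper's values.
def dp_sgd_step(model: nn.Module, loss_fn, xs, ys, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):                               # per-sample gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (norm.item() + 1e-6))  # clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p.add_(-(lr / len(xs)) * (s + noise))           # noisy averaged update
```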

Kaiyu Yang (@kaiyuyang4) 's Twitter Profile Photo

We're looking for a postdoc at Meta FAIR to work on AI4Math, e.g., neural theorem proving, autoformalization, learning mathematical rules and abstractions, and automated discovery/conjecturing in math. Please apply at metacareers.com/jobs/145969190… and email me. Please help share this

Jason Rute @ JMM 2025 (@jasonrute) 's Twitter Profile Photo

Prof. Anima Anandkumar I really worry that the, shall we say, “questionable” claims in this paper (162 theorems supposedly unproven by humans) will get taken seriously, and the rest of us working in this field will look really bad for it. There are much better works already in this field.

TimDarcet (@timdarcet) 's Twitter Profile Photo

🚨 RELEASE ALERT ‼️ github.com/facebookresear… THIS CHANGES EVERYTHING $META just dropped a game-changing codebase! Now everyone can do LLM research! 😱 🧵10 best things people are already building with lingua 🔥👇

Gabriel Synnaeve (@syhw) 's Twitter Profile Photo

Want to do research in code generation with LLMs and wonky deep learning from the 90s? We're recruiting one Master's (M2) student intern for 2025 at FAIR Paris on my team metacareers.com/jobs/106871446…

Mathurin Videau (@mathuvu_) 's Twitter Profile Photo

Meta Lingua: a minimal, fast LLM codebase for training and inference. By researchers, for researchers. Easily hackable, still reproducible. Built-in efficiency, profiling (CPU, GPU and memory) and interpretability (automatic activation and gradient statistics). Joint work w/ Badr Youbi Idrissi
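
Lingua's own code isn't shown in the tweet, but the "automatic activation and gradient statistics" idea can be sketched with plain PyTorch hooks; the helper below and its stat names are assumptions for illustration, not Lingua's API.

```python
import torch
from torch import nn

# Sketch: collect activation mean/std and gradient norms via hooks,
# in the spirit of "automatic activation and gradient statistics".
def attach_stat_hooks(model: nn.Module, stats: dict):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            def fwd_hook(mod, inp, out, name=name):
                stats[f"{name}/act_mean"] = out.detach().mean().item()
                stats[f"{name}/act_std"] = out.detach().std().item()
            module.register_forward_hook(fwd_hook)
    for name, param in model.named_parameters():
        def grad_hook(grad, name=name):
            stats[f"{name}/grad_norm"] = grad.detach().norm().item()
        param.register_hook(grad_hook)

stats = {}
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
attach_stat_hooks(model, stats)
model(torch.randn(8, 16)).sum().backward()   # stats now holds the statistics
```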

Taco Cohen (@tacocohen) 's Twitter Profile Photo

🚨 New intern position in the FAIR CodeGen Team! 🚨 I'm particularly interested in working with candidates with expertise in off-policy RL methods, and/or in code generation with LLMs, but this is not a hard requirement and the project topic is somewhat flexible.

Ekin Akyürek (@akyurekekin) 's Twitter Profile Photo

Why do we treat train and test times so differently? Why is one “training” and the other “in-context learning”? Just take a few gradient steps at test time — a simple way to increase test-time compute — and get SoTA on the ARC public validation set: 61%, the average human score! ARC Prize

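A minimal sketch of the test-time training idea (a few gradient steps on a task's demonstration pairs before answering the query); the model-agnostic loop, step count and learning rate are assumptions, not the authors' exact ARC setup.

```python
import copy
import torch
from torch import nn

# Sketch of test-time training: fine-tune a copy of the model on the task's
# demonstration pairs, then predict on the query with the adapted copy.
def test_time_train(model: nn.Module, demos, query, steps=8, lr=1e-3):
    adapted = copy.deepcopy(model)               # keep the base model untouched
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    adapted.train()
    for _ in range(steps):
        for x, y in demos:                       # the task's in-context examples
            opt.zero_grad()
            loss_fn(adapted(x), y).backward()
            opt.step()
    adapted.eval()
    with torch.no_grad():
        return adapted(query)                    # prediction from the adapted model
```
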
Lean (@leanprover) 's Twitter Profile Photo

We have just launched the new Lean reference manual, our core documentation intended as a comprehensive, precise description of Lean! #leanlang #leanprover Check out the manual: lean-lang.org/doc/reference/… Read more about the release: lean-lang.org/blog/2024-12-1…

Kunhao Zheng @ ICLR 2025 (@kunhaoz) 's Twitter Profile Photo

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨 That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?

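For reference, this is the standard unbiased pass@k estimator the thread is about, computed from n samples of which c are correct; the thread's point is to optimize this quantity at training time rather than only report it at evaluation. The numbers below are just an illustration.

```python
from math import comb

# Unbiased pass@k estimator from n sampled solutions with c correct.
def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples, 3 of them correct.
print(pass_at_k(16, 3, 1))   # 0.1875  (what pass@1-style RL optimizes)
print(pass_at_k(16, 3, 8))   # 0.90    (what you may actually care about)
```
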
Mathurin Videau (@mathuvu_) 's Twitter Profile Photo

We present an Autoregressive U-Net that incorporates tokenization inside the model, pooling raw bytes into words then word-groups. AU-Net focuses most of its compute on building latent vectors that correspond to larger units of meaning. Joint work with Badr Youbi Idrissi 1/8

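As a rough illustration of the byte-to-word pooling stage only (not the AU-Net architecture itself), one can mean-pool byte embeddings between whitespace boundaries; the module below and its dimensions are assumptions for illustration.

```python
import torch
from torch import nn

# Sketch: pool raw-byte embeddings into word-level vectors by mean-pooling
# between whitespace boundaries. Illustrates byte -> word pooling only.
class BytePooler(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Embedding(256, dim)      # one embedding per byte value

    def forward(self, text: str) -> torch.Tensor:
        data = text.encode("utf-8")
        embs = self.embed(torch.tensor(list(data)))
        words, start = [], 0
        for i, b in enumerate(data):
            if b == ord(" "):                    # split at spaces
                if i > start:
                    words.append(embs[start:i].mean(dim=0))
                start = i + 1
        if start < len(data):
            words.append(embs[start:].mean(dim=0))
        return torch.stack(words)                # (num_words, dim)

pooled = BytePooler()("pooling raw bytes into words")
print(pooled.shape)                              # torch.Size([5, 64])
```
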
Belen Alastruey (@b_alastruey) 's Twitter Profile Photo

🚀New paper alert! 🚀 In our work at AI at Meta, we dive into the struggles of mixing languages in highly multilingual Transformer encoders and use this analysis as a tool to better design multilingual models for optimal performance. 📄: arxiv.org/abs/2508.02256 🧵(1/n)
