Alberto Bietti (@albertobietti) 's Twitter Profile
Alberto Bietti

@albertobietti

Machine learning research. Research scientist @FlatironCCM, previously @MetaAI, @NYUDataScience, @Inria, @Quora.

ID: 11056912

Link: http://alberto.bietti.me · Joined: 11-12-2007 18:03:01

1.1K Tweets

1.1K Followers

1.1K Following

Sainbayar Sukhbaatar (@tesatory) 's Twitter Profile Photo

Ten years ago in 2015 we published a paper called End-to-End Memory Networks (arxiv.org/abs/1503.08895). Looking back, this paper had many of the ingredients of current LLMs. Our model was the first language model that completely replaced RNN with attention. It had dot-product

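For readers who want the mechanism spelled out, here is a minimal numpy sketch of dot-product attention over a memory, in the spirit the tweet describes; names and sizes are illustrative, and the 1/sqrt(d) scaling popularized later by Transformers is omitted.

```python
import numpy as np

def dot_product_attention(query, keys, values):
    """Attend over memory slots: softmax of dot-product similarities,
    then a weighted average of the stored values."""
    scores = keys @ query                    # (n_mem,) similarity per slot
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                  # convex combination of values

# toy memory: 4 slots, embedding dimension 8
rng = np.random.default_rng(0)
query = rng.normal(size=8)
keys = rng.normal(size=(4, 8))
values = rng.normal(size=(4, 8))
print(dot_product_attention(query, keys, values).shape)  # (8,)
```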
Lénaïc Chizat (@lenaicchizat) 's Twitter Profile Photo

Announcing: The 2nd International Summer School on Mathematical Aspects of Data Science, EPFL, Sept 1–5, 2025.
Speakers: Francis Bach, Bandeira, Mallat, Andrea Montanari, Gabriel Peyré.
For PhD students & early-career researchers. Application deadline: May 15.

IMS (@instmathstat) 's Twitter Profile Photo

Exciting news in the global statistics community! Grace Wahba was awarded the prestigious 2025 International Prize in Statistics for her groundbreaking work on smoothing splines, which revolutionized data analysis and machine learning. https://www.statprize.org/index.cfm

Randall Balestriero (@randall_balestr) 's Twitter Profile Photo

- a century-old dream of intelligent machines (Turing et al.)
- a decades-old paradox exposing the impossibility of that dream (Moravec's paradox)
- and now a Self-Supervised Learning community reaching for the Sun
Join our SSL experts to learn what tomorrow will look like!
Tanya Marwah (@__tm__157) 's Twitter Profile Photo

What is the role of memory in modeling time-dependent PDEs? I will be at ICLR presenting our paper, where we study when memory is beneficial for modeling time-dependent PDEs!
🔗 openreview.net/forum?id=o9kqa…
[Oral]: Thu 24 Apr, 10:30 am @ Session 1E
[Poster]: Thu 24 Apr, 3 pm, #617

Eshaan Nichani (@eshaannichani) 's Twitter Profile Photo

How do transformers optimally "store" factual information within their weights? How are these facts learned during GD? We study this question by interpreting transformer weights as associative memories.
Drop by our #ICLR2025 Spotlight Poster (Thurs. @ 3pm, #602) to learn more!
🧵
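For intuition on the "weights as associative memories" view, here is a classical outer-product associative memory in numpy; it illustrates the general idea of storing (key, value) pairs in a single weight matrix, and is not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_facts = 64, 20

# random (key, value) embedding pairs standing in for (subject, attribute) facts
K = rng.normal(size=(n_facts, d)) / np.sqrt(d)
V = rng.normal(size=(n_facts, d)) / np.sqrt(d)

# store all facts as a sum of outer products (one classical scheme)
W = sum(np.outer(v, k) for k, v in zip(K, V))

# retrieval: applying W to a stored key approximately returns its value;
# cross-terms act as noise since random keys are near-orthogonal in high dim
recovered = W @ K[0]
cos = recovered @ V[0] / (np.linalg.norm(recovered) * np.linalg.norm(V[0]))
print(f"cosine similarity with stored value: {cos:.2f}")
```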
Alberto Bietti (@albertobietti) 's Twitter Profile Photo

Come hear Matt Smart's talk on in-context denoising with transformers at the Associative Memory workshop at #ICLR25, 2:15pm! This task refines the connection between transformers and associative memories.
Joint work with Matt Smart and Anirvan Sengupta at the Flatiron Institute.
Paper: arxiv.org/abs/2502.05164

Charles Margossian (@charlesm993) 's Twitter Profile Photo

✨ Thank you AISTATS Conference for the Best Paper Award!!
📜 arxiv.org/abs/2410.11067
💡 What does VI learn, and under what conditions? The answer lies in symmetry.
🤝 Honored to share this award with my colleague Lawrence Saul from the Flatiron Institute.

Eshaan Nichani (@eshaannichani) 's Twitter Profile Photo

Excited to announce a new paper with Yunwei Ren, Denny Wu, Jason Lee!

We prove a neural scaling law for the SGD learning of extensive-width two-layer neural networks.

arxiv.org/abs/2504.19983

🧵below (1/10)
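As a toy illustration of the setting (not the paper's precise scalings or its result), here is online SGD on a two-layer ReLU student fitting a two-layer teacher; the width, learning rate, and teacher are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, lr = 32, 256, 0.01

# teacher and student two-layer ReLU networks (toy stand-ins)
a_t = rng.normal(size=width) / np.sqrt(width)
W_t = rng.normal(size=(width, d)) / np.sqrt(d)
teacher = lambda x: a_t @ np.maximum(W_t @ x, 0.0)

a = rng.normal(size=width) / np.sqrt(width)
W = rng.normal(size=(width, d)) / np.sqrt(d)

for step in range(1, 20_001):
    x = rng.normal(size=d)                   # fresh sample each step (online SGD)
    h = np.maximum(W @ x, 0.0)               # hidden activations
    err = a @ h - teacher(x)                 # residual of the squared loss
    grad_a = err * h
    grad_W = np.outer(err * a * (h > 0), x)  # backprop through the ReLU
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 5000 == 0:
        X = rng.normal(size=(500, d))
        preds = np.maximum(X @ W.T, 0.0) @ a
        targets = np.maximum(X @ W_t.T, 0.0) @ a_t
        print(f"step {step}: test MSE {np.mean((preds - targets) ** 2):.5f}")
```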
Randall Balestriero (@randall_balestr) 's Twitter Profile Photo

Recordings are available at: simonsfoundation.org/event/self-sup… Check them out to learn about the latest SSL research and future research directions, and to witness incredible optimism and excitement about AI research! Quoting many speakers: "We barely started scratching the surface" 🚀

Zixuan Wang (@zzzixuanwang) 's Twitter Profile Photo

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training?

In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient!

arxiv.org/abs/2505.23683

🧵 below (1/10)
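A tiny sketch of what "easy-to-hard" data can mean, assuming difficulty is the number of composed reasoning steps; the task and ordering below are invented for illustration and are not the paper's construction.

```python
import random

random.seed(0)

def make_example(n_steps):
    """Toy compositional task: apply the successor map (mod 10) n_steps
    times; difficulty is the number of composed steps."""
    x = random.randrange(10)
    return {"prompt": (x, n_steps), "answer": (x + n_steps) % 10, "steps": n_steps}

# easy-to-hard curriculum: include every difficulty level, but present
# shallow compositions before deeper ones
dataset = [make_example(k) for k in range(1, 6) for _ in range(100)]
curriculum = sorted(dataset, key=lambda ex: ex["steps"])
print(curriculum[0]["steps"], curriculum[-1]["steps"])  # 1 ... 5
```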
Robert M. Gower 🇺🇦 (@gowerrobert) 's Twitter Profile Photo

Are you interested in the new Muon/Scion/Gluon method for training LLMs? 
To run Muon, you need to approximate the matrix sign (or polar factor) of the momentum matrix. We've developed an optimal method *The PolarExpress* just for this! If you're interested, climb aboard 1/x
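For context, the classical Newton-Schulz iteration below approximates the polar factor (the rectangular analogue of the matrix sign) in numpy; it is the textbook baseline for this problem, not PolarExpress itself, whose optimized coefficients are derived in the paper.

```python
import numpy as np

def newton_schulz_polar(G, iters=10):
    """Approximate the polar factor of G with the odd cubic iteration
    X <- 1.5 X - 0.5 X X^T X, which pushes all singular values toward 1."""
    X = G / np.linalg.norm(G, 2)        # spectral norm <= 1 ensures convergence
    for _ in range(iters):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
G = rng.normal(size=(64, 32))           # stand-in for a momentum matrix
U = newton_schulz_polar(G)
print(np.allclose(U.T @ U, np.eye(32), atol=1e-3))  # columns ≈ orthonormal
```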
Jason Lee (@jasondeanlee) 's Twitter Profile Photo

New work arxiv.org/abs/2506.05500 on learning multi-index models with Alex Damian and Joan Bruna. Multi-index models are of the form y = g(Ux), where U is an r × d matrix mapping from d dimensions to r dimensions with d >> r, and g is an arbitrary function. Examples of multi-index models include any neural net
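A quick numpy sketch of sampling from a multi-index model as defined in the tweet; the particular link function g is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 100, 3, 5                        # ambient dimension d >> index dimension r

U = rng.normal(size=(r, d)) / np.sqrt(d)   # r x d projection matrix
g = lambda z: np.tanh(z).sum(axis=-1)      # arbitrary link function on R^r

X = rng.normal(size=(n, d))                # Gaussian inputs
y = g(X @ U.T)                             # y depends on x only through Ux
print(y.shape)                             # (5,)
```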

Tanya Marwah (@__tm__157) 's Twitter Profile Photo

This is the first step in a direction that I am very excited about! Using LLMs to solve scientific computing problems and potentially discover faster (or new) algorithms. #AI4Science #ML4PDEs We show that LLMs can write PDE solver code, choose appropriate algorithms, and produce

Konstantin Mishchenko (@konstmish) 's Twitter Profile Photo

There are several hypotheses for why Adam outperforms SGD on LLMs: heavy-tailed noise, blowing up curvature, near-constant magnitude of update, etc. The one I find most compelling is label imbalance: Adam specifically improves performance on rare classes, of which there are many.

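To make the "near-constant magnitude of update" point concrete, here is a minimal numpy Adam step; the frequent-vs-rare gradient comparison below is an illustrative toy, not an experiment from any of the cited hypotheses.

```python
import numpy as np

def adam_step(p, m, v, g, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: each coordinate's step is its gradient divided by a
    running estimate of that gradient's own magnitude, so rarely-active
    coordinates take steps comparable to frequently-active ones."""
    m = b1 * m + (1 - b1) * g        # first-moment EMA
    v = b2 * v + (1 - b2) * g * g    # second-moment EMA
    m_hat = m / (1 - b1 ** t)        # bias corrections
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# coordinate 0 gets a gradient every step; coordinate 1 only every 10th step
p, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 1001):
    g = np.array([1.0, 1.0 if t % 10 == 0 else 0.0])
    p, m, v = adam_step(p, m, v, g, t)
print(p)  # the rare coordinate still moves substantially, unlike under plain SGD
```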