Alberto Bietti (@albertobietti) 's Twitter Profile
Alberto Bietti

@albertobietti

Machine learning research. Research scientist @FlatironCCM, previously @MetaAI, @NYUDataScience, @Inria, @Quora.

ID: 11056912

Link: http://alberto.bietti.me · Joined: 11-12-2007 18:03:01

1.1K Tweets

1.1K Followers

1.1K Following

Sainbayar Sukhbaatar (@tesatory) 's Twitter Profile Photo

Ten years ago in 2015 we published a paper called End-to-End Memory Networks (arxiv.org/abs/1503.08895). Looking back, this paper had many of the ingredients of current LLMs. Our model was the first language model that completely replaced RNN with attention. It had dot-product

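For readers who want the mechanism spelled out, here is a minimal numpy sketch of dot-product attention over a memory, in the spirit the tweet describes; names and sizes are illustrative, and the 1/sqrt(d) scaling popularized later by Transformers is omitted.

```python
import numpy as np

def dot_product_attention(query, keys, values):
    """Attend over memory slots: softmax of dot-product similarities,
    then a weighted average of the stored values."""
    scores = keys @ query                    # (n_mem,) similarity per slot
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                  # convex combination of values

# toy memory: 4 slots, embedding dimension 8
rng = np.random.default_rng(0)
query = rng.normal(size=8)
keys = rng.normal(size=(4, 8))
values = rng.normal(size=(4, 8))
print(dot_product_attention(query, keys, values).shape)  # (8,)
```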
Lénaïc Chizat (@lenaicchizat) 's Twitter Profile Photo

Announcing: The 2nd International Summer School on Mathematical Aspects of Data Science, EPFL, Sept 1–5, 2025.
Speakers: Francis Bach, Bandeira, Mallat, Andrea Montanari, Gabriel Peyré.
For PhD students & early-career researchers. Application deadline: May 15.

IMS (@instmathstat) 's Twitter Profile Photo

Exciting news in the global statistics community! Grace Wahba was awarded the prestigious 2025 International Prize in Statistics for her groundbreaking work on smoothing splines, which revolutionized data analysis and machine learning. https://www.statprize.org/index.cfm

Randall Balestriero (@randall_balestr) 's Twitter Profile Photo

- a century-old dream of intelligent machines (Turing et al.)
- a decades-old paradox exposing the impossibility of that dream (Moravec's paradox)
- and now a Self-Supervised Learning community reaching for the Sun
Join our SSL experts to learn what tomorrow will look like!
Tanya Marwah (@__tm__157) 's Twitter Profile Photo

What is the role of memory in modeling time-dependent PDEs? I will be at ICLR presenting our paper, where we study when memory is beneficial for modeling time-dependent PDEs!
🔗 openreview.net/forum?id=o9kqa…
[Oral]: Thu 24 Apr, 10:30 am @ Session 1E
[Poster]: Thu 24 Apr, 3 pm, #617

Eshaan Nichani (@eshaannichani) 's Twitter Profile Photo

How do transformers optimally "store" factual information within their weights? How are these facts learned during GD? We study this question by interpreting transformer weights as associative memories.
Drop by our #ICLR2025 Spotlight Poster (Thurs. @ 3pm, #602) to learn more!
🧵
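For intuition on the "weights as associative memories" view, here is a classical outer-product associative memory in numpy; it illustrates the general idea of storing (key, value) pairs in a single weight matrix, and is not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_facts = 64, 20

# random (key, value) embedding pairs standing in for (subject, attribute) facts
K = rng.normal(size=(n_facts, d)) / np.sqrt(d)
V = rng.normal(size=(n_facts, d)) / np.sqrt(d)

# store all facts as a sum of outer products (one classical scheme)
W = sum(np.outer(v, k) for k, v in zip(K, V))

# retrieval: applying W to a stored key approximately returns its value;
# cross-terms act as noise since random keys are near-orthogonal in high dim
recovered = W @ K[0]
cos = recovered @ V[0] / (np.linalg.norm(recovered) * np.linalg.norm(V[0]))
print(f"cosine similarity with stored value: {cos:.2f}")
```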
Alberto Bietti (@albertobietti) 's Twitter Profile Photo

Come hear Matt Smart's talk on in-context denoising with transformers at the Associative Memory workshop at #ICLR25, 2:15pm! This task refines the connection between transformers and associative memories.
Joint work with Matt Smart and Anirvan Sengupta at the Flatiron Institute.
Paper: arxiv.org/abs/2502.05164

Charles Margossian (@charlesm993) 's Twitter Profile Photo

✨ Thank you AISTATS Conference for the Best Paper Award!!
📜 arxiv.org/abs/2410.11067
💡 What does VI learn, and under what conditions? The answer lies in symmetry.
🤝 Honored to share this award with my colleague Lawrence Saul from the Flatiron Institute.

Eshaan Nichani (@eshaannichani) 's Twitter Profile Photo

Excited to announce a new paper with Yunwei Ren, Denny Wu, Jason Lee!

We prove a neural scaling law for the SGD learning of extensive-width two-layer neural networks.

arxiv.org/abs/2504.19983

🧵below (1/10)
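As a toy illustration of the setting (not the paper's precise scalings or its result), here is online SGD on a two-layer ReLU student fitting a two-layer teacher; the width, learning rate, and teacher are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, lr = 32, 256, 0.01

# teacher and student two-layer ReLU networks (toy stand-ins)
a_t = rng.normal(size=width) / np.sqrt(width)
W_t = rng.normal(size=(width, d)) / np.sqrt(d)
teacher = lambda x: a_t @ np.maximum(W_t @ x, 0.0)

a = rng.normal(size=width) / np.sqrt(width)
W = rng.normal(size=(width, d)) / np.sqrt(d)

for step in range(1, 20_001):
    x = rng.normal(size=d)                   # fresh sample each step (online SGD)
    h = np.maximum(W @ x, 0.0)               # hidden activations
    err = a @ h - teacher(x)                 # residual of the squared loss
    grad_a = err * h
    grad_W = np.outer(err * a * (h > 0), x)  # backprop through the ReLU
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 5000 == 0:
        X = rng.normal(size=(500, d))
        preds = np.maximum(X @ W.T, 0.0) @ a
        targets = np.maximum(X @ W_t.T, 0.0) @ a_t
        print(f"step {step}: test MSE {np.mean((preds - targets) ** 2):.5f}")
```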
Randall Balestriero (@randall_balestr) 's Twitter Profile Photo

Recordings are available at: simonsfoundation.org/event/self-sup… Check them out to learn about the latest SSL research and future research directions, and to witness incredible optimism and excitement about AI research! Quoting many speakers: "We barely started scratching the surface" 🚀

Zixuan Wang (@zzzixuanwang) 's Twitter Profile Photo

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training?

In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient!

arxiv.org/abs/2505.23683

🧵 below (1/10)
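A tiny sketch of what "easy-to-hard" data can mean, assuming difficulty is the number of composed reasoning steps; the task and ordering below are invented for illustration and are not the paper's construction.

```python
import random

random.seed(0)

def make_example(n_steps):
    """Toy compositional task: apply the successor map (mod 10) n_steps
    times; difficulty is the number of composed steps."""
    x = random.randrange(10)
    return {"prompt": (x, n_steps), "answer": (x + n_steps) % 10, "steps": n_steps}

# easy-to-hard curriculum: include every difficulty level, but present
# shallow compositions before deeper ones
dataset = [make_example(k) for k in range(1, 6) for _ in range(100)]
curriculum = sorted(dataset, key=lambda ex: ex["steps"])
print(curriculum[0]["steps"], curriculum[-1]["steps"])  # 1 ... 5
```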
Robert M. Gower 🇺🇦 (@gowerrobert) 's Twitter Profile Photo

Are you interested in the new Muon/Scion/Gluon method for training LLMs? 
To run Muon, you need to approximate the matrix sign (or polar factor) of the momentum matrix. We've developed an optimal method *The PolarExpress* just for this! If you're interested, climb aboard 1/x
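For context, the classical Newton-Schulz iteration below approximates the polar factor (the rectangular analogue of the matrix sign) in numpy; it is the textbook baseline for this problem, not PolarExpress itself, whose optimized coefficients are derived in the paper.

```python
import numpy as np

def newton_schulz_polar(G, iters=10):
    """Approximate the polar factor of G with the odd cubic iteration
    X <- 1.5 X - 0.5 X X^T X, which pushes all singular values toward 1."""
    X = G / np.linalg.norm(G, 2)        # spectral norm <= 1 ensures convergence
    for _ in range(iters):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
G = rng.normal(size=(64, 32))           # stand-in for a momentum matrix
U = newton_schulz_polar(G)
print(np.allclose(U.T @ U, np.eye(32), atol=1e-3))  # columns ≈ orthonormal
```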
Jason Lee (@jasondeanlee) 's Twitter Profile Photo

New work arxiv.org/abs/2506.05500 on learning multi-index models with Alex Damian and Joan Bruna. Multi-index models are of the form y = g(Ux), where U is an r × d matrix mapping from d dimensions to r dimensions with d >> r, and g is an arbitrary function. Examples of multi-index models include any neural net
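A quick numpy sketch of sampling from a multi-index model as defined in the tweet; the particular link function g is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 100, 3, 5                        # ambient dimension d >> index dimension r

U = rng.normal(size=(r, d)) / np.sqrt(d)   # r x d projection matrix
g = lambda z: np.tanh(z).sum(axis=-1)      # arbitrary link function on R^r

X = rng.normal(size=(n, d))                # Gaussian inputs
y = g(X @ U.T)                             # y depends on x only through Ux
print(y.shape)                             # (5,)
```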

Tanya Marwah (@__tm__157) 's Twitter Profile Photo

This is the first step in a direction that I am very excited about! Using LLMs to solve scientific computing problems and potentially discover faster (or new) algorithms. #AI4Science #ML4PDEs We show that LLMs can write PDE solver code, choose appropriate algorithms, and produce

Konstantin Mishchenko (@konstmish) 's Twitter Profile Photo

There are several hypotheses for why Adam outperforms SGD on LLMs: heavy-tailed noise, blowing up curvature, near-constant magnitude of update, etc. The one I find most compelling is label imbalance: Adam specifically improves performance on rare classes, of which there are many.

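To make the "near-constant magnitude of update" point concrete, here is a minimal numpy Adam step; the frequent-vs-rare gradient comparison below is an illustrative toy, not an experiment from any of the cited hypotheses.

```python
import numpy as np

def adam_step(p, m, v, g, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: each coordinate's step is its gradient divided by a
    running estimate of that gradient's own magnitude, so rarely-active
    coordinates take steps comparable to frequently-active ones."""
    m = b1 * m + (1 - b1) * g        # first-moment EMA
    v = b2 * v + (1 - b2) * g * g    # second-moment EMA
    m_hat = m / (1 - b1 ** t)        # bias corrections
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# coordinate 0 gets a gradient every step; coordinate 1 only every 10th step
p, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 1001):
    g = np.array([1.0, 1.0 if t % 10 == 0 else 0.0])
    p, m, v = adam_step(p, m, v, g, t)
print(p)  # the rare coordinate still moves substantially, unlike under plain SGD
```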