Lorenzo Noci (@lorenzo_noci) 's Twitter Profile
Lorenzo Noci

@lorenzo_noci

PhD in Machine Learning at @ETH, working on deep learning theory and principled large-scale AI models.

ID: 2207106105

Link: https://lorenzonoci.github.io/ · Joined: 04-12-2013 08:20:18

67 Tweets

374 Followers

257 Following

Gregor Bachmann (@gregorbachmann1) 's Twitter Profile Photo

From stochastic parrot 🦜 to Clever Hans 🐴? In our work with Vaishnavh Nagarajan @ ICML we carefully analyse the debate surrounding next-token prediction and identify a new failure of LLMs due to teacher-forcing 👨🏻‍🎓! Check out our work arxiv.org/abs/2403.06963 and the linked thread!

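The specific failure mode the paper identifies is not reproduced here, but as a quick illustration of the two regimes being contrasted, the sketch below (a toy next-token predictor; the model and all names are assumptions, not from arxiv.org/abs/2403.06963) shows teacher forcing, where each training step conditions on the ground-truth prefix, versus free-running decoding, where the model conditions on its own previous outputs.

```python
# Minimal sketch (illustrative only): teacher forcing vs. free-running decoding
# for a toy next-token predictor.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 16, 8
E = rng.normal(size=(VOCAB, DIM))          # token embeddings
W = rng.normal(size=(DIM, VOCAB)) * 0.1    # readout to next-token logits

def logits(prefix):
    """Score the next token from the mean embedding of the prefix (toy model)."""
    h = E[prefix].mean(axis=0)
    return h @ W

def teacher_forced_loss(seq):
    """Cross-entropy where every step conditions on the *ground-truth* prefix."""
    loss = 0.0
    for t in range(1, len(seq)):
        z = logits(seq[:t])
        p = np.exp(z - z.max()); p /= p.sum()
        loss -= np.log(p[seq[t]])
    return loss / (len(seq) - 1)

def free_running_decode(start_token, steps):
    """At inference the model conditions on its *own* previous predictions."""
    out = [start_token]
    for _ in range(steps):
        out.append(int(np.argmax(logits(out))))
    return out

seq = list(rng.integers(0, VOCAB, size=10))
print("teacher-forced loss:", teacher_forced_loss(seq))
print("free-running sample:", free_running_decode(seq[0], 9))
```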
Alex Atanasov (@abatanasov) 's Twitter Profile Photo

[1/n] Thrilled that this project with @jzavatoneveth and @cpehlevan is finally out! Our group has spent a lot of time studying high dimensional regression and its connections to scaling laws. All our results follow easily from a single central theorem 🧵 arxiv.org/abs/2405.00592

Bobby (@bobby_he) 's Twitter Profile Photo

Outlier Features (OFs) aka “neurons with big features” emerge in standard transformer training & prevent benefits of quantisation🥲but why do OFs appear & which design choices minimise them? Our new work (+Lorenzo Noci Daniele Paliotta Imanol Schlag T. Hofmann) takes a look👀🧵

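For context, a common way to quantify outlier features is the kurtosis of per-neuron activation scales; the sketch below is a minimal illustration of that idea and may differ from the exact metric used in the paper (arxiv.org/abs/2405.19279).

```python
# Illustrative sketch: quantifying "outlier features" in a layer's activations
# via the kurtosis of per-neuron activation scales.
import numpy as np

def feature_kurtosis(acts):
    """acts: (num_tokens, hidden_dim) activations for one layer.
    Returns the kurtosis of per-neuron RMS values; a few dominant neurons
    (outlier features) push this far above the Gaussian baseline of ~3."""
    per_neuron_rms = np.sqrt((acts ** 2).mean(axis=0))   # scale of each neuron
    z = per_neuron_rms - per_neuron_rms.mean()
    return (z ** 4).mean() / (z ** 2).mean() ** 2

rng = np.random.default_rng(0)
normal_acts = rng.normal(size=(1024, 768))
outlier_acts = normal_acts.copy()
outlier_acts[:, :4] *= 50.0                              # inject 4 outlier neurons

print("no outliers:  ", feature_kurtosis(normal_acts))   # close to 3
print("with outliers:", feature_kurtosis(outlier_acts))  # much larger
```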
Aurelien Lucchi (@aurelienlucchi) 's Twitter Profile Photo

My group has multiple openings for both PhD and Post-doc positions to work on optimization for ML and deep learning theory. We are looking for people with a strong theoretical background (degree in math, theoretical physics, or CS with a strong theory emphasis).

Chris J. Maddison (@cjmaddison) 's Twitter Profile Photo

I'm also recruiting PhD/MSc students this coming cycle, with an eye towards applications in drug discovery. cs.toronto.edu/~cmaddis/ DM me or email me if you have any questions at all!

Bobby (@bobby_he) 's Twitter Profile Photo


Updated camera ready arxiv.org/abs/2405.19279. New results include:

- non-diagonal preconditioners (SOAP/Shampoo) minimise OFs compared to diagonal (Adam/AdaFactor)
- Scaling to 7B params
- showing that our methods for reducing OFs translate to easier PTQ int8 quantisation.

Check it out!
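For readers unfamiliar with the diagonal vs. non-diagonal distinction, the sketch below is a simplified, illustrative single-matrix comparison: a Shampoo-style update preconditions the gradient on both sides with inverse fourth roots of accumulated row/column statistics, while an Adam-like update only rescales each parameter by a diagonal second-moment estimate. Hyperparameters and the exact variants studied in the paper (SOAP, etc.) are not reproduced here.

```python
# Simplified sketch of a non-diagonal (Shampoo-style) preconditioned update for
# one weight matrix, versus a diagonal (Adam-like) one.
import numpy as np

def inv_fourth_root(M, eps=1e-6):
    """Compute M^{-1/4} for a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * (w + eps) ** -0.25) @ V.T

def shampoo_step(W, G, L, R, lr=1e-2):
    """Non-diagonal preconditioning: accumulate row/column second-moment
    statistics L, R and precondition the gradient on both sides."""
    L += G @ G.T
    R += G.T @ G
    W -= lr * inv_fourth_root(L) @ G @ inv_fourth_root(R)
    return W, L, R

def adam_like_step(W, G, v, lr=1e-2, beta2=0.999, eps=1e-8):
    """Diagonal preconditioning: per-parameter second-moment scaling only."""
    v = beta2 * v + (1 - beta2) * G ** 2
    W -= lr * G / (np.sqrt(v) + eps)
    return W, v

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32)) * 0.02
G = rng.normal(size=(64, 32))
L, R = np.zeros((64, 64)), np.zeros((32, 32))
W, L, R = shampoo_step(W, G, L, R)
```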
Lorenzo Noci (@lorenzo_noci) 's Twitter Profile Photo

Systematic empirical analysis of the role of feature learning in continual learning using scaling limits theory. Meet Jacopo in Vancouver :)

Bobby (@bobby_he) 's Twitter Profile Photo

Come by poster #2402 East hall at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and how we can prevent them!

Blake Bordelon ☕️🧪👨‍💻 (@blake__bordelon) 's Twitter Profile Photo


Come by at Neurips to hear Hamza present about interesting properties of various feature learning infinite parameter limits of transformer models!

Poster in Hall A-C #4804 at 11 AM PST Friday

Paper arxiv.org/abs/2405.15712 

Work with Hamza Tahir Chaudhry and Cengiz Pehlevan
Lénaïc Chizat (@lenaicchizat) 's Twitter Profile Photo

Announcing: The 2nd International Summer School on Mathematical Aspects of Data Science, EPFL, Sept 1–5, 2025. Speakers: Francis Bach, Bandeira, Mallat, Andrea Montanari, Gabriel Peyré. For PhD students & early-career researchers. Application deadline: May 15.

Alberto Bietti (@albertobietti) 's Twitter Profile Photo

Come hear about how transformers perform factual recall using associative memories, and how this emerges in phases during training! #ICLR2025 poster #602 at 3pm today. Led by Eshaan Nichani. Link: iclr.cc/virtual/2025/p… Paper: arxiv.org/abs/2412.06538
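As background, the sketch below is the classical linear associative memory (outer-product storage), which illustrates the kind of key-to-value recall the paper analyses inside transformer weights; it is not the paper's construction (arxiv.org/abs/2412.06538).

```python
# Toy sketch of a linear associative memory: store key->value pairs as a sum of
# outer products and recall a value by multiplying the weight matrix by a key.
import numpy as np

rng = np.random.default_rng(0)
d, n_facts = 256, 20

# Random (near-orthogonal in high dimension) keys and values, unit-scale.
keys = rng.normal(size=(n_facts, d)) / np.sqrt(d)
values = rng.normal(size=(n_facts, d)) / np.sqrt(d)

# Store all facts in a single weight matrix as a sum of outer products.
W = sum(np.outer(v, k) for k, v in zip(keys, values))

# Recall: W @ key ~ value, because cross-terms are small for ~orthogonal keys.
recalled = W @ keys[3]
cos = recalled @ values[3] / (np.linalg.norm(recalled) * np.linalg.norm(values[3]))
print("cosine similarity with stored value:", round(float(cos), 3))
```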

Aurelien Lucchi (@aurelienlucchi) 's Twitter Profile Photo

Our research group in the department of Mathematics and CS at the University of Basel (Switzerland) is looking for several PhD candidates and one post-doc who have a theoretical background in optimization and machine learning or practical experience in reasoning. RT please.

Lorenzo Noci (@lorenzo_noci) 's Twitter Profile Photo

Pass by if you want to know about scaling up your model under distribution shifts of the training data. Takeaway: muP needs to be tuned to the amount of feature learning that best trades off forgetting and plasticity.
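As a rough illustration (not the poster's exact setup): under muP with Adam, hidden-layer learning rates are commonly scaled like 1/width so that the amount of feature learning stays comparable as the model widens; the multiplier `gamma` below is hypothetical, the kind of knob one would tune for the forgetting/plasticity trade-off.

```python
# Minimal sketch of muP-style learning-rate transfer across widths (one common
# statement for Adam); `gamma` is a hypothetical feature-learning multiplier.

def hidden_lr(width, base_width=256, base_lr=1e-3, gamma=1.0):
    """Adam learning rate for hidden (width x width) weights, scaled ~1/width."""
    return gamma * base_lr * base_width / width

for width in (256, 1024, 4096):
    print(width, hidden_lr(width))
```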