Matteo Pagliardini (@matpagliardini)'s Twitter Profile
Matteo Pagliardini

@matpagliardini

PhD student in ML @EPFL_en, previously Apple MLR. @matpagliardini.bsky.social

ID: 1276586949835476992

Link: https://mpagli.github.io/ · Joined: 26-06-2020 18:44:11

110 Tweets

868 Followers

445 Following

Arnaud Pannatier (@arnaudpannatier)'s Twitter Profile Photo

GPTs generate sequences in left-to-right order. Is there another way? With François Fleuret and @evanncourdier, in partnership with SkySoft-ATM, we developed σ-GPT, capable of generating sequences in any order chosen dynamically at inference time. 1/6
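A minimal sketch of the any-order idea, assuming the double positional encoding described in the σ-GPT paper: each token is tagged both with its own position and with the position to be generated next, so any permutation can serve as the generation order. All names below are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class SigmaGPTSketch(nn.Module):
    def __init__(self, vocab_size, d_model, n_positions):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.cur_pos = nn.Embedding(n_positions, d_model)   # where this token sits
        self.next_pos = nn.Embedding(n_positions, d_model)  # which position is filled next
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, cur_positions, next_positions):
        # tokens and both position tensors: (batch, seq), in *generation* order
        h = (self.tok_emb(tokens)
             + self.cur_pos(cur_positions)
             + self.next_pos(next_positions))
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        h = self.backbone(h, mask=causal)
        return self.lm_head(h)  # logits for the token at each next_position
```

At inference, one would sample or choose a permutation of positions and decode them in that order, re-tagging the generated prefix at each step.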

Jason Ramapuram (@jramapuram)'s Twitter Profile Photo

Enjoy attention? Want to make it ~18% faster? Try out Sigmoid Attention. We replace the traditional softmax in attention with a sigmoid and a constant (not learned) scalar bias based on the sequence length.

Paper: arxiv.org/abs/2409.04431
Code: github.com/apple/ml-sigmo…

This was …
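For reference, a minimal sketch of the variant the tweet describes, assuming the constant bias is b = -log(n) as in the paper; this is plain PyTorch, not the optimized kernels from the repo:

```python
import math
import torch

def sigmoid_attention(q, k, v):
    # q, k, v: (batch, heads, n, d); bias b = -log(n) is constant, not learned
    n, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    weights = torch.sigmoid(scores - math.log(n))   # replaces row-wise softmax
    return weights @ v
```

Because the sigmoid is elementwise, each attention weight is computed independently, with no row-wise normalization across keys.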
Anand Gopalakrishnan (@agopal42)'s Twitter Profile Photo

Excited to present "Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery" at #NeurIPS2024!
TL;DR: Our model, SynCx, greatly simplifies the inductive biases and training procedures of current state-of-the-art synchrony models. Thread 👇 1/x.
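A hedged illustration of the general synchrony idea (not the SynCx code): with complex-valued weights and activations, magnitudes can carry feature content while relative phases can encode which object each feature belongs to.

```python
import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    # toy complex-weighted layer: complex matmul rotates phases and scales magnitudes
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(d_out, d_in, dtype=torch.cfloat))

    def forward(self, z):
        return z @ self.weight.T

z = torch.randn(8, 16, dtype=torch.cfloat)   # complex activations
out = ComplexLinear(16, 32)(z)
grouping = out.angle()   # relative phase: candidate object-assignment signal
features = out.abs()     # magnitude: feature strength
```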

EPFL (@epfl_en)'s Twitter Profile Photo

🚀 Together with ETH Zürich and the CSCS, we have just released Apertus, 🇨🇭 Switzerland's first large-scale, open, multilingual language model, a milestone in generative AI for transparency and diversity. Find out more: go.epfl.ch/a672aa
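A hedged sketch of trying the model with Hugging Face transformers; the checkpoint name below is an assumption based on the public release, so check the official announcement for the exact ids:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B-Instruct-2509"   # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("What is Apertus?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```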

Alex Hägele (@haeggee)'s Twitter Profile Photo

New work from our MLO lab at EPFL:
Benchmarking the variety of recently proposed LLM optimizers: Muon, AdEMAMix, ... all in the same setting, tuned, and with varying model size, batch size, and training duration! Huge sweep of experiments by Andrei Semenov, Matteo Pagliardini, and Martin Jaggi.
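As context for the benchmark, a hedged single-step sketch of one of the swept optimizers, AdEMAMix (arxiv.org/abs/2409.03137): it keeps a fast and a slow EMA of the gradient and mixes them in the update. Weight decay and the paper's alpha/beta3 warmup schedulers are omitted here:

```python
import torch

def ademamix_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8):
    # state holds: t (step count), m1, m2, v (tensors shaped like param)
    state["t"] += 1
    t = state["t"]
    state["m1"].mul_(beta1).add_(grad, alpha=1 - beta1)        # fast EMA (Adam-like)
    state["m2"].mul_(beta3).add_(grad, alpha=1 - beta3)        # slow EMA (long memory)
    state["v"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m1_hat = state["m1"] / (1 - beta1 ** t)                    # bias-corrected fast EMA
    v_hat = state["v"] / (1 - beta2 ** t)
    param.add_(-lr * (m1_hat + alpha * state["m2"]) / (v_hat.sqrt() + eps))

# usage
p = torch.zeros(4)
state = {"t": 0, "m1": torch.zeros_like(p),
         "m2": torch.zeros_like(p), "v": torch.zeros_like(p)}
ademamix_step(p, torch.ones(4), state)
```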

Andrei Semenov (@andreisemenov17)'s Twitter Profile Photo

Amazing "competing" work from Kaiyue Wen Tengyu Ma Percy Liang There are some good stories about optimizers to tell this week 😃 arxiv.org/abs/2509.01440 arxiv.org/abs/2509.02046

Amazing "competing" work from <a href="/wen_kaiyue/">Kaiyue Wen</a> <a href="/tengyuma/">Tengyu Ma</a> <a href="/percyliang/">Percy Liang</a>
There are some good stories about optimizers to tell this week 😃

arxiv.org/abs/2509.01440
arxiv.org/abs/2509.02046