Matteo Pagliardini (@matpagliardini)'s Twitter Profile
Matteo Pagliardini

@matpagliardini

PhD student in ML @EPFL_en, previously Apple MLR. @matpagliardini.bsky.social

ID: 1276586949835476992

Link: https://mpagli.github.io/ · Joined: 26-06-2020 18:44:11

110 Tweets

868 Followers

445 Following

Arnaud Pannatier (@arnaudpannatier)'s Twitter Profile Photo

GPTs generate sequences in left-to-right order. Is there another way? With François Fleuret and @evanncourdier, in partnership with SkySoft-ATM, we developed σ-GPT, capable of generating sequences in any order chosen dynamically at inference time. 1/6
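A minimal sketch of the any-order idea, assuming the double positional encoding described in the σ-GPT paper: each token is tagged both with its own position and with the position to be generated next, so any permutation can serve as the generation order. All names below are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class SigmaGPTSketch(nn.Module):
    def __init__(self, vocab_size, d_model, n_positions):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.cur_pos = nn.Embedding(n_positions, d_model)   # where this token sits
        self.next_pos = nn.Embedding(n_positions, d_model)  # which position is filled next
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, cur_positions, next_positions):
        # tokens and both position tensors: (batch, seq), in *generation* order
        h = (self.tok_emb(tokens)
             + self.cur_pos(cur_positions)
             + self.next_pos(next_positions))
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        h = self.backbone(h, mask=causal)
        return self.lm_head(h)  # logits for the token at each next_position
```

At inference, one would sample or choose a permutation of positions and decode them in that order, re-tagging the generated prefix at each step.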

Jason Ramapuram (@jramapuram)'s Twitter Profile Photo

Enjoy attention? Want to make it ~18% faster? Try out Sigmoid Attention. We replace the traditional softmax in attention with a sigmoid and a constant (not learned) scalar bias based on the sequence length.

Paper: arxiv.org/abs/2409.04431
Code: github.com/apple/ml-sigmo…

This was …
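For reference, a minimal sketch of the variant the tweet describes, assuming the constant bias is b = -log(n) as in the paper; this is plain PyTorch, not the optimized kernels from the repo:

```python
import math
import torch

def sigmoid_attention(q, k, v):
    # q, k, v: (batch, heads, n, d); bias b = -log(n) is constant, not learned
    n, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    weights = torch.sigmoid(scores - math.log(n))   # replaces row-wise softmax
    return weights @ v
```

Because the sigmoid is elementwise, each attention weight is computed independently, with no row-wise normalization across keys.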
Anand Gopalakrishnan (@agopal42)'s Twitter Profile Photo

Excited to present "Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery" at #NeurIPS2024!
TL;DR: Our model, SynCx, greatly simplifies the inductive biases and training procedures of current state-of-the-art synchrony models. Thread 👇 1/x.
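A hedged illustration of the general synchrony idea (not the SynCx code): with complex-valued weights and activations, magnitudes can carry feature content while relative phases can encode which object each feature belongs to.

```python
import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    # toy complex-weighted layer: complex matmul rotates phases and scales magnitudes
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(d_out, d_in, dtype=torch.cfloat))

    def forward(self, z):
        return z @ self.weight.T

z = torch.randn(8, 16, dtype=torch.cfloat)   # complex activations
out = ComplexLinear(16, 32)(z)
grouping = out.angle()   # relative phase: candidate object-assignment signal
features = out.abs()     # magnitude: feature strength
```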

EPFL (@epfl_en)'s Twitter Profile Photo

🚀 Together with ETH Zürich and the CSCS, we have just released Apertus, 🇨🇭 Switzerland's first large-scale, open, multilingual language model, a milestone in generative AI for transparency and diversity. Find out more: go.epfl.ch/a672aa
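A hedged sketch of trying the model with Hugging Face transformers; the checkpoint name below is an assumption based on the public release, so check the official announcement for the exact ids:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B-Instruct-2509"   # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("What is Apertus?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```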

Alex Hägele (@haeggee)'s Twitter Profile Photo

New work from our MLO lab at EPFL:
Benchmarking the variety of recently proposed LLM optimizers: Muon, AdEMAMix, ... all in the same setting, tuned, and with varying model size, batch size, and training duration! Huge sweep of experiments by Andrei Semenov, Matteo Pagliardini, and Martin Jaggi.
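As context for the benchmark, a hedged single-step sketch of one of the swept optimizers, AdEMAMix (arxiv.org/abs/2409.03137): it keeps a fast and a slow EMA of the gradient and mixes them in the update. Weight decay and the paper's alpha/beta3 warmup schedulers are omitted here:

```python
import torch

def ademamix_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8):
    # state holds: t (step count), m1, m2, v (tensors shaped like param)
    state["t"] += 1
    t = state["t"]
    state["m1"].mul_(beta1).add_(grad, alpha=1 - beta1)        # fast EMA (Adam-like)
    state["m2"].mul_(beta3).add_(grad, alpha=1 - beta3)        # slow EMA (long memory)
    state["v"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m1_hat = state["m1"] / (1 - beta1 ** t)                    # bias-corrected fast EMA
    v_hat = state["v"] / (1 - beta2 ** t)
    param.add_(-lr * (m1_hat + alpha * state["m2"]) / (v_hat.sqrt() + eps))

# usage
p = torch.zeros(4)
state = {"t": 0, "m1": torch.zeros_like(p),
         "m2": torch.zeros_like(p), "v": torch.zeros_like(p)}
ademamix_step(p, torch.ones(4), state)
```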

Andrei Semenov (@andreisemenov17)'s Twitter Profile Photo

Amazing "competing" work from Kaiyue Wen Tengyu Ma Percy Liang There are some good stories about optimizers to tell this week 😃 arxiv.org/abs/2509.01440 arxiv.org/abs/2509.02046

Amazing "competing" work from <a href="/wen_kaiyue/">Kaiyue Wen</a> <a href="/tengyuma/">Tengyu Ma</a> <a href="/percyliang/">Percy Liang</a>
There are some good stories about optimizers to tell this week 😃

arxiv.org/abs/2509.01440
arxiv.org/abs/2509.02046