Antonio Orvieto (@orvieto_antonio)'s Twitter Profile
Antonio Orvieto

@orvieto_antonio

Deep Learning PI @ELLISInst_Tue, Group Leader @MPI_IS.
I compute stuff with lots of gradients 🧮,
I like Kierkegaard & Lévi-Strauss 🧙‍♂️

ID: 1172891108076077057

Website: http://orvi.altervista.org/ · Joined: 14-09-2019 15:13:38

303 Tweets

1.1K Followers

1.1K Following

Damien Ferbach (@damien_ferbach)'s Twitter Profile Photo

It's very difficult to improve the *exponent* in scaling laws for loss vs compute, especially by changing the optimizer!
Our new paper shows that scaling momentum correctly can *provably* improve the scaling exponent on a theoretical model. Empirically, it works on LSTMs too!
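The tweet does not spell out the scaling rule. Below is a minimal sketch of the general idea, assuming the common choice of tying the momentum timescale to the training horizon T via β = 1 − c/T; this schedule and the constant c are illustrative assumptions, not necessarily the paper's rule.

```python
# Hypothetical sketch: scale the momentum coefficient with the training
# horizon T, so the averaging timescale 1/(1 - beta) grows with compute.
# beta = 1 - c/T is an assumed schedule, not the paper's exact rule.
import torch

def make_optimizer(params, total_steps: int, lr: float = 1e-3, c: float = 10.0):
    # Momentum timescale ~ T/c steps: longer runs average more gradients.
    beta = 1.0 - c / total_steps
    return torch.optim.SGD(params, lr=lr, momentum=beta)

model = torch.nn.Linear(32, 1)
opt = make_optimizer(model.parameters(), total_steps=10_000)

x, y = torch.randn(64, 32), torch.randn(64, 1)
for _ in range(100):  # shortened loop for illustration
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```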
Antonio Orvieto (@orvieto_antonio)'s Twitter Profile Photo

join us tonight to talk about Adam! maybe we will touch a bit on Muon & friends -- they carry many of the open questions we have about Adam ❤️ thanks Yannic

Jonas Geiping (@jonasgeiping)'s Twitter Profile Photo

Forecasting future events is a fascinating task for language models. Arguably the hardest application for a pure "oracle" that can't take actions; requiring reasoning about conflicting info, planning, information seeking... But, forecasting is also uniquely hard to evaluate:

Antonio Orvieto (@orvieto_antonio)'s Twitter Profile Photo

It is essential to thoroughly evaluate, test, and compare ideas, yet this unbiased process is rare in modern research. Niccolò did this for checkpoint averaging: with a large number of experiments, he demonstrates when and where to average weights in modern large-scale setups. Super
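For readers unfamiliar with the technique itself, here is a minimal sketch of checkpoint averaging: parameters from several saved checkpoints of one run are averaged uniformly. The file names and uniform weighting are illustrative assumptions; the cited study's contribution is the empirical question of when and where such averaging helps, which this sketch does not answer.

```python
# Minimal checkpoint averaging: average state dicts uniformly.
import torch

def average_checkpoints(paths):
    avg = None
    for p in paths:
        state = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# Usage (hypothetical file names):
# model.load_state_dict(average_checkpoints(["ckpt_8k.pt", "ckpt_9k.pt", "ckpt_10k.pt"]))
```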

Robert Lange (@roberttlange)'s Twitter Profile Photo

Text-to-LoRA: What if you no longer had to fine-tune your LLM for every single downstream task?

🚀 Stoked to share our work on instant LLM adaptation using meta-learned hypernetworks 📝 → 🔥

The idea is simple yet elegant: We text-condition a hypernetwork to output LoRA
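A minimal sketch of the idea as stated in the thread, in a toy setup where a hypernetwork maps a task-description embedding to LoRA factors (A, B) for a single linear layer. All dimensions, the single-layer scope, and the MLP hypernetwork are illustrative assumptions, not the paper's architecture.

```python
# Toy text-conditioned hypernetwork emitting LoRA factors for one layer.
import torch
import torch.nn as nn

class TextToLoRA(nn.Module):
    def __init__(self, text_dim=384, hidden=256, d_model=768, rank=8):
        super().__init__()
        self.rank, self.d_model = rank, d_model
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * rank * d_model),  # flattened A and B
        )

    def forward(self, task_emb):
        flat = self.net(task_emb)
        A, B = flat.split(self.rank * self.d_model, dim=-1)
        return (A.view(-1, self.rank, self.d_model),
                B.view(-1, self.d_model, self.rank))

hyper = TextToLoRA()
task_emb = torch.randn(1, 384)      # stand-in for a task-description embedding
A, B = hyper(task_emb)              # per-task LoRA factors
W = torch.randn(768, 768)           # frozen base weight
W_adapted = W + B[0] @ A[0]         # LoRA update: W + B A
```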
Simone Scardapane (@s_scardapane)'s Twitter Profile Photo

*Generalized Interpolating Discrete Diffusion*
by Dimitri von Rütte, Antonio Orvieto et al.

A class of discrete diffusion models combining standard masking with uniform noise to allow the model to potentially "correct" previously wrong tokens.

arxiv.org/abs/2503.04482
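A minimal sketch of the forward corruption the abstract describes: each token is kept, masked, or resampled uniformly at random. The specific schedule below (corruption probability t, fixed uniform-noise fraction u) is an illustrative assumption, not the paper's exact interpolation.

```python
# Toy forward process mixing mask noise with uniform token noise.
import torch

def corrupt(tokens, t, vocab_size, mask_id, u=0.2):
    # With prob t a token is corrupted; a corrupted token becomes [MASK]
    # with prob (1 - u), or a uniformly random token with prob u.
    corrupted = torch.rand_like(tokens, dtype=torch.float) < t
    use_uniform = torch.rand_like(tokens, dtype=torch.float) < u
    random_tok = torch.randint(0, vocab_size, tokens.shape)
    noisy = torch.where(use_uniform, random_tok,
                        torch.full_like(tokens, mask_id))
    return torch.where(corrupted, noisy, tokens)

x = torch.randint(0, 100, (1, 12))                     # toy sequence, vocab 100
x_t = corrupt(x, t=0.5, vocab_size=100, mask_id=100)   # id 100 = [MASK]
```

Because some corrupted positions hold real (but wrong) tokens rather than [MASK], the reverse model can learn to revise them, which pure masking diffusion cannot.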