Christos Perivolaropoulos (@ccperivol) Twitter Tweets • TwiCopy

Petar Veličković

a year ago

"Energy continuously flows from being concentrated, to becoming dispersed, spread out, wasted and useless." ⚡➡️🌬️ Sharing our work on the inability of softmax in Transformers to _robustly_ learn sharp functions out-of-distribution. Together w/ Christos Perivolaropoulos, Federico Barbero & Razvan!

Likes: 475 · Replies: 11 · Reposts: 80

Petar Veličković

@petarv_93

a year ago

Round and Round we Go! 🔄 Rotary Positional Encodings (RoPE) are a common staple of frontier LLMs. _Why_ do they work so well, and _how_ do LLMs take advantage of them? The results might surprise you, as they challenge commonly-held wisdom! Read on ↩️ Work led by Federico Barbero!

Likes: 619 · Replies: 9 · Reposts: 109

Petar Veličković

@petarv_93

a year ago

We hope our work contributes to improved understanding of rotary PEs and how they're used, while paving the way to exciting positional embedding schemes in the future! Our work is available on the arXiv: arxiv.org/abs/2410.06205 Federico Barbero, Christos Perivolaropoulos et al. -- it's been a pleasure!

Likes: 36 · Replies: 0 · Reposts: 1

Petar Veličković

@petarv_93

a year ago

The cards will be revealed soon enough 👀

Likes: 16 · Replies: 2 · Reposts: 1

Petar Veličković

@petarv_93

a year ago

I love a good leaderboard... or several! ⏫ TGR 🐅 is our graph rewiring method for temporal graphs leveraging expander graph propagation. Turns out, TGR is _real good_ 🔥 setting SOTA on _four_ diverse tasks in the TGB dataset. Read on for more 🧵 Katarina Petrovic Shenyang Huang

Likes: 41 · Replies: 1 · Reposts: 5

Alexander Doria

@dorialexander

a year ago

Releasing my detailed commented introduction to LLM sampling colab.research.google.com/drive/18-2Z4TM… We get back to the basics and slowly build up to a reproduction of the adaptive temperature strategy from "Softmax is not enough" (from Petar Veličković et al.)

Likes: 533 · Replies: 8 · Reposts: 79

Petar Veličković

@petarv_93

a year ago

CGP has now been accepted to Learning on Graphs Conference 2025 😊🎉 Heartfelt congratulations to JJ Wilson on getting his first lead-author paper out of the door -- you've come such a long way from last year, in spite of having no formal AI training! I am impressed. 🙇‍♂️ Onwards! 🚀

Likes: 63 · Replies: 0 · Reposts: 9

Petar Veličković

@petarv_93

a year ago

Take time to study carefully how data flows in a Transformer, and you'll near-certainly find something useful that others missed 🧑‍🔬 Also, don't hold back on sharing a result just 'cause it feels obvious. 'softmax is not enough' was needlessly delayed for 10 months due to this 😶

Likes: 251 · Replies: 3 · Reposts: 26

Christos Perivolaropoulos

@ccperivol

a year ago

Serious question to torch users who are to some degree aware of JAX and keras3: why do you prefer torch?

Likes: 3 · Replies: 2 · Reposts: 0

Petar Veličković

@petarv_93

a year ago

This is the poster I'm most happy about in my career, even though the actual amount of writing effort was minimal 🙃 Coming soon to NeurIPS Workshops near you (two spotlights)! 🔦 I sadly won't be there myself, but Christos Perivolaropoulos & Federico Barbero will be happy to tell you all about it 🚀

Likes: 476 · Replies: 7 · Reposts: 74

Petar Veličković

@petarv_93

a year ago

Our team is hiring Student Researchers at Google DeepMind for '25! 🧑‍🔬 Interested in understanding reasoning capabilities from first principles? 🧑‍🎓 Currently studying for a BS/MS/PhD? 🧑‍💻 Have solid engineering and research skills? 🌟 We want to hear from you! Details in thread.

Likes: 918 · Replies: 15 · Reposts: 88

Federico Barbero

@fedzbar

a year ago

Heading tomorrow to Vancouver for NeurIPS! Please do reach out if you want to chat about reasoning in Transformers / LLMs :) I'll be presenting our work "Transformers need glasses! 👓" on Thursday at 4:30pm at East Exhibit Hall A-C #1806.

Likes: 177 · Replies: 3 · Reposts: 26

Petar Veličković

@petarv_93

a year ago

As I've been asked a few times recently: I won't be going to NeurIPS (travelled way too much! 😅) But if you're going, I warmly invite you to stop by Federico Barbero's poster this Thursday! Federico worked on this as part of his Student Researcher placement with us at Google DeepMind.

Likes: 50 · Replies: 0 · Reposts: 6

Petar Veličković

@petarv_93

a year ago

Thank you to the Scientific Methods for Understanding Deep Learning Workshop at #NeurIPS2024 for featuring our paper as one of the best paper runners-up in its 'Debunking challenge'!!! 🚀🧑‍🔬 Also, sincere thanks to Christos Perivolaropoulos and Federico Barbero for their tireless work presenting our little softmax study throughout the day! 🙌

Likes: 37 · Replies: 0 · Reposts: 4

Christos Perivolaropoulos

@ccperivol

a year ago

Kudos to everyone who worked on this. It looks amazing!

Likes: 1 · Replies: 0 · Reposts: 0

Jacob Austin

@jacobaustin132

a year ago

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n

Likes: 1.1K · Replies: 25 · Reposts: 377

Jeff Dean

@jeffdean

10 months ago

I love the game of Boggle. This demo showcases our Gemini 2.0 Pro model’s coding abilities in AI Studio. It is mind boggling to think that it can write the full piece of code, including all the right data structures and search algorithms to find all valid words on a Boggle

Likes: 645 · Replies: 79 · Reposts: 75

Federico Barbero

@fedzbar

8 months ago

Super excited to be heading to Singapore tomorrow to present our work on RoPE with Alex, Christos Perivolaropoulos, Razvan, Petar Veličković. Christos and I will be presenting on Fri 25 Apr 7 p.m. PDT — 9:30 p.m. PDT Hall 3 + Hall 2B #242. Happy to meet and catch up :) DMs are open!

Likes: 202 · Replies: 2 · Reposts: 17

PVLDB

@pvldb

7 months ago

Vol:18 No:5 → Dandelion: Smaller Clusters, Bigger Speeds—Distributed Transactions Redefined vldb.org/pvldb/vol18/p1…

Likes: 24 · Replies: 0 · Reposts: 4

Petar Veličković

@petarv_93

7 months ago

AlphaEvolve is here! 🧬 this is one special system (especially when optimising things with jagged edges 😊) -- had a fantastic time using it! congrats Alexander Novikov Matej Balog Ngân Vũ (NV) and team!! 🚀 you can register your interest in using it through the link below:

Likes: 74 · Replies: 1 · Reposts: 10