Federico Barbero (@fedzbar)'s Twitter Profile
Federico Barbero

@fedzbar

I like Transformers and graphs. I also like chess and a few other things as well.

ID: 1073302912854564870

Link: https://federicobarbero.com · Joined: 13-12-2018 19:45:30

231 Tweets

2.2K Followers

274 Following

Federico Barbero (@fedzbar)'s Twitter Profile Photo

Heading tomorrow to Vancouver for NeurIPS! Please do reach out if you want to chat about reasoning in Transformers / LLMs :) I'll be presenting our work "Transformers need glasses! 👓" on Thursday at 4:30pm at East Exhibit Hall A-C #1806.

Ben Finkelshtein (@benfinkelshtein)'s Twitter Profile Photo

Come check out Learning on Large Graphs using Intersecting Communities! (With a “hint” of Game of Thrones references) @NeurIPS 📌East Exhibit Hall A-C #3001, Session 3

Ethan (@torchcompiled)'s Twitter Profile Photo

so yeah, this is something I've always been confused about with softmax. your denominator keeps growing with sequence length, but logits of individual items are invariant to this. So attention sharpness ultimately depends on sequence length, becoming easier for noise to drown out the signal.

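The effect the tweet describes is easy to check numerically: if one relevant token has a fixed logit, its softmax weight decays as the number of competing tokens grows, because only the denominator changes. A minimal sketch (logit values are illustrative, not from any model):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One "relevant" token with logit 5.0 among n noise tokens with logit 0.0.
# The relevant logit never changes, but the denominator grows with n,
# so the attention weight on the relevant token drifts toward 0.
for n in [10, 100, 1000, 10000]:
    weight = softmax([5.0] + [0.0] * n)[0]
    print(f"n={n:>5}  weight on relevant token = {weight:.4f}")
```

With logit 5.0, the weight is exactly e^5 / (e^5 + n), so it is roughly halved once n reaches e^5 ≈ 148 noise tokens.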
Petar Veličković (@petarv_93)'s Twitter Profile Photo

This just in -- Looks like you'll be seeing more of p-RoPE at #ICLR2025! 🔄 Congratulations Federico Barbero on yet another epic paper from your internship getting published! 🎉

EEML (@eemlcommunity)'s Twitter Profile Photo

Applications are now open for EEML 2025 in Sarajevo, Bosnia and Herzegovina, 21-26 July! 🎉 Learn from top AI researchers and connect with peers in Sarajevo 🇧🇦, a historical crossroads of East and West. Needs-based scholarships are available. Deadline: 31 March 2025.

Simone Scardapane (@s_scardapane)'s Twitter Profile Photo

*Round and Round We Go! What makes Rotary Positional Encodings useful?* by Federico Barbero, Petar Veličković, and Christos Perivolaropoulos. They show RoPE has distinct behavior for different rotation angles - high freq for position, low freq for semantics. arxiv.org/abs/2410.06205

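The high-freq/low-freq split can be seen from RoPE's angle schedule alone: each 2-D pair of dimensions is rotated by pos · θ_i with θ_i = base^(−2i/d). A toy sketch, assuming the standard parameterisation (dimension and base values are illustrative defaults, not tied to the paper):

```python
import math

def rope_angles(pos, dim=64, base=10000.0):
    """Rotation angle (radians) applied to each 2-D pair at position `pos`.

    Standard RoPE schedule: theta_i = base^(-2i/dim), angle = pos * theta_i.
    """
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

angles = rope_angles(pos=100)
# Highest-frequency pair: many full turns over 100 positions, so its
# content is highly position-sensitive. Lowest-frequency pair: barely
# rotates, so what it carries stays nearly invariant across positions.
print(f"fastest pair: {angles[0]:.1f} rad over 100 positions")
print(f"slowest pair: {angles[-1]:.4f} rad over 100 positions")
```

The spread between the two ends of the schedule is roughly four orders of magnitude here, which is the intuition behind "high freq for position, low freq for semantics."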
Alvaro Arroyo (@arroyo_alvr)'s Twitter Profile Photo

Vanishing gradients are central to RNNs and SSMs, but how do they affect GNNs? We explore this in our new paper! w/ A. Gravina, Ben Gutteridge, Federico Barbero, C. Gallicchio, Xiaowen Dong, Michael Bronstein, Pierre Vandergheynst 🔗 arxiv.org/abs/2502.10818 🧵(1/11)

Frank Noe (@franknoeberlin)'s Twitter Profile Photo

The BioEmu-1 model and inference code are now public under MIT license!!! Please go ahead, play with it and let us know if there are issues. github.com/microsoft/bioe…

charliebtan (@charliebtan)'s Twitter Profile Photo

New preprint! 🚨 We scale equilibrium sampling to hexapeptides (in Cartesian coordinates!) with Sequential Boltzmann generators! 📈 🤯 Work with Joey Bose, Chen Lin, Leon Klein, Michael Bronstein and Alex Tong Thread 🧵 1/11

Itay Yona (@itay__yona)'s Twitter Profile Photo

Ever felt like you're talking to a parrot with a glitch? 🦜 Turns out, LLMs struggle with repetition in a fascinating way! 🕵️‍♂️ We reverse-engineered the circuit responsible for that bug 🤯

Federico Barbero (@fedzbar)'s Twitter Profile Photo

I was left so impressed by the amount of effort and care Tim Scarfe puts into the production of his videos. Definitely recommend his channel; a true privilege to have been interviewed. Please excuse me as I was very jet-lagged, so be nice!! :)

Petar Veličković (@petarv_93)'s Twitter Profile Photo

Indeed it is! Let's look at these techniques together 🌟 Join me at the virtual GLOW seminar today (5pm CET) for the first public showing of my 'LLMs as GNNs' talk. 💬🕸️ (Instructions for joining in reply)

Ji-Ha (@ji_ha_kim)'s Twitter Profile Photo

LLMs anchor themselves on the first token to dampen and stabilize the interactions on the other tokens. A great explanation of attention sinks with minimal math, and great diagrams!

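The dampening is just the softmax denominator at work: a sink token with a persistently high score absorbs probability mass, shrinking and flattening the weights on all the other tokens. A toy illustration (scores are made up, not from any real model):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy attention scores for 5 "content" tokens.
scores = [1.0, 0.8, 1.2, 0.9, 1.1]

no_sink = softmax(scores)
# Prepend a sink token with a high score: it soaks up most of the mass,
# leaving smaller, flatter weights on the content tokens.
with_sink = softmax([4.0] + scores)

print("max content weight without sink:", round(max(no_sink), 3))
print("max content weight with sink:   ", round(max(with_sink[1:]), 3))
```

The relative ordering among the content tokens is unchanged; only the magnitude of their interactions is damped, which matches the "stabilizing anchor" intuition.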
Alexander Doria (@dorialexander)'s Twitter Profile Photo

"Instructions work better at the top of long context". Not going to repeat this thread but prompt engineers should really get better acquainted with the geometry of LLMs.

"Instructions work better at the top of long context". Not going to repeat this thread but prompt engineers should really get better acquainted with the geometry of LLMs.
Federico Barbero (@fedzbar)'s Twitter Profile Photo

Super excited to be heading to Singapore tomorrow to present our work on RoPE with Alex, Christos Perivolaropoulos, Razvan, Petar Veličković. Christos and I will be presenting on Fri 25 Apr, 7:00–9:30 p.m. PDT, Hall 3 + Hall 2B #242. Happy to meet and catch up :) DMs are open!
