TimDarcet (@timdarcet) 's Twitter Profile
TimDarcet

@timdarcet

PhD student, building big vision models @ INRIA & FAIR (Meta)

ID: 1371396662925606913

Joined: 15-03-2021 09:44:31

982 Tweets

3.3K Followers

728 Following

Gabriele Berton (@gabriberton) 's Twitter Profile Photo

Ok there's a new paper in my top 3 favorites 

Vision transformers need registers

Clear problem, elegant solution, well written, easy to understand, good results, limitations included.

No fancy losses or layers. No equation (at all!)

Here's a short summary: (1/4)
TimDarcet (@timdarcet) 's Twitter Profile Photo

I also view layernorm as hyperplane proj + hypersphere proj. Hyperplane proj makes no sense, hence we do RMSnorm now. Although don't forget the epsilon: we project onto the hyper*ball* actually.
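The geometry in the tweet falls straight out of the formulas. A minimal numpy sketch (function names are mine, not from any library): mean-subtraction projects onto the sum-zero hyperplane, the variance division projects onto a sphere of radius √d, and the epsilon shrinks that to strictly inside the ball. RMSNorm simply drops the hyperplane step.

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # step 1: hyperplane projection -- subtract the mean,
    # i.e. project x onto the hyperplane {v : sum(v) = 0}
    x = x - x.mean()
    # step 2: (hyper)sphere projection -- without eps the result lies
    # exactly on the sphere of radius sqrt(d); eps pulls it inside the ball
    return x / np.sqrt(x.var() + eps)

def rmsnorm(x, eps=1e-5):
    # RMSNorm keeps only the sphere/ball projection, no mean subtraction
    return x / np.sqrt((x ** 2).mean() + eps)

d = 8
x = np.random.randn(d)
y = layernorm(x)
# ||y||^2 = d * var / (var + eps) <= d, so y is inside the ball of radius sqrt(d)
assert np.linalg.norm(y) <= np.sqrt(d)
```

The assert is the "hyper*ball*" point: with eps > 0 the output norm is strictly below √d, never on the sphere.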

TimDarcet (@timdarcet) 's Twitter Profile Photo

Summary of "Massive activations in LLMs":
- "artifact" tokens are in all transformers, ViTs and LLMs
- their weirdness is ~only on 1 channel
- they are the same as the quantization outliers
- their purpose is *not* global information
- there's a fix simpler than registers
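The first two bullets are easy to check on any layer's features. A hedged sketch (function name, threshold, and input shape are mine, not from the paper): flag tokens whose norm dwarfs the median, then measure how much of each flagged token's squared norm sits on its single largest channel.

```python
import numpy as np

def massive_activation_report(feats, norm_mult=5.0):
    # feats: (T, D) token features from one transformer layer
    # (hypothetical input; threshold is illustrative, not from the paper)
    norms = np.linalg.norm(feats, axis=1)
    outliers = np.where(norms > norm_mult * np.median(norms))[0]
    report = []
    for t in outliers:
        top = int(np.argmax(np.abs(feats[t])))
        # fraction of the token's squared norm carried by its top channel;
        # "weirdness is ~only on 1 channel" predicts this is close to 1
        frac = float(feats[t, top] ** 2 / (feats[t] ** 2).sum())
        report.append((int(t), top, frac))
    return report
```

On real ViT/LLM features the claim is that the `frac` of flagged tokens comes out near 1.0, i.e. a single channel carries almost the entire outlier norm.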

TimDarcet (@timdarcet) 's Twitter Profile Photo

Is there a good reason we use softmax losses in contrastive learning, instead of just doing MSE? i.e. L = ||x_i - x_i'||² - lambda * sum_k ||x_i - x_k'||². I'd guess the optimization dynamics are maybe friendlier, but does anyone have a good pointer? Both for CLIP and SSL btw.
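The loss in the tweet transcribes to a few lines of numpy; this is a sketch of that exact expression (batched, names are mine), not a recipe that's claimed to train well:

```python
import numpy as np

def mse_contrastive(z, z_pos, lam=0.1):
    # z, z_pos: (B, D) embeddings of two views of the same batch
    pos = ((z - z_pos) ** 2).sum(axis=1)                     # ||x_i - x_i'||^2
    # all pairwise squared distances ||x_i - x_k'||^2, shape (B, B)
    d2 = ((z[:, None, :] - z_pos[None, :, :]) ** 2).sum(-1)
    # attract the positive pair, repel (a bit) from everything
    return (pos - lam * d2.sum(axis=1)).mean()
```

Note the repulsive term includes k = i; since the attractive term dominates it with weight 1 vs lambda, the diagonal contribution just rescales the positive pull.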

François Fleuret (@francoisfleuret) 's Twitter Profile Photo

Two things teach intellectual humility: people smarter than you, and maths. Doing math with people smarter than you is sort of a bit too much.

TimDarcet (@timdarcet) 's Twitter Profile Photo

So the reason I was asking about this is that the squared L2 has the very pleasant property of reducing to "just push away from the avg", and that would eliminate all batch size issues (you can use an EMA avg). It's basically what DINO does, with a softmax+CE loss instead of L2.
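The "push away from the avg" reduction is a one-line identity: the gradient of the summed squared distances to the negatives is exactly N times the gradient of the squared distance to their mean, so the batch of negatives can be replaced by a single (e.g. EMA) average. A quick numeric check (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 256, 32
x = rng.standard_normal(D)           # anchor embedding
negs = rng.standard_normal((N, D))   # the batch of negatives x_k'

# gradient of sum_k ||x - x_k'||^2 w.r.t. x ...
grad = 2 * (x - negs).sum(axis=0)
# ... equals the gradient of N * ||x - mean_k x_k'||^2:
assert np.allclose(grad, 2 * N * (x - negs.mean(axis=0)))
```

This is why the repulsion term carries no real batch-size dependence: only the negatives' mean enters the gradient.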

Nick Jiang @ ICLR (@nickhjiang) 's Twitter Profile Photo

Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
TimDarcet (@timdarcet) 's Twitter Profile Photo

Great summary of dino.txt by Fede! Drop by the poster if you're at CVPR! 📅 Sunday, June 15 🕥 10:30 - 12:30 📍 Poster 370

TimDarcet (@timdarcet) 's Twitter Profile Photo

In case there is any ambiguity: DINOv2 is 100% a product of dumb hill-climbing on ImageNet-1k knn accuracy (and linear too). Overfitting an eval can be bad, but sometimes the reward signal is reliable and leads to truly good models. It's about finding a balance.

Ahmad Mustafa Anis (@ahmadmustafaan1) 's Twitter Profile Photo

~400 people joined us on Sunday at the Cohere Labs Open Science Community ML Summer School. TimDarcet, as always, delivered a super amazing talk on Scaling Self Supervised Learning (SSL, DINOv2, Masked Image Modeling, CAPI). Super interesting session.

Piotr Bojanowski (@p_bojanowski) 's Twitter Profile Photo

Why does Meta open-source its models? I talked about it with Maciej Kawecki (This Is IT), looking at Dino, our computer vision model with applications in forest mapping, medical research, agriculture and more. Open-source boosts AI access, transparency, and safety. youtube.com/watch?v=eNGafi…