TimDarcet (@timdarcet) 's Twitter Profile
TimDarcet

@timdarcet

PhD student, building big vision models @ INRIA & FAIR (Meta)

ID: 1371396662925606913

Joined: 15-03-2021 09:44:31

982 Tweets

3.3K Followers

728 Following

Gabriele Berton (@gabriberton) 's Twitter Profile Photo

Ok there's a new paper in my top 3 favorites 

Vision transformers need registers

Clear problem, elegant solution, well written, easy to understand, good results, limitations included.

No fancy losses or layers. No equation (at all!)

Here's a short summary: (1/4)
TimDarcet (@timdarcet) 's Twitter Profile Photo

I also view layernorm as hyperplane proj + hypersphere proj. Hyperplane proj makes no sense, hence we do RMSnorm now. Although don't forget the epsilon: we project onto the hyper*ball* actually.
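The geometry in the tweet falls straight out of the formulas. A minimal numpy sketch (function names are mine, not from any library): mean-subtraction projects onto the sum-zero hyperplane, the variance division projects onto a sphere of radius √d, and the epsilon shrinks that to strictly inside the ball. RMSNorm simply drops the hyperplane step.

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # step 1: hyperplane projection -- subtract the mean,
    # i.e. project x onto the hyperplane {v : sum(v) = 0}
    x = x - x.mean()
    # step 2: (hyper)sphere projection -- without eps the result lies
    # exactly on the sphere of radius sqrt(d); eps pulls it inside the ball
    return x / np.sqrt(x.var() + eps)

def rmsnorm(x, eps=1e-5):
    # RMSNorm keeps only the sphere/ball projection, no mean subtraction
    return x / np.sqrt((x ** 2).mean() + eps)

d = 8
x = np.random.randn(d)
y = layernorm(x)
# ||y||^2 = d * var / (var + eps) <= d, so y is inside the ball of radius sqrt(d)
assert np.linalg.norm(y) <= np.sqrt(d)
```

The assert is the "hyper*ball*" point: with eps > 0 the output norm is strictly below √d, never on the sphere.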

TimDarcet (@timdarcet) 's Twitter Profile Photo

Summary of "Massive activations in LLMs":
- "artifact" tokens are in all transformers, ViTs and LLMs
- their weirdness is ~only on 1 channel
- they are the same as the quantization outliers
- their purpose is *not* global information
- there's a fix simpler than registers
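The first two bullets are easy to check on any layer's features. A hedged sketch (function name, threshold, and input shape are mine, not from the paper): flag tokens whose norm dwarfs the median, then measure how much of each flagged token's squared norm sits on its single largest channel.

```python
import numpy as np

def massive_activation_report(feats, norm_mult=5.0):
    # feats: (T, D) token features from one transformer layer
    # (hypothetical input; threshold is illustrative, not from the paper)
    norms = np.linalg.norm(feats, axis=1)
    outliers = np.where(norms > norm_mult * np.median(norms))[0]
    report = []
    for t in outliers:
        top = int(np.argmax(np.abs(feats[t])))
        # fraction of the token's squared norm carried by its top channel;
        # "weirdness is ~only on 1 channel" predicts this is close to 1
        frac = float(feats[t, top] ** 2 / (feats[t] ** 2).sum())
        report.append((int(t), top, frac))
    return report
```

On real ViT/LLM features the claim is that the `frac` of flagged tokens comes out near 1.0, i.e. a single channel carries almost the entire outlier norm.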

TimDarcet (@timdarcet) 's Twitter Profile Photo

Is there a good reason we use softmax losses in contrastive learning, instead of just doing MSE? i.e. L = ||x_i - x_i'||² - lambda * sum_k ||x_i - x_k'||². I'd guess the optimization dynamics are maybe friendlier, but does anyone have a good pointer? Both for CLIP and SSL btw.
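The loss in the tweet transcribes to a few lines of numpy; this is a sketch of that exact expression (batched, names are mine), not a recipe that's claimed to train well:

```python
import numpy as np

def mse_contrastive(z, z_pos, lam=0.1):
    # z, z_pos: (B, D) embeddings of two views of the same batch
    pos = ((z - z_pos) ** 2).sum(axis=1)                     # ||x_i - x_i'||^2
    # all pairwise squared distances ||x_i - x_k'||^2, shape (B, B)
    d2 = ((z[:, None, :] - z_pos[None, :, :]) ** 2).sum(-1)
    # attract the positive pair, repel (a bit) from everything
    return (pos - lam * d2.sum(axis=1)).mean()
```

Note the repulsive term includes k = i; since the attractive term dominates it with weight 1 vs lambda, the diagonal contribution just rescales the positive pull.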

François Fleuret (@francoisfleuret) 's Twitter Profile Photo

Two things teach intellectual humility: people smarter than you, and maths. Doing math with people smarter than you is sort of a bit too much.

TimDarcet (@timdarcet) 's Twitter Profile Photo

So the reason I was asking about this is that the squared L2 has the very pleasant property of reducing to "just push away from the avg", and that would eliminate all batch size issues (you can use an EMA avg). It's basically what DINO does, with a softmax+CE loss instead of L2.
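The "push away from the avg" reduction is a one-line identity: the gradient of the summed squared distances to the negatives is exactly N times the gradient of the squared distance to their mean, so the batch of negatives can be replaced by a single (e.g. EMA) average. A quick numeric check (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 256, 32
x = rng.standard_normal(D)           # anchor embedding
negs = rng.standard_normal((N, D))   # the batch of negatives x_k'

# gradient of sum_k ||x - x_k'||^2 w.r.t. x ...
grad = 2 * (x - negs).sum(axis=0)
# ... equals the gradient of N * ||x - mean_k x_k'||^2:
assert np.allclose(grad, 2 * N * (x - negs.mean(axis=0)))
```

This is why the repulsion term carries no real batch-size dependence: only the negatives' mean enters the gradient.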

Nick Jiang @ ICLR (@nickhjiang) 's Twitter Profile Photo

Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
TimDarcet (@timdarcet) 's Twitter Profile Photo

Great summary of dino.txt by Fede! Drop by the poster if you're at CVPR! 📅 Sunday, June 15 🕥 10:30 - 12:30 📍 Poster 370

TimDarcet (@timdarcet) 's Twitter Profile Photo

In case there is any ambiguity: DINOv2 is 100% a product of dumb hill-climbing on ImageNet-1k knn accuracy (and linear too). Overfitting an eval can be bad, but sometimes the reward signal is reliable and leads to truly good models. It's about finding a balance.

Ahmad Mustafa Anis (@ahmadmustafaan1) 's Twitter Profile Photo

~400 people joined us on Sunday at the Cohere Labs Open Science Community ML Summer School. TimDarcet, as always, delivered a super amazing talk on Scaling Self Supervised Learning (SSL, DINOv2, Masked Image Modeling, CAPI). Super interesting session.

Piotr Bojanowski (@p_bojanowski) 's Twitter Profile Photo

Why does Meta open-source its models? I talked about it with Maciej Kawecki (This Is IT), looking at Dino, our computer vision model with applications in forest mapping, medical research, agriculture and more. Open-source boosts AI access, transparency, and safety. youtube.com/watch?v=eNGafi…