Alessio Devoto (@devoto_alessio)'s Twitter Profile
Alessio Devoto

@devoto_alessio

PhD in Data Science at @SapienzaRoma | Researching Efficient ML/AI ☘️ | Visiting @EdinburghNLP | alessiodevoto.github.io | Also on 🦋

ID: 1496187730463760388

Link: https://alessiodevoto.github.io/ · Joined: 22-02-2022 18:19:15

165 Tweets

457 Followers

508 Following

Sonia (@soniajoseph_)'s Twitter Profile Photo

We visualized the features of 16 SAEs trained on CLIP in a collaboration between Fraunhofer HHI and Mila - Institut québécois d'IA! Search thousands of interpretable CLIP features in our vision atlas, with autointerp labels and scores like clarity and polysemanticity. Some fun features in the thread:
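
For the mechanics behind such an atlas: a sparse autoencoder learns an overcomplete dictionary of directions over a model's activations. A minimal sketch over stand-in CLIP activations (the dimensions, layer choice, and L1 coefficient are illustrative assumptions, not the project's actual setup):

```python
# Minimal sparse autoencoder (SAE) sketch over stand-in CLIP activations.
# Dimensions and the L1 coefficient are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # activations -> sparse codes
        self.decoder = nn.Linear(d_dict, d_model)  # codes -> reconstruction

    def forward(self, x):
        codes = torch.relu(self.encoder(x))  # ReLU keeps codes non-negative and sparse
        return self.decoder(codes), codes

sae = SparseAutoencoder(d_model=768, d_dict=16384)  # 768 ~ ViT-B CLIP width (assumption)
acts = torch.randn(32, 768)  # stand-in for CLIP activations
recon, codes = sae(acts)
loss = (recon - acts).pow(2).mean() + 1e-3 * codes.abs().mean()  # reconstruction + L1 sparsity
```

Each decoder column is then a candidate feature direction to which autointerp labels and scores like clarity or polysemanticity can be attached.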

Pasquale Minervini is hiring postdocs! 🚀 (@pminervini)'s Twitter Profile Photo

My amazing collaborators will present several works at ICLR and NAACL later this month -- please catch up with them if you're attending! I tried to summarise our recent work in a blog post: neuralnoise.com/2025/march-res…

Hongru Wang (@wangcarrey)'s Twitter Profile Photo

🎉 Thrilled to share our TWO #NAACL2025 oral papers! 👇 Welcome to catch me and talk about anything!

1️⃣ Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
📅 30 Apr • 11:30–11:45 AM • Ballroom C
TLDR: A general representation learning
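
The abstract is cut off above; for the steering side, one common recipe (a sketch of the general idea, not necessarily the paper's exact method) is to add a scaled SAE decoder direction to the residual stream:

```python
import torch

def steer_hidden_state(h, W_dec, feature_idx, alpha=4.0):
    """Add a scaled SAE feature direction to residual-stream activations.

    h: (batch, d_model) hidden states; W_dec: (d_model, d_dict) SAE decoder.
    feature_idx and alpha are hypothetical choices, not values from the paper.
    """
    direction = W_dec[:, feature_idx]
    direction = direction / direction.norm()  # unit-norm feature direction
    return h + alpha * direction              # broadcasts over the batch

h = torch.randn(2, 4096)          # stand-in hidden states
W_dec = torch.randn(4096, 65536)  # stand-in decoder matrix
h_steered = steer_hidden_state(h, W_dec, feature_idx=123)
```
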
Yu Zhao (@yuzhaouoe)'s Twitter Profile Photo

NAACL 2025 Oral Presentation💥 Our work on using sparse autoencoders to resolve knowledge conflicts will be presented on 30 Apr, 11:30–11:45 AM, in Ballroom C. Thanks to Hongru for presenting our work!!!

Ne Luo (seeking PhD opportunities) (@neluo19)'s Twitter Profile Photo

Hi! I will be attending #NAACL2025 and presenting our paper on self-training for tool-use today, an extension of my MSc dissertation at EdinburghNLP, supervised by Pasquale Minervini.

Time: 14:00-15:30
Location: Hall 3

Let’s chat and connect!😊
Aryo Pradipta Gema (@aryopg)'s Twitter Profile Photo

MMLU-Redux just touched down at #NAACL2025! 🎉 
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋
Alberto Carlo Maria Mancino (@alberto_mancino)'s Twitter Profile Photo

Are you ready to play with us?🎲 

Our tutorial D&D4Rec, short for "Standard Practices for Data Processing and Multimodal Feature Extraction in Recommendation with DataRec and Ducho", has been accepted at #RecSys2025 (ACM RecSys) 🥳🥳

More details in the thread 🧵👇
Jary Pomponi (@jarypom)'s Twitter Profile Photo

A new paper is out! In collaboration with Alessio Devoto and Simone Scardapane, we tackle catastrophic forgetting in class-incremental learning via Probability Dampening (self-scaling logit margins) and a Cascaded Gated Classifier (sigmoid-gated mini-heads per task).
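
The tweet gives only one-line glosses; here is a rough sketch of what "sigmoid-gated mini-heads per task" could look like (an illustration under those glosses, not the paper's code):

```python
import torch
import torch.nn as nn

class GatedTaskHead(nn.Module):
    """One mini-head per task; a sigmoid gate decides how much it contributes."""
    def __init__(self, d_feat: int, n_classes: int):
        super().__init__()
        self.gate = nn.Linear(d_feat, 1)          # scalar gate per example
        self.head = nn.Linear(d_feat, n_classes)  # task-specific classifier

    def forward(self, z):
        return torch.sigmoid(self.gate(z)) * self.head(z)

# One head per task seen so far; logits are concatenated across tasks.
heads = nn.ModuleList([GatedTaskHead(512, 10) for _ in range(3)])
z = torch.randn(8, 512)  # stand-in backbone features
logits = torch.cat([head(z) for head in heads], dim=-1)  # shape (8, 30)
```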

Simone Scardapane (@s_scardapane)'s Twitter Profile Photo

*attention is logarithmic, actually*
by spike

Short & nice blog post on the difference between time complexity and work-depth complexity and how it applies to many neural network operations (e.g., attention).

supaiku.com/attention-is-l…
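
The post's core distinction in miniature: summing n numbers costs O(n) work but only O(log n) depth, because all pairwise adds at one level are independent and can run in parallel. A toy sketch of that counting:

```python
# Work vs. depth on a toy reduction: O(n) total additions, O(log n) parallel depth.
def tree_sum(xs):
    depth = 0
    while len(xs) > 1:
        # All pairwise adds at one level are independent -> one parallel step.
        paired = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:          # odd element carries over to the next level
            paired.append(xs[-1])
        xs = paired
        depth += 1
    return xs[0], depth

print(tree_sum(list(range(16))))  # (120, 4): 16 numbers reduce in log2(16) = 4 levels
```

Attention's row-wise softmax reductions parallelize the same way, which is the sense in which its depth can be logarithmic even though its total work is quadratic.
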
Simone Scardapane (@s_scardapane)'s Twitter Profile Photo

Happy to share I just started as associate professor at Sapienza Università di Roma! I have now reached my perfect thermodynamical equilibrium. 😄

Also, ChatGPT's idea of me is infinitely cooler, so I'll leave it here to trick people into giving me money.
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Resa: Transparent Reasoning Models via SAEs

"Specifically, SAE-Tuning involves two key stages: First, we use an SAE to probe the internal activations of a source model, identifying and extracting a dictionary of latent features that correspond to its reasoning processes. Second,
Sebastian Raschka (@rasbt)'s Twitter Profile Photo

Feels good to be back coding! Just picked a fun one from my “someday” side project list and finally added a KV cache to the LLMs From Scratch repo: github.com/rasbt/LLMs-fro…
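
For context (a generic sketch, not the repo's implementation): a KV cache stores each layer's past keys and values, so a decode step only computes projections for the newest token and attends over everything cached:

```python
import torch

class KVCache:
    """Append-only per-layer cache of key/value tensors for autoregressive decoding."""
    def __init__(self):
        self.k = None  # (batch, heads, seq, head_dim)
        self.v = None

    def update(self, k_new, v_new):
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=2)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

cache = KVCache()
for step in range(3):                  # one new token per decode step
    k_new = torch.randn(1, 8, 1, 64)   # stand-in projections for the new token
    v_new = torch.randn(1, 8, 1, 64)
    k, v = cache.update(k_new, v_new)  # attention then runs over all cached positions
print(k.shape)  # torch.Size([1, 8, 3, 64])
```
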
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

"we propose LongLLaDA, a training-free method that integrates LLaDA with  the NTK-based RoPE extrapolation. Our results validate that established  extrapolation scaling laws remain effective for extending the
Unsloth AI (@unslothai)'s Twitter Profile Photo

We made a Guide on mastering LoRA Hyperparameters, so you can learn to fine-tune LLMs correctly!

Learn to:
• Train smarter models with fewer hallucinations
• Choose optimal: learning rates, epochs, LoRA rank, alpha
• Avoid overfitting & underfitting

🔗docs.unsloth.ai/get-started/fi…
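
The knobs the guide lists map onto a standard Hugging Face PEFT configuration; the values below are common starting points, not Unsloth's recommendations:

```python
# A typical LoRA setup with Hugging Face PEFT; values are common starting
# points, not Unsloth's recommended hyperparameters.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in model
config = LoraConfig(
    r=16,                       # rank of the low-rank update
    lora_alpha=16,              # scaling: effective update is (alpha / r) * BA
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```
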
Simone Scardapane (@s_scardapane)'s Twitter Profile Photo

Twitter friends, here's some draft notes for my upcoming course on automatic differentiation, mostly based on the "Elements of differentiable programming" book. Let me know what you think! They also include a notebook on operator overloading. 🙃

notion.so/sscardapane/Au…
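
A taste of what an operator-overloading autodiff notebook typically covers (a minimal sketch, not the course's actual notebook): forward-mode differentiation with dual numbers, where each overloaded op propagates a (value, derivative) pair:

```python
# Forward-mode autodiff via operator overloading with dual numbers:
# each value carries (primal, tangent) and overloaded ops propagate both.
class Dual:
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.grad * other.val + self.val * other.grad)  # product rule

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2

x = Dual(2.0, 1.0)                 # seed tangent = 1 to get df/dx
y = f(x)
print(y.val, y.grad)               # 17.0 14.0
```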