Martin Tutek (@mtutek) 's Twitter Profile
Martin Tutek

@mtutek

Postdoc @ Technion | previously postdoc @ UKP Lab, TU Darmstadt | PhD @ TakeLab, UniZG | Working on interpretability & safety of LLMs.

ID: 4075234643

Link: http://mttk.github.io | Joined: 30-10-2015 12:50:57

428 Tweets

436 Followers

798 Following

Martin Tutek (@mtutek) 's Twitter Profile Photo

🚨🚨 New preprint 🚨🚨 Ever wonder whether CoTs correspond to the internal reasoning process of the model? We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from parameters to assess CoT faithfulness. arxiv.org/abs/2502.14829

Zachary Bamberger @NAACL2025 (@zacharybamberg1) 's Twitter Profile Photo

1/14 🎉 Excited to announce that our paper, "DEPTH: Discourse Education through Pre-Training Hierarchically", has been accepted to #Rep4NLP at #NAACL2025!!! Joint work with Ofek Glick, Chaim Baskin and Yonatan Belinkov

Zorik Gekhman (@zorikgekhman) 's Twitter Profile Photo

🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this “hidden knowledge”? In our new paper, we clearly define this concept and design controlled experiments to test it. 1/🧵
Michael Toker (@michael_toker) 's Twitter Profile Photo

Check out our new work on how information flows in text-to-image models! Turns out, the text encoder isn’t doing what you’d expect — and that has real consequences for model performance and errors. For a deeper dive, see Guy Kaplan’s post. Paper link is in the first comment!

Martin Tutek (@mtutek) 's Twitter Profile Photo

Very happy to have been a part of this effort to standardize eval in mechanistic interpretability! Plenty of resources in the thread & a lot of collaborators at ICLR/NAACL (they won't be wearing black, though)

Yonatan Belinkov (@boknilev) 's Twitter Profile Photo

BlackboxNLP will be co-located with #EMNLP2025 in Suzhou this November! 📆 This edition will feature a new shared task on circuits/causal variable localization in LMs, details: blackboxnlp.github.io/2025/task If you're into mech interp and care about evaluation, please submit!

BlackboxNLP (@blackboxnlp) 's Twitter Profile Photo

BlackboxNLP, the leading workshop on interpretability and analysis of language models, will be co-located with EMNLP 2025 in Suzhou this November! 📆 This edition will feature a new shared task on circuits/causal variable localization in LMs, details: blackboxnlp.github.io/2025/task
Martin Tutek (@mtutek) 's Twitter Profile Photo

Really cool work introducing a gradient-free method for unlearning organically memorized sensitive information from LMs! (we also curate two datasets of organically memorized sensitive information) Check out the 🧵 below and come talk to us at ACL 2025 in Vienna 🍻

Dana Arad 🎗️ (@dana_arad4) 's Twitter Profile Photo

Tried steering with SAEs and found that not all features behave as expected? Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧵
Nikhil Chandak (@nikhilchandak29) 's Twitter Profile Photo

🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯 Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision