Martin Tutek (@mtutek) 's Twitter Profile
Martin Tutek

@mtutek

Postdoc @ Technion | previously postdoc @ UKP Lab, TU Darmstadt | PhD @ TakeLab, UniZG | Working on interpretability & safety of LLMs.

ID: 4075234643

Link: http://mttk.github.io | Joined: 30-10-2015 12:50:57

428 Tweets

436 Followers

798 Following

Martin Tutek (@mtutek) 's Twitter Profile Photo

🚨🚨 New preprint 🚨🚨 Ever wonder whether CoTs correspond to the internal reasoning process of the model? We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from parameters to assess CoT faithfulness. arxiv.org/abs/2502.14829

Zachary Bamberger @NAACL2025 (@zacharybamberg1) 's Twitter Profile Photo

1/14 🎉 Excited to announce that our paper, "DEPTH: Discourse Education through Pre-Training Hierarchically", has been accepted to #Rep4NLP at #NAACL2025!!! Joint work with Ofek Glick, Chaim Baskin and Yonatan Belinkov

Zorik Gekhman (@zorikgekhman) 's Twitter Profile Photo

🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this “hidden knowledge”? In our new paper, we clearly define this concept and design controlled experiments to test it. 1/🧵
Michael Toker (@michael_toker) 's Twitter Profile Photo

Check out our new work on how information flows in text-to-image models! Turns out, the text encoder isn’t doing what you’d expect — and that has real consequences for model performance and errors. For a deeper dive, see Guy Kaplan’s post. Paper link is in the first comment!

Martin Tutek (@mtutek) 's Twitter Profile Photo

Very happy to have been a part of this effort to standardize eval in mechanistic interpretability! Plenty of resources in the thread & a lot of collaborators at ICLR/NAACL (they won't be wearing black, though)

Yonatan Belinkov (@boknilev) 's Twitter Profile Photo

BlackboxNLP will be co-located with #EMNLP2025 in Suzhou this November! 📆 This edition will feature a new shared task on circuits/causal variable localization in LMs, details: blackboxnlp.github.io/2025/task If you're into mech interp and care about evaluation, please submit!

BlackboxNLP (@blackboxnlp) 's Twitter Profile Photo

BlackboxNLP, the leading workshop on interpretability and analysis of language models, will be co-located with EMNLP 2025 in Suzhou this November! 📆 This edition will feature a new shared task on circuits/causal variable localization in LMs, details: blackboxnlp.github.io/2025/task
Martin Tutek (@mtutek) 's Twitter Profile Photo

Really cool work introducing a gradient-free method for unlearning organically memorized sensitive information from LMs! (we also curate two datasets of organically memorized sensitive information) Check out the 🧵 below and come talk to us at ACL 2025 in Vienna 🍻

Dana Arad 🎗️ (@dana_arad4) 's Twitter Profile Photo

Tried steering with SAEs and found that not all features behave as expected? Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧵
Nikhil Chandak (@nikhilchandak29) 's Twitter Profile Photo

🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯 Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision