Sarah Wiegreffe (on faculty job market!) (@sarahwiegreffe)'s Twitter Profile
Sarah Wiegreffe (on faculty job market!)

@sarahwiegreffe

Research in language model explainability & interpretability since 2017. Postdoc @allen_ai @uwnlp. PhD from @mlatgt @gtcomputing. Views my own, not my employer's.

ID: 1882939814

http://sarahwie.github.io · Joined 19-09-2013 12:05:27

1.1K Tweets

4.4K Followers

1.1K Following

Sarah Wiegreffe (on faculty job market!) (@sarahwiegreffe):

Check out our new preprint/project, which has been over a year in the making! This has been a very fun collaboration (and one of the biggest I've personally participated in). We are quite excited about the leaderboard and release, and are open to feedback to help this remain a

Abhilasha Ravichander (@lasha_nlp):

Stoked that HALoGEN (non-archival version) won best paper award at the TrustNLP workshop @ #NAACL2025! Our work explores LLM hallucinations and their potential roots in training data. Excited to discuss more --- come find us!

Yonatan Belinkov (@boknilev):

Since people have been asking - the #blackboxNLP workshop will return this year, to be held with #emnlp2025. This workshop is all about interpreting and analyzing NLP models (and yes, this includes LLMs). More details soon, follow BlackboxNLP

Hadas Orgad (@orgadhadas):

Just 6 days left! ⏰ Submit your work to the Actionable Interpretability Workshop at #ICML2025 by May 19th. Contribute to the future of interpretable and impactful AI! Actionable Interpretability Workshop ICML2025

Tal Haklay (@tal_haklay):

We knew many of you wanted to submit to our Actionable Interpretability workshop, but we didn’t expect to crash Overleaf! 😏🍃 Only 5 days left ⏰! Got a paper accepted to ICML that fits our theme? Submit it to our conference track! 👉 Actionable Interpretability Workshop ICML2025

Yonatan Belinkov (@boknilev):

BlackboxNLP will be co-located with #EMNLP2025 in Suzhou this November! 📷 This edition will feature a new shared task on circuits/causal variable localization in LMs, details: blackboxnlp.github.io/2025/task If you're into mech interp and care about evaluation, please submit!

Sarah Wiegreffe (on faculty job market!) (@sarahwiegreffe):

We got more submissions to the workshop than we anticipated, and are looking for reviewers willing to review 2-4 papers between May 24 and June 7. If you are interested, please self-nominate! Thank you 🙏 docs.google.com/forms/d/e/1FAI…

Sarah Wiegreffe (on faculty job market!) (@sarahwiegreffe):

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland (UMD Department of Computer Science) this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

Neel Nanda (@neelnanda5):

Good news! There will be a mechanistic interpretability workshop at NeurIPS (Dec 6/7, San Diego). If you were disappointed that ICML rejected us, now we'll do an even better one: 4 more months of progress to discuss! Papers likely due late August/early Sept; more info soon.

Mor Geva (@megamor2):

Going to #icml2025? Don't miss the Actionable Interpretability Workshop (Actionable Interpretability Workshop ICML2025)! We've got an amazing lineup of speakers, panelists, and papers, all focused on leveraging insights from interpretability research to tackle practical, real-world problems ✨

Tal Haklay (@tal_haklay):

🚨Meet our panelists at the Actionable Interpretability Workshop (Actionable Interpretability Workshop ICML2025) at ICML Conference! Join us July 19 at 4pm for a panel on making interpretability research actionable, its challenges, and how the community can drive greater impact. Panelists: Naomi Saphra (@nsaphra), Samuel Marks (@saprmarks), Kyle Lo (@kylelostat), Fazl Barez (@FazlBarez).

Samuel Marks (@saprmarks):

I'm excited to discuss downstream applications of interpretability at Actionable Interpretability Workshop ICML2025! For a preview of my thoughts on the topic, see my blog post on how I think about picking applications to target x.com/saprmarks/stat…