Guy Davidson (@guyd33)'s Twitter Profile
Guy Davidson

@guyd33

PhD @NYUDataScience, visiting researcher @AIatMeta, interested in AI & CogSci, specifically in goals and their representations in minds and machines (he/him).

ID: 1117859056817823745

Link: https://guydavidson.me
Joined: 15-04-2019 18:35:42

925 Tweets

968 Followers

1.1K Following

Guy Davidson (@guyd33)

Fantastic new work by John (Yueh-Han) Chen (with Brenden Lake and me trying not to cause too much trouble). We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More fun analyses in the paper!

Sonia (@soniajoseph_)

Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉 We’ll be in Nashville next week. Come say hi 👋 #CVPR2025 Mechanistic Interpretability for Vision @ CVPR2025

Dr. Karen Ullrich (@karen_ullrich)

How would you make an LLM "forget" the concept of dog — or any other arbitrary concept? 🐶❓ We introduce SAMD & SAMI — a novel, concept-agnostic approach to identify and manipulate attention modules in transformers.

Guy Davidson (@guyd33)

John has some nice new results showing that some frontier models do worse on our safety benchmark than their predecessors. Take a look!

Guy Davidson (@guyd33)

We've been using smile to develop behavioral web experiments in the lab for the last year+. Everything from the simplest survey-like judgment collections to complex game-like designs (e.g., exps.gureckislab.org/e/laugh-melted…) is easier to develop and deploy. Consider it for your next exp!