Guy Davidson (@guyd33) Twitter Tweets • TwiCopy

Guy Davidson

@guyd33

PhD @NYUDataScience, visiting researcher @AIatMeta, interested in AI & CogSci, specifically in goals and their representations in minds and machines (he/him).

ID: 1117859056817823745

Website: https://guydavidson.me

Joined: 15-04-2019 18:35:42

925 Tweets

968 Followers

1.1K Following

Fantastic new work by John (Yueh-Han) Chen (with Brenden Lake and me trying not to cause too much trouble). We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More fun analyses in the paper!

Likes: 6, Replies: 0, Retweets: 2

Sonia

@soniajoseph_

2 months ago

Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉 We’ll be in Nashville next week. Come say hi 👋 #CVPR2025 Mechanistic Interpretability for Vision @ CVPR2025

Likes: 288, Replies: 3, Retweets: 31

Guy Davidson

@guyd33

2 months ago

You (yes, you!) should work with Sydney! Either short-term this summer, or longer term at her nascent lab at NYU!

Likes: 10, Replies: 0, Retweets: 0

Guy Davidson

@guyd33

a month ago

Today! Come hear from some wonderful folks about problem solving and design at 1 PM PT / 4 PM ET / 8 PM UTC

Likes: 12, Replies: 0, Retweets: 0

Dr. Karen Ullrich

@karen_ullrich

23 days ago

How would you make an LLM "forget" the concept of dog — or any other arbitrary concept? 🐶❓ We introduce SAMD & SAMI — a novel, concept-agnostic approach to identify and manipulate attention modules in transformers.

Likes: 77, Replies: 3, Retweets: 12

Guy Davidson

@guyd33

23 days ago

Cool new work on localizing and removing concepts using attention heads from colleagues at NYU and Meta!

Likes: 5, Replies: 0, Retweets: 0

Guy Davidson

@guyd33

8 days ago

John has some nice new results showing that some frontier models do worse on our safety benchmark than their predecessors. Take a look!

Likes: 3, Replies: 0, Retweets: 2

Guy Davidson

@guyd33

8 days ago

We've been using smile to develop behavioral web experiments in the lab for the last year+. Everything from the simplest survey-like judgment collections to complex game-like designs (e.g., exps.gureckislab.org/e/laugh-melted…) is easier to develop and deploy. Consider it for your next exp!

Likes: 6, Replies: 0, Retweets: 0