Denis Sutter (@denissutte9310)'s Twitter Profile
Denis Sutter

@denissutte9310

MSc at @eth, interested in ML interpretability

ID: 1943332976580005888

Joined 10-07-2025 15:36:04

44 Tweets

10 Followers

9 Following

Tiago Pimentel (@tpimentelms)

Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech int methods implicitly rely on the linear representation hypothesis🧵