
Dmitrii Kharlapenko
@dmhook
ID: 1715060484531970048
19-10-2023 17:41:17
8 Tweets
120 Followers
22 Following

In our research on abstract SAE features, nev and I use LLMs’ own capabilities to explain concepts from their minds. Excited to continue our MATS 6.0 work under the mentorship of Neel Nanda and Arthur Conmy. More cool stuff to come! lesswrong.com/posts/8ev6coxC…

How interpretable are task vectors? Using our new task vector cleaning method, we find SAE features responsible for detecting and encoding specific ICL tasks. See details in our second MATS 6.0 post with nev, Neel Nanda, and Arthur Conmy. lesswrong.com/posts/5FGXmJ3w…

🧵1/6 SAEs have become a staple of LLM interpretability, but what if we applied them to image generation models? My recent paper with Dmitrii Kharlapenko, Yixiong Hao, afterless, Sheikh Abdur Raheem Ali, and Arthur Conmy adapts SAEs to understand the SOTA diffusion transformer FLUX.1 ⬇️
