Dmitrii Kharlapenko (@dmhook)'s Twitter Profile
Dmitrii Kharlapenko

@dmhook

ID: 1715060484531970048

19-10-2023 17:41:17

8 Tweets

120 Followers

22 Following

Dmitrii Kharlapenko (@dmhook)

In our research on abstract SAE features, nev and I use LLMs’ own capabilities to explain concepts from their minds. Excited to continue our MATS 6.0 work under the mentorship of Neel Nanda and Arthur Conmy. More cool stuff to come! lesswrong.com/posts/8ev6coxC…

Dmitrii Kharlapenko (@dmhook)

How interpretable are task vectors? Using our new task vector cleaning method, we find SAE features responsible for detecting and encoding specific ICL tasks. See details in our second MATS 6.0 post with nev, Neel Nanda and Arthur Conmy. lesswrong.com/posts/5FGXmJ3w…

nev (@neverrixx)

🧵1/6 SAEs have become a staple of LLM interpretability, but what if we applied them to image generation models? My recent paper with Dmitrii Kharlapenko, Yixiong Hao, afterless, Sheikh Abdur Raheem Ali, and Arthur Conmy adapts SAEs to understand the SOTA diffusion transformer FLUX.1 ⬇️
