
Deqing Fu
@deqingfu
CS PhD Student @CSatUSC. Alum @UChicago, B.S. '20, M.S. '22. Interpretability of LLMs; DL Theory; NLP | prev research intern @MetaAI, @Google
ID: 1327029576430718976
http://deqingfu.github.io 12-11-2020 23:24:49
143 Tweets
737 Followers
844 Following





How to make SAEs useful beyond interpretability and steering? Shangshang Wang's work Resa shows: 🧐SAEs can capture reasoning features (as an interpretability tool) 🤔SAEs can further elicit strong reasoning abilities via SAE-tuning the model (a stronger claim than steering, imho)





I’ll be at ACL 2025 next week, where my group has papers on evaluating evaluation metrics, watermarking training data, and mechanistic interpretability. I’ll also be co-organizing the first Workshop on Large Language Model Memorization on Friday. Hope to see lots of folks there!