Deqing Fu (@deqingfu)'s Twitter Profile
Deqing Fu

@deqingfu

CS PhD Student @CSatUSC. Alum @UChicago, B.S. '20, M.S. '22. Interpretability of LLMs; DL Theory; NLP | prev research intern @MetaAI, @Google

ID: 1327029576430718976

Website: http://deqingfu.github.io · Joined: 12-11-2020 23:24:49

143 Tweets

737 Followers

844 Following

Yuqing Yang (@yyqcode)'s Twitter Profile Photo

🧐When do LLMs admit their mistakes when they should know better?

In our new paper, we define this behavior as retraction: the model indicates that its generated answer was wrong.
LLMs can retract—but they rarely do.🤯

arxiv.org/abs/2505.16170

👇🧵
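
As a toy illustration of the defined behavior (not the paper's evaluation protocol): a retraction is a continuation in which the model flags its own prior answer as wrong. The keyword heuristic and marker list below are hypothetical.

```python
# Hypothetical heuristic for spotting a "retraction": the model signals
# that its own previously generated answer was wrong. The marker list is
# illustrative only, not from the paper.
RETRACTION_MARKERS = (
    "that's wrong",
    "my answer was incorrect",
    "i made a mistake",
)

def is_retraction(continuation: str) -> bool:
    """Return True if the model's continuation retracts its prior answer."""
    text = continuation.lower()
    return any(marker in text for marker in RETRACTION_MARKERS)

# The model answered, then kept generating:
print(is_retraction("Wait, that's wrong. The capital of Australia is Canberra."))  # True
print(is_retraction("So the final answer is Sydney."))  # False
```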
xiao zhang (@xiaozha55937919)'s Twitter Profile Photo

Excited to be at #CVPR2025! Looking forward to catching up with old friends and meeting new ones.

If you're interested in grabbing coffee, trying out new restaurants, or chatting about generative representation learning, feel free to DM me!

A quick summary of my recent works:
Shangshang Wang (@upupwang)'s Twitter Profile Photo

Sparse autoencoders (SAEs) can be used to elicit strong reasoning abilities with remarkable efficiency.

Using only 1 hour of training at a cost of $2, without any reasoning traces, we find a way to train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.
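
For context on the method's central object: a minimal sketch of a standard sparse autoencoder over a model's hidden states. The dimensions, ReLU encoder, and L1 sparsity penalty are common-choice assumptions on my part, not the Resa recipe from the thread.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: reconstruct hidden states through a wide sparse bottleneck.

    d_model=1536 matches a typical 1.5B model's hidden size; both
    dimensions here are illustrative assumptions.
    """

    def __init__(self, d_model: int = 1536, d_features: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, h: torch.Tensor):
        f = torch.relu(self.encoder(h))  # sparse feature activations
        h_hat = self.decoder(f)          # reconstructed hidden state
        return h_hat, f

def sae_loss(h, h_hat, f, l1_coef: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse features.
    return ((h - h_hat) ** 2).mean() + l1_coef * f.abs().mean()
```

The thread's claim is that features learned this way can do more than explain the model; the next post frames that as SAE-tuning.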
Deqing Fu (@deqingfu)'s Twitter Profile Photo

How to make SAEs useful beyond interpretability and steering? Shangshang Wang's work Resa shows:
🧐 SAEs can capture the reasoning features (as an interpretability tool)
🤔 SAEs can further elicit strong reasoning abilities via SAE-tuning the model (a stronger claim than steering, imho)

Qinyuan Ye (👀Jobs) (@qinyuan_ye)'s Twitter Profile Photo

1+1=3
2+2=5
3+3=?

Many language models (e.g., Llama 3 8B, Mistral v0.1 7B) will answer 7. But why?

We dig into the model internals, uncover a function induction mechanism, and find that it’s broadly reused when models encounter surprises during in-context learning. 🧵
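
Spelling out the arithmetic: both in-context examples fit the rule f(a, b) = a + b + 1, and applying it to the query gives 7. A tiny sanity check of that reading (my paraphrase of the setup, not code from the paper):

```python
# The few-shot examples are consistent with f(a, b) = a + b + 1.
examples = [((1, 1), 3), ((2, 2), 5)]
induced_rule = lambda a, b: a + b + 1

# The rule matches every in-context example...
assert all(induced_rule(a, b) == y for (a, b), y in examples)

# ...and applied to the query it reproduces the models' answer.
print(induced_rule(3, 3))  # 7
```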
Micah Goldblum (@micahgoldblum)'s Twitter Profile Photo

🚨Announcing Zebra-CoT, a large-scale dataset of high-quality interleaved image-text reasoning traces 📜. Humans often draw visual aids like diagrams when solving problems, but existing VLMs reason mostly in pure text. 1/n
Deqing Fu (@deqingfu)'s Twitter Profile Photo

Presenting Zebra-CoT: A large-scale dataset to teach models intrinsic multimodal reasoning: interleaving text and natively-generated images like a zebra's stripes. It moves beyond the limitations of external tool-based visual CoT.

🔗arxiv.org/abs/2507.16746
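
To make "interleaved image-text reasoning traces" concrete, here is a hypothetical shape for one record; the field names and contents are illustrative guesses, not the released Zebra-CoT schema.

```python
# Hypothetical Zebra-CoT-style record: text reasoning steps interleaved
# with drawn visual aids. Field names and contents are illustrative only.
example_trace = {
    "question": "A 5 m ladder leans against a wall, its base 3 m out. How high does it reach?",
    "trace": [
        {"type": "text", "content": "Sketch the setup as a right triangle."},
        {"type": "image", "content": "step1_triangle.png"},  # the drawn diagram
        {"type": "text", "content": "By the Pythagorean theorem, height = sqrt(5**2 - 3**2)."},
        {"type": "text", "content": "So the ladder reaches 4 m up the wall."},
    ],
    "answer": "4 m",
}
```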
Robin Jia (@robinomial)'s Twitter Profile Photo

I’ll be at ACL 2025 next week, where my group has papers on evaluating evaluation metrics, watermarking training data, and mechanistic interpretability. I’ll also be co-organizing the first Workshop on Large Language Model Memorization on Friday. Hope to see lots of folks there!