Liyuan Liu (Lucas) (@liyuanlucas) 's Twitter Profile
Liyuan Liu (Lucas)

@liyuanlucas

Researcher @MSFTResearch | prev. @dmguiuc
Working on deep learning heuristics (aka tricks)

He/him

ID: 3745471758

https://liyuanlucasliu.github.io | Joined: 01-10-2015 07:32:59

144 Tweets

802 Followers

502 Following

Yufan Zhuang (@yufan_zhuang) 's Twitter Profile Photo

Can LLMs reason beyond context limits? 🤔 

Introducing Knowledge Flow, a training-free method that helped gpt-oss-120b & Qwen3-235B achieve 100% on the AIME-25, no tools.

How? Like human deliberation, for LLMs.

📝 Blog: yufanzhuang.notion.site/knowledge-flow
💻 Code: github.com/EvanZhuang/kno…
Dinghuai Zhang 张鼎怀 (@zdhnarsil) 's Twitter Profile Photo

Check our Knowledge Flow blog: We develop a new axis of test-time scaling by doing iterative refinement on a "knowledge" list for reasoning tasks! Notably, we find that updating what is wrong is more effective than recording what is right. Great job led by Yufan Zhuang!
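
A minimal sketch of the iterative knowledge-list refinement described above (my own illustration of the idea, not the released Knowledge Flow code; ask_llm is a placeholder for whatever chat-completion call you use):

```python
# Hypothetical sketch of the "knowledge flow" loop: solve, critique, and carry
# a growing list of corrections into the next round -- not the authors' code.
def knowledge_flow(problem: str, ask_llm, rounds: int = 4) -> str:
    knowledge: list[str] = []          # distilled notes carried across rounds
    answer = ""
    for _ in range(rounds):
        notes = "\n".join(f"- {k}" for k in knowledge) or "(none yet)"
        answer = ask_llm(
            f"Problem: {problem}\nKnown pitfalls / insights so far:\n{notes}\n"
            "Solve the problem and state the final answer."
        )
        # Deliberation step: record what went *wrong* in the attempt, per the
        # observation above that fixing errors beats logging correct facts.
        critique = ask_llm(
            f"Problem: {problem}\nAttempt:\n{answer}\n"
            "List anything wrong or unjustified in this attempt, as short notes."
        )
        knowledge.extend(line.strip("- ") for line in critique.splitlines() if line.strip())
    return answer
```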

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

got confused by something basic and went down a rabbit hole, so I just wrote a blogpost about it. "Is Density vs. Feature Coverage That Different?" nanjiang.cs.illinois.edu/2025/10/24/cov…
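
For context, the two standard notions in offline-RL theory (stated here from memory, not quoted from the blogpost): density-ratio coverage controls the per-state-action mismatch, while feature coverage only asks the data to cover the relevant directions in feature space.

```latex
% Standard definitions, for a target policy \pi, data distribution \mu,
% and features \phi(s,a); not quoted from the blogpost.
C_{\mathrm{density}} = \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)},
\qquad
C_{\mathrm{feature}} = \sup_{x \neq 0}
\frac{x^{\top}\,\mathbb{E}_{d^{\pi}}\!\left[\phi(s,a)\phi(s,a)^{\top}\right] x}
     {x^{\top}\,\mathbb{E}_{\mu}\!\left[\phi(s,a)\phi(s,a)^{\top}\right] x}.
```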

Thinking Machines (@thinkymachines) 's Twitter Profile Photo

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other

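A rough sketch of what an on-policy distillation step could look like (my own illustration under an HF-style causal-LM interface, not the post's recipe): the student generates the rollout, and the teacher supplies a dense per-token target on exactly those sampled tokens.

```python
import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, prompt_ids, optimizer, max_new_tokens=256):
    # 1) On-policy rollout: sample completions from the *current* student.
    #    (Assumes an HF-style generate/forward API; masking of prompt and
    #    padding tokens is omitted for brevity.)
    with torch.no_grad():
        sequences = student.generate(prompt_ids, max_new_tokens=max_new_tokens, do_sample=True)

    # 2) Score the same tokens under both models.
    student_logits = student(sequences).logits          # [B, T, V]
    with torch.no_grad():
        teacher_logits = teacher(sequences).logits      # [B, T, V]

    # 3) Dense per-token objective: reverse KL(student || teacher), giving a
    #    learning signal at every sampled token rather than one scalar reward.
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
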
Thinking Machines (@thinkymachines) 's Twitter Profile Photo

Today we’re announcing research and teaching grants for Tinker: credits for scholars and students to fine-tune and experiment with open-weight LLMs. Read more and apply at: thinkingmachines.ai/blog/tinker-re…

Liyuan Liu (Lucas) (@liyuanlucas) 's Twitter Profile Photo

This would be a very good starting point for learning / prototyping. Many times, people interested in learning ML/DL/LLM are intimidated by the sys/compute complexity.

Penghui Qi (@qphutu) 's Twitter Profile Photo

🚀Excited to share our new work!

💊Problem: The BF16 precision causes a large training-inference mismatch, leading to unstable RL training.

💡Solution: Just switch to FP16.

🎯That's it.

📰Paper: arxiv.org/pdf/2510.26788
⭐️Code: github.com/sail-sg/Precis…
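
A quick way to see where the mismatch comes from (a minimal sketch, not from the paper): BF16 spends its bits on range (8 exponent, 7 mantissa bits) while FP16 keeps more precision (5 exponent, 10 mantissa bits), so the same values round roughly 8x more coarsely under BF16, and training and inference stacks that round in different places drift further apart.

```python
import torch

# Compare round-trip rounding error of BF16 vs FP16 against an FP32 reference.
torch.manual_seed(0)
x = torch.randn(1_000_000, dtype=torch.float32)

err_bf16 = (x.to(torch.bfloat16).float() - x).abs().max().item()
err_fp16 = (x.to(torch.float16).float() - x).abs().max().item()

print(f"max round-trip error, bf16: {err_bf16:.2e}")   # typically ~8x larger than fp16
print(f"max round-trip error, fp16: {err_fp16:.2e}")
```
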
Yingru Li (@richardyrli) 's Twitter Profile Photo

Daniel Han, glad you liked the post! You're spot on to suspect lower-level implementation issues. That's exactly what we found in the original blog. The disable_cascade_attn finding (Sec 4.2.4) was the symptom, but the root cause was that silent FlashAttention-2 kernel bug

<a href="/danielhanchen/">Daniel Han</a>, glad you liked the post! You're spot on to suspect lower-level implementation issues. That's exactly what we found in the original blog. 
The disable_cascade_attn finding (Sec 4.2.4) was the symptom, but the root cause was that silent FlashAttention-2 kernel bug
Penghui Qi (@qphutu) 's Twitter Profile Photo

Hi Yingru Li, I tried disable_cascade_attn many times, including with the latest vLLM version, but unfortunately it made no difference in our experiments. So I guess it really depends on the setting.

Zongyi Li (@zongyilicaltech) 's Twitter Profile Photo

Life update: I will join NYU Courant Math (CAOS) and Center of Data Science as an assistant professor in Fall 2026. If you are interested in doing a PhD with me please let me know! zongyi-li.github.io

Shekswess (@shekswess) 's Twitter Profile Photo

It’s frustrating how labs like Kimi.ai, Qwen, DeepSeek, Ai2, Hugging Face... share their research, pipelines, and lessons openly, only for closed-source labs to quietly use that knowledge to build better models without ever giving back.

Ashwinee Panda (@pandaashwinee) 's Twitter Profile Photo

"Dense Backpropagation Improves Pretraining for sMoEs" is accepted at NeurIPS Conference! We show that we can proxy inactive experts with a cheap estimator, and that doing this in pretraining improves performance without requiring HPO or compute overhead.

"Dense Backpropagation Improves Pretraining for sMoEs" is accepted at <a href="/NeurIPSConf/">NeurIPS Conference</a>! We show that we can proxy inactive experts with a cheap estimator, and that doing this in pretraining improves performance without requiring HPO or compute overhead.
Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile Photo

RL is bounded by finite data😣?
Introducing RLVE: RL with Adaptive Verifiable Environments

We scale RL with data procedurally generated from 400 envs that dynamically adapt to the trained model

💡find supervision signals right at the LM capability frontier + scale them

🔗in🧵
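
A toy version of one such environment, to make the idea concrete (hypothetical sketch, not the RLVE code): procedurally generate problems, verify answers exactly, and shift difficulty toward the model's current capability frontier.

```python
import random

class AdaptiveArithmeticEnv:
    def __init__(self, target_success: float = 0.5, window: int = 32):
        self.difficulty = 1                     # controls operand count and size
        self.target_success = target_success
        self.window = window
        self.rewards = []

    def generate(self):
        nums = [random.randint(1, 10 ** self.difficulty) for _ in range(self.difficulty + 1)]
        prompt = " + ".join(map(str, nums)) + " = ?"
        return prompt, sum(nums)                # (problem, ground truth for the verifier)

    def verify(self, answer, truth) -> float:
        reward = float(str(answer).strip() == str(truth))
        self.rewards.append(reward)
        # Adapt: harder when the policy beats the target rate, easier otherwise,
        # keeping the supervision signal near the capability frontier.
        if len(self.rewards) >= self.window:
            rate = sum(self.rewards) / len(self.rewards)
            self.difficulty = max(1, self.difficulty + (1 if rate > self.target_success else -1))
            self.rewards.clear()
        return reward

# Usage with any policy: prompt, truth = env.generate(); r = env.verify(policy(prompt), truth)
```
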
vLLM (@vllm_project) 's Twitter Profile Photo

🚀 No More Train–Inference Mismatch! We demonstrate bitwise consistent on-policy RL with TorchTitan (training) + vLLM (inference) — the first open-source run where training and inference numerics match exactly. It only takes 3 steps: 1️⃣ Make vLLM batch-invariant (same seq →
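
For what "bitwise consistent" means in practice, a minimal check (my sketch, not the TorchTitan/vLLM recipe): the logprobs the trainer recomputes for sampled tokens must equal the inference engine's exactly, not just approximately.

```python
import torch

def assert_bitwise_on_policy(train_logprobs: torch.Tensor, infer_logprobs: torch.Tensor):
    # Truly on-policy RL requires the two to match exactly; any drift means the
    # trainer is optimizing a slightly different policy than the one that sampled.
    if not torch.equal(train_logprobs, infer_logprobs):
        frac = (train_logprobs != infer_logprobs).float().mean().item()
        raise AssertionError(f"train/inference logprobs differ on {frac:.2%} of tokens")
```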

Satya Nadella (@satyanadella) 's Twitter Profile Photo

I’ve been thinking a lot about what the net benefit of the AI platform wave is. The real question is how to empower every company out there to get more out of this platform shift and build their own AI native capabilities and enterprise value (vs inadvertently just transfer their