Liliang Ren (@liliang_ren) 's Twitter Profile
Liliang Ren

@liliang_ren

Senior Researcher at Microsoft GenAI | UIUC CS PhD graduate | Efficient LLM | NLP | Former Intern @MSFTResearch @Azure @AmazonScience

ID: 1106294591718715392

Link: https://renll.github.io · Joined: 14-03-2019 20:42:39

101 Tweets

2.2K Followers

455 Following

Jize Jiang (@jizejiang) 's Twitter Profile Photo

Excited to introduce VTool-R1!

We've trained VLMs to "think visually" using RL, blending Python-based 🖼️ visual edits with 💡 textual Chain-of-Thought reasoning.
Our trained qwen2.5-VL-32B surpasses GPT-4o on ChartQA & TableVQA, and even the compact qwen2.5-VL-7B significantly
AI21 Labs (@ai21labs) 's Twitter Profile Photo

Attention was never enough.

The hybrid LLM era is here, and it's moving fast.

From Mamba to Jamba to Bamba, we mapped every major model that's challenged the Transformer default in the past 18 months.

🧵 A timeline of what's changed and why it matters ↓

🔗
Feng Yao (@fengyao1909) 's Twitter Profile Photo

Failing on large-scale RL with VeRL?

⚠️ Mixing inference backends (vLLM/SGLang) with training backends (FSDP/Megatron) secretly turns your RL into off-policy, even if they share the same weights!

📉 Blog:
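The mismatch described above can be made concrete with a toy sketch (my own illustration, not code from the linked blog): the same weights yield slightly different token probabilities under two numerical implementations, so tokens sampled by the inference backend are not exactly on-policy for the trainer.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8)).astype(np.float32)

# Trainer backend: the reference numerics.
train_probs = softmax(logits)

# Inference backend: same weights, slightly different kernel numerics
# (modeled here as tiny perturbations of the logits).
noise = rng.normal(scale=1e-3, size=logits.shape).astype(np.float32)
infer_probs = softmax(logits + noise)

# Tokens chosen under the inference policy (greedy for simplicity)...
tokens = infer_probs.argmax(axis=-1)

# ...are scored differently by the trainer: the importance ratio
# pi_train / pi_infer is not exactly 1, i.e. the data is off-policy.
ratio = train_probs[np.arange(4), tokens] / infer_probs[np.arange(4), tokens]
print("importance ratios:", ratio)
```

A per-token importance ratio like this is also the standard correction term if one wants to account for the gap rather than ignore it.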
Mingyuan Wu (@mingyuanwu4) 's Twitter Profile Photo

Can VLMs learn to reason better by drawing on the brilliant thoughts of others?

🔥 Our recent work on vision-language model reasoning, through carefully designed multimodal memory and retrieval, has been accepted to the Main Conference of #EMNLP2025.

💡 Inspired by case-based
Kaiyue Wen (@wen_kaiyue) 's Twitter Profile Photo

(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baselines or limited scale! E.g. Muon: ~40% speedup at <0.5B params and only 10% at 1.2B (8× Chinchilla)!
Thinking Machines (@thinkymachines) 's Twitter Profile Photo

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is "Defeating Nondeterminism in LLM Inference".

We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
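The post's title points at kernel numerics; the underlying class of issue is easy to demo (a minimal illustration of my own, not taken from the post): floating-point addition is not associative, so a kernel whose reduction order varies with batch size, thread count, or hardware can return different bits for the same math.

```python
import numpy as np

# Floating-point addition is not associative:
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# The same effect at scale: one array, two summation orders.
rng = np.random.default_rng(0)
x = rng.normal(scale=1e4, size=100_000).astype(np.float32)
s_pairwise = float(x.sum())         # NumPy's pairwise reduction
s_sorted = float(np.sort(x).sum())  # a different reduction order
print(s_pairwise, s_sorted)
```

If results must be bit-identical across runs, the reduction order itself has to be fixed, which is exactly what makes this a kernel-level problem rather than a model-level one.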
Songlin Yang (@songlinyang4) 's Twitter Profile Photo

Excited to see Gated DeltaNet being adopted in the Qwen series ! It has also previously demonstrated strong effectiveness in NVIDIA's Jet-Nemotron

Eric Jang (@ericjang11) 's Twitter Profile Photo

a well written pedagogical blog post. Some questions out of my curiosity:
1. for stably training large models, why is normalizing weights better than normalizing activations?
2. how much does regularizing the weight matrix to the Stiefel manifold limit its expressivity +

Thinking Machines (@thinkymachines) 's Twitter Profile Photo

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!

Introducing Tinker: a flexible API for fine-tuning language models.

Write training loops in Python on your laptop; we'll run them on distributed GPUs.

Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
Liliang Ren (@liliang_ren) 's Twitter Profile Photo

It is really amazing to see a 5-year-old project finally get wrapped up and still seem very relevant to today's agentic research topics such as multi-agent collaboration, environment simulators, and instruction following!

Thinking Machines (@thinkymachines) 's Twitter Profile Photo

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
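A rough numpy sketch of the idea as described above (the names and details are my assumptions, not Thinking Machines' implementation): sample tokens from the student, as in on-policy RL, then use the teacher's per-token log-probabilities on those samples as a dense training signal, as in distillation.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)
T, V = 5, 16  # sequence length, vocabulary size
student_logits = rng.normal(size=(T, V))
teacher_logits = rng.normal(size=(T, V))

# 1) The trajectory comes from the *student* (on-policy); greedy
#    decoding stands in for sampling here.
tokens = student_logits.argmax(axis=-1)

# 2) The teacher scores every sampled token: a dense per-token signal,
#    unlike the single scalar reward at the end of an RL rollout.
s_lp = log_softmax(student_logits)[np.arange(T), tokens]
t_lp = log_softmax(teacher_logits)[np.arange(T), tokens]

# 3) A reverse-KL-style per-token loss: push the student's log-probs
#    toward the teacher's on the student's own samples.
loss = (s_lp - t_lp).mean()
print("per-token distillation loss:", loss)
```

The contrast with plain SFT is step 1: the tokens being corrected are ones the student actually produces, so the teacher's feedback stays relevant to the student's own mistakes.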
Larry Dial (@classiclarryd) 's Twitter Profile Photo

NorMuon from Zichong Li et al. takes the crown as the leading NanoGPT speedrun optimizer! github.com/KellerJordan/m…
NorMuon enhances Muon with a neuron normalization step after orthogonalization using second-order statistics. arxiv.org/abs/2510.05491
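From that one-line description (orthogonalize as in Muon, then normalize each neuron using second-order statistics), a hypothetical sketch; the exact update rule is in the arXiv paper, and the per-row second-moment normalization below is my guess at "neuron normalization".

```python
import numpy as np

def newton_schulz_orth(G, steps=5):
    """Approximately orthogonalize G, as in Muon's Newton-Schulz step."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients used by Muon
    X = G / (np.linalg.norm(G) + 1e-7)  # scale so spectral norm <= 1
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X

def normuon_step(G, v, beta2=0.95, eps=1e-8):
    """Sketch: orthogonalize, then rescale each output neuron (row)
    by a running second-moment estimate (Adam-style, but per neuron,
    so the extra state is one scalar per row rather than per weight)."""
    O = newton_schulz_orth(G)
    v = beta2 * v + (1 - beta2) * (O ** 2).mean(axis=1)
    return O / (np.sqrt(v)[:, None] + eps), v

rng = np.random.default_rng(0)
G = rng.normal(size=(8, 16))  # a gradient matrix
v = np.zeros(8)               # per-neuron second-moment state
update, v = normuon_step(G, v)
print(update.shape)
```
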
Kimi.ai (@kimi_moonshot) 's Twitter Profile Photo

Kimi Linear Tech Report is dropped! 🚀 huggingface.co/moonshotai/Kim… Kimi Linear: A novel architecture that outperforms full attention with faster speeds and better performance, ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi

Songlin Yang (@songlinyang4) 's Twitter Profile Photo

Many people are confused by Minimax's recent return to full attention (especially since it was the first large-scale pivot toward hybrid linear attention) and by Kimi's later adoption of hybrid linear variants (as well as earlier attempts by Qwen3-Next, or Qwen3.5). I actually

Seunghyun Seo (@seunghyunseo7) 's Twitter Profile Photo

just noticed modded-nanogpt adopted 'NorMuon' as default (?).
it looks like `AdaMuon`. i personally didn't buy this idea because i thought Muon is enough and don't want to introduce optimizer state for the 2nd moment again like Adam... hmm
arxiv.org/abs/2510.05491
arxiv.org/abs/2507.11005…