Zhijiang Guo (@zhijiangg)'s Twitter Profile
Zhijiang Guo

@zhijiangg

Assistant Professor @HKUSTGuangzhou
Prev. @CambridgeNLP @EdinburghNLP @SUTDsg.
Working on #LLM.

ID: 4432044019

Link: https://cartus.github.io/
Joined: 02-12-2015 14:24:09

238 Tweets

740 Followers

540 Following

Han Wu (@hahahawu2)'s Twitter Profile Photo

💡Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

We comprehensively study existing model merging methods on efficient Long-to-Short LLM reasoning tasks, and find their huge potential in the field.
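"Model merging" here means combining the weights of a long-CoT reasoning model with those of a shorter-response model that shares the same architecture. As a rough illustration of one common family of merging methods (linear weight interpolation, task-arithmetic style; not necessarily the exact recipes studied in the paper), a minimal sketch in Python:

```python
def merge_state_dicts(sd_long, sd_short, alpha=0.5):
    """Linearly interpolate two same-architecture checkpoints.

    sd_long / sd_short: state dicts (name -> tensor) of the long-CoT model and
    the short-response model. alpha is a hypothetical mixing weight; alpha=1.0
    keeps the long-CoT model unchanged. Illustrative sketch only.
    """
    assert sd_long.keys() == sd_short.keys(), "architectures must match"
    return {name: alpha * sd_long[name] + (1 - alpha) * sd_short[name]
            for name in sd_long}

# Usage sketch:
# merged = merge_state_dicts(model_long.state_dict(), model_short.state_dict(), alpha=0.7)
# model_long.load_state_dict(merged)
```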
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

😂 You're right! Claude's just too academically honest. 😉 Glad 4o unlocked your inner robot-sidekick-having private eye!

BangLiu (@bangl93)'s Twitter Profile Photo

🧠264 pages and 1416 references chart the future of Foundation Agents.

Our latest survey dives deep into agents—covering brain-inspired cognition, self-evolution, multi-agents, and AI safety.

Discover the #1 Paper of the Day on Hugging Face👇:

huggingface.co/papers/2504.01…

1/3
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

🚀Excited to announce the AI for Math Workshop at ICML 2025 with amazing co-organizers! 🌐This event is a fantastic opportunity to explore the intersection of AI and Math. 🔍Join us to learn from leading experts, share your research, and connect with like-minded researchers.

Xiao Zhu (@shawnxzhu)'s Twitter Profile Photo

2/4

🎯 We identify a new type of bias in reward models: model preference bias. Popular RMs often over-value certain LLMs—even when those models rank lower in human-voted Elo scores.

⚠️ Example: Gemma-2-9B-it-SimPO often gets much higher RM scores than GPT-4o, even though GPT-4o
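One way to surface this kind of preference bias is to score responses from several LLMs with the same reward model and check whether the resulting ranking matches human-voted Elo. A minimal sketch, assuming you already have per-response RM scores and Elo ratings (all names and numbers below are hypothetical placeholders):

```python
from statistics import mean

# Hypothetical inputs: RM scores over a shared prompt set, plus human Elo ratings.
rm_scores = {
    "gemma-2-9b-it-simpo": [0.91, 0.88, 0.93],
    "gpt-4o": [0.84, 0.86, 0.85],
}
elo = {"gemma-2-9b-it-simpo": 1210, "gpt-4o": 1310}

# Rank models by mean RM score and by Elo; disagreement hints at preference bias.
rm_rank = sorted(rm_scores, key=lambda m: mean(rm_scores[m]), reverse=True)
elo_rank = sorted(elo, key=elo.get, reverse=True)

if rm_rank != elo_rank:
    print("RM ranking disagrees with human Elo:", rm_rank, "vs", elo_rank)
```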
Simon Yu (@simon_ycl)'s Twitter Profile Photo

Our recently released TextArena framework is perfect as a multi-turn reasoning benchmark and training environment for your models.

🏆 Leaderboard: Models vs Humanity
🌍 Env: Gym-like RL environment for multi-turn reasoning
▶️ Online Playing: You can play against any reasoning
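"Gym-like" here means the usual reset/step interaction loop, which is what makes such environments easy to plug into multi-turn RL training. A generic sketch of that loop (the method names below are illustrative placeholders, not necessarily TextArena's actual API):

```python
def play_episode(env, agent):
    """Generic gym-style loop: the agent (e.g. an LLM policy) emits a text
    action each turn until the environment signals the episode is over."""
    observation = env.reset()
    done = False
    while not done:
        action = agent.act(observation)              # e.g. an LLM-generated move
        observation, reward, done, info = env.step(action)
    return reward  # usable as a multi-turn reasoning training signal
```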

Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

Feel free to drop by our poster session to discuss autoformalization, formal math, and math reasoning with LLMs if you’re interested! 📈💡 #ICLR2025

Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

Excited to share our second work on LLMs for math/reasoning at #ICLR2025. Introducing OptiBench, a comprehensive benchmark for evaluating LLMs in optimization, and ReSocratic, a reverse data synthesis method to enhance LLM reasoning. Come chat with us at our poster session! 🚀
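The "reverse" in a reverse data synthesis method suggests building training data backwards: start from a known solution and synthesize a question that leads to it, rather than generating questions first and hoping the solutions are correct. A minimal sketch under that assumption (`call_llm` is a hypothetical wrapper around whatever LLM API you use, and the prompt is illustrative):

```python
def reverse_synthesize(solution: str, call_llm) -> dict:
    """Synthesize a (question, solution) pair backwards from a verified solution."""
    prompt = (
        "Here is a worked optimization solution:\n"
        f"{solution}\n"
        "Write a self-contained word problem for which this is the correct solution."
    )
    question = call_llm(prompt)
    return {"question": question, "solution": solution}

# Each synthesized pair can then be used to fine-tune or evaluate a reasoning model.
```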

Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

Tired of the computational cost of traditional #LLM ensembling? 🤔 Our #ICLR2025 spotlight paper presents UniTE!🎉 By uniting top-k tokens, we achieve strong results with reduced overhead. Find us at the poster session if you are interested.
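"Uniting top-k tokens" suggests ensembling at each decoding step over only the union of each model's top-k candidate tokens, instead of aligning and averaging full vocabularies. A minimal sketch of that idea (the simple probability averaging is an assumption, not the exact UniTE procedure):

```python
def ensemble_step(per_model_probs, k=10):
    """Ensemble one decoding step over the union of each model's top-k tokens.

    per_model_probs: list of dicts mapping token -> probability for this step,
    one dict per model. Returns the token with the highest averaged probability
    over the top-k union. Illustrative sketch only.
    """
    union = set()
    for probs in per_model_probs:
        union.update(sorted(probs, key=probs.get, reverse=True)[:k])

    avg = {tok: sum(p.get(tok, 0.0) for p in per_model_probs) / len(per_model_probs)
           for tok in union}
    return max(avg, key=avg.get)

# Usage sketch: next_token = ensemble_step([probs_model_a, probs_model_b], k=10)
```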

Zeyuan Allen-Zhu, Sc.D. (@zeyuanallenzhu)'s Twitter Profile Photo

(1/8)🍎A Galileo moment for LLM design🍎
As Pisa Tower experiment sparked modern physics, our controlled synthetic pretraining playground reveals LLM architectures' true limits. A turning point that might divide LLM research into "before" and "after." physics.allen-zhu.com/part-4-archite…
Yi Xu (@_yixu)'s Twitter Profile Photo

🚀Let’s Think Only with Images.

No language and No verbal thought.🤔 

Let’s think through a sequence of images💭, like how humans picture steps in their minds🎨. 

We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.
Yinya Huang ✈️ ICLR (@yinyahuang)'s Twitter Profile Photo

🤖⚛️Can AI truly see Physics? Test your model with the newly released SeePhys Benchmark! 🚀

🖼️Covering 2,000 vision-text multimodal physics problems spanning from middle school to doctoral qualification exams, the SeePhys benchmark systematically evaluates LLMs/MLLMs on tasks
Caiqi Zhang (@caiqizh)'s Twitter Profile Photo

🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation.

🤩No sampling. No slow post-hoc methods. Not limited to short-form QA!

‼️Just output confidence in a single decoding pass.

✅Better calibration!
🚀 20× faster runtime.

arXiv:2505.23912
👇
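A single-pass setup like this presumably trains the model to emit its confidence inline as it generates, so no extra sampling or post-hoc scoring run is needed. A rough sketch of consuming such output, assuming a hypothetical inline tag format like `<conf=0.83>` after each claim (the tag format is my assumption, not necessarily the paper's):

```python
import re

# Hypothetical inline format: each claim is followed by a tag like <conf=0.83>.
CONF_TAG = re.compile(r"(.*?)<conf=(0(?:\.\d+)?|1(?:\.0+)?)>", re.S)

def parse_confidences(generated_text: str):
    """Split a single-pass generation into (claim, confidence) pairs."""
    return [(claim.strip(), float(score))
            for claim, score in CONF_TAG.findall(generated_text)]

text = "Paris is the capital of France.<conf=0.98> It has about 40M residents.<conf=0.35>"
print(parse_confidences(text))
# [('Paris is the capital of France.', 0.98), ('It has about 40M residents.', 0.35)]
```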
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

🌟Exciting work in reasoning efficiency from MSRA/UCLA/CAS. TL;DR: a dynamic training method that compresses long CoT reasoning by 40% in response length, while maintaining or even improving accuracy. Just by simple SFT, we achieve more concise and efficient reasoning. #AI #LLMs
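Since the gain reportedly comes from "simple SFT", the core step is presumably building a fine-tuning set whose targets are shortened chains of thought. A minimal data-preparation sketch under that assumption (the selection rule here, keeping correct traces under a length budget, is illustrative rather than the paper's exact pipeline):

```python
def build_short_cot_sft(samples, max_len_ratio=0.6):
    """Collect (prompt, short-CoT) pairs for supervised fine-tuning.

    samples: list of dicts with 'question', 'cot', 'is_correct', and 'orig_len'
    (token count of the original long CoT). Only correct traces no longer than
    max_len_ratio of the original length are kept.
    """
    sft_data = []
    for s in samples:
        if s["is_correct"] and len(s["cot"].split()) <= max_len_ratio * s["orig_len"]:
            sft_data.append({"prompt": s["question"], "target": s["cot"]})
    return sft_data
```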

Jing Xiong (@_june1126)'s Twitter Profile Photo

🔬 The HKU team presents ParallelComp: a training-free technique for efficient context length extrapolation in LLMs—from 8K up to 128K tokens—on a single A100 GPU, with minimal performance loss.

📄 Paper: arxiv.org/abs/2502.14317
💻 Code: github.com/menik1126/Para…
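Training-free extrapolation of this kind generally starts by splitting an over-long input into chunks that fit the model's trained window, processing them in parallel, and then pruning or merging what each chunk contributes. The chunking step is sketched below under those assumptions; ParallelComp's attention- and KV-eviction details are not reproduced here:

```python
def split_into_chunks(token_ids, chunk_len=8192, overlap=256):
    """Split a long token sequence into overlapping chunks that each fit the
    model's trained context window. The chunks can then be encoded in parallel
    and their KV caches pruned/merged (omitted here)."""
    chunks, start = [], 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + chunk_len])
        start += chunk_len - overlap
    return chunks

# Example: a 128K-token input becomes a set of ~8K-token chunks, each within
# the window the base model was actually trained on.
```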
Xiao Liang (@mastervito0601)'s Twitter Profile Photo

🙋‍♂️ Can RL training address model weaknesses without external distillation?

🚀 Please check our latest work on RL for LLM reasoning!

💯 TL;DR: We propose augmenting RL training with synthetic problems targeting model’s reasoning weaknesses.

📊Qwen2.5-32B: 42.9 → SwS-32B: 68.4
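The idea, as described, is to spot where the policy model fails during RL and then synthesize additional training problems concentrated on those weak spots. A minimal sketch of that selection-and-augmentation loop, with hypothetical helpers `solve_rate` and `synthesize_like`:

```python
def augment_weak_categories(problems, solve_rate, synthesize_like,
                            threshold=0.3, n_new=50):
    """Add synthetic problems for categories the model currently solves poorly.

    problems: dict mapping category -> list of existing problems.
    solve_rate(category): hypothetical helper returning the current pass rate.
    synthesize_like(examples, n): hypothetical helper generating n new problems.
    Illustrative sketch only; the full pipeline would also filter/verify the output.
    """
    augmented = dict(problems)
    for category, items in problems.items():
        if solve_rate(category) < threshold:      # weakness detected
            augmented[category] = items + synthesize_like(items, n_new)
    return augmented
```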
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

Excited to co-organize the AI for Math Workshop at ICML 2025! We are actively seeking reviewers for the submitted papers, ranging from formal/informal math to scientific reasoning. Join us at: docs.google.com/forms/d/e/1FAI…