Zhijiang Guo (@zhijiangg)'s Twitter Profile
Zhijiang Guo

@zhijiangg

Assistant Professor @HKUSTGuangzhou
Prev. @CambridgeNLP @EdinburghNLP @SUTDsg.
Working on #LLM.

ID: 4432044019

Link: https://cartus.github.io/
Joined: 02-12-2015 14:24:09

238 Tweets

740 Followers

540 Following

Han Wu (@hahahawu2)'s Twitter Profile Photo

💡Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

We comprehensively study existing model merging methods on efficient Long-to-Short LLM reasoning tasks, and find their huge potential in the field.
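"Model merging" here means combining the weights of a long-CoT reasoning model with those of a shorter-response model that shares the same architecture. As a rough illustration of one common family of merging methods (linear weight interpolation, task-arithmetic style; not necessarily the exact recipes studied in the paper), a minimal sketch in Python:

```python
def merge_state_dicts(sd_long, sd_short, alpha=0.5):
    """Linearly interpolate two same-architecture checkpoints.

    sd_long / sd_short: state dicts (name -> tensor) of the long-CoT model and
    the short-response model. alpha is a hypothetical mixing weight; alpha=1.0
    keeps the long-CoT model unchanged. Illustrative sketch only.
    """
    assert sd_long.keys() == sd_short.keys(), "architectures must match"
    return {name: alpha * sd_long[name] + (1 - alpha) * sd_short[name]
            for name in sd_long}

# Usage sketch:
# merged = merge_state_dicts(model_long.state_dict(), model_short.state_dict(), alpha=0.7)
# model_long.load_state_dict(merged)
```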
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

😂 You're right! Claude's just too academically honest. 😉 Glad 4o unlocked your inner robot-sidekick-having private eye!

BangLiu (@bangl93)'s Twitter Profile Photo

🧠264 pages and 1416 references chart the future of Foundation Agents.

Our latest survey dives deep into agents—covering brain-inspired cognition, self-evolution, multi-agents, and AI safety.

Discover the #1 Paper of the Day on Hugging Face👇:

huggingface.co/papers/2504.01…

1/3
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

🚀Excited to announce the AI for Math Workshop at ICML 2025 with amazing co-organizers! 🌐This event is a fantastic opportunity to explore the intersection of AI and Math. 🔍Join us to learn from leading experts, share your research, and connect with like-minded researchers.

Xiao Zhu (@shawnxzhu)'s Twitter Profile Photo

2/4

🎯 We identify a new type of bias in reward models: model preference bias. Popular RMs often over-value certain LLMs—even when those models rank lower in human-voted Elo scores.

⚠️ Example: Gemma-2-9B-it-SimPO often gets much higher RM scores than GPT-4o, even though GPT-4o
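One way to surface this kind of preference bias is to score responses from several LLMs with the same reward model and check whether the resulting ranking matches human-voted Elo. A minimal sketch, assuming you already have per-response RM scores and Elo ratings (all names and numbers below are hypothetical placeholders):

```python
from statistics import mean

# Hypothetical inputs: RM scores over a shared prompt set, plus human Elo ratings.
rm_scores = {
    "gemma-2-9b-it-simpo": [0.91, 0.88, 0.93],
    "gpt-4o": [0.84, 0.86, 0.85],
}
elo = {"gemma-2-9b-it-simpo": 1210, "gpt-4o": 1310}

# Rank models by mean RM score and by Elo; disagreement hints at preference bias.
rm_rank = sorted(rm_scores, key=lambda m: mean(rm_scores[m]), reverse=True)
elo_rank = sorted(elo, key=elo.get, reverse=True)

if rm_rank != elo_rank:
    print("RM ranking disagrees with human Elo:", rm_rank, "vs", elo_rank)
```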
Simon Yu (@simon_ycl)'s Twitter Profile Photo

Our recently released TextArena framework is perfect as a multi-turn reasoning benchmark and training environment for your models.

🏆 Leaderboard: Models vs Humanity
🌍 Env: Gym-like RL environment for multi-turn reasoning
▶️ Online Playing: You can play against any reasoning
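"Gym-like" here means the usual reset/step interaction loop, which is what makes such environments easy to plug into multi-turn RL training. A generic sketch of that loop (the method names below are illustrative placeholders, not necessarily TextArena's actual API):

```python
def play_episode(env, agent):
    """Generic gym-style loop: the agent (e.g. an LLM policy) emits a text
    action each turn until the environment signals the episode is over."""
    observation = env.reset()
    done = False
    while not done:
        action = agent.act(observation)              # e.g. an LLM-generated move
        observation, reward, done, info = env.step(action)
    return reward  # usable as a multi-turn reasoning training signal
```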

Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

Feel free to drop by our poster session to discuss autoformalization, formal math, and math reasoning with LLMs if you’re interested! 📈💡 #ICLR2025

Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

Excited to share our second work on LLMs for math/reasoning at #ICLR2025. Introducing OptiBench, a comprehensive benchmark for evaluating LLMs in optimization, and ReSocratic, a reverse data synthesis method to enhance LLM reasoning. Come chat with us at our poster session! 🚀
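The "reverse" in a reverse data synthesis method suggests building training data backwards: start from a known solution and synthesize a question that leads to it, rather than generating questions first and hoping the solutions are correct. A minimal sketch under that assumption (`call_llm` is a hypothetical wrapper around whatever LLM API you use, and the prompt is illustrative):

```python
def reverse_synthesize(solution: str, call_llm) -> dict:
    """Synthesize a (question, solution) pair backwards from a verified solution."""
    prompt = (
        "Here is a worked optimization solution:\n"
        f"{solution}\n"
        "Write a self-contained word problem for which this is the correct solution."
    )
    question = call_llm(prompt)
    return {"question": question, "solution": solution}

# Each synthesized pair can then be used to fine-tune or evaluate a reasoning model.
```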

Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

Tired of the computational cost of traditional #LLM ensembling? 🤔 Our #ICLR2025 spotlight paper presents UniTE!🎉 By uniting top-k tokens, we achieve strong results with reduced overhead. Find us at the poster session if you are interested.
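"Uniting top-k tokens" suggests ensembling at each decoding step over only the union of each model's top-k candidate tokens, instead of aligning and averaging full vocabularies. A minimal sketch of that idea (the simple probability averaging is an assumption, not the exact UniTE procedure):

```python
def ensemble_step(per_model_probs, k=10):
    """Ensemble one decoding step over the union of each model's top-k tokens.

    per_model_probs: list of dicts mapping token -> probability for this step,
    one dict per model. Returns the token with the highest averaged probability
    over the top-k union. Illustrative sketch only.
    """
    union = set()
    for probs in per_model_probs:
        union.update(sorted(probs, key=probs.get, reverse=True)[:k])

    avg = {tok: sum(p.get(tok, 0.0) for p in per_model_probs) / len(per_model_probs)
           for tok in union}
    return max(avg, key=avg.get)

# Usage sketch: next_token = ensemble_step([probs_model_a, probs_model_b], k=10)
```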

Zeyuan Allen-Zhu, Sc.D. (@zeyuanallenzhu)'s Twitter Profile Photo

(1/8)🍎A Galileo moment for LLM design🍎
As Pisa Tower experiment sparked modern physics, our controlled synthetic pretraining playground reveals LLM architectures' true limits. A turning point that might divide LLM research into "before" and "after." physics.allen-zhu.com/part-4-archite…
Yi Xu (@_yixu)'s Twitter Profile Photo

🚀Let’s Think Only with Images.

No language and No verbal thought.🤔 

Let’s think through a sequence of images💭, like how humans picture steps in their minds🎨. 

We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.
Yinya Huang ✈️ ICLR (@yinyahuang)'s Twitter Profile Photo

🤖⚛️Can AI truly see Physics? Test your model with the newly released SeePhys Benchmark! 🚀

🖼️Covering 2,000 vision-text multimodal physics problems spanning from middle school to doctoral qualification exams, the SeePhys benchmark systematically evaluates LLMs/MLLMs on tasks
Caiqi Zhang (@caiqizh)'s Twitter Profile Photo

🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation.

🤩No sampling. No slow post-hoc methods. Not limited to short-form QA!

‼️Just output confidence in a single decoding pass.

✅Better calibration!
🚀 20× faster runtime.

arXiv:2505.23912
👇
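A single-pass setup like this presumably trains the model to emit its confidence inline as it generates, so no extra sampling or post-hoc scoring run is needed. A rough sketch of consuming such output, assuming a hypothetical inline tag format like `<conf=0.83>` after each claim (the tag format is my assumption, not necessarily the paper's):

```python
import re

# Hypothetical inline format: each claim is followed by a tag like <conf=0.83>.
CONF_TAG = re.compile(r"(.*?)<conf=(0(?:\.\d+)?|1(?:\.0+)?)>", re.S)

def parse_confidences(generated_text: str):
    """Split a single-pass generation into (claim, confidence) pairs."""
    return [(claim.strip(), float(score))
            for claim, score in CONF_TAG.findall(generated_text)]

text = "Paris is the capital of France.<conf=0.98> It has about 40M residents.<conf=0.35>"
print(parse_confidences(text))
# [('Paris is the capital of France.', 0.98), ('It has about 40M residents.', 0.35)]
```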
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

🌟Exciting work in reasoning efficiency from MSRA/UCLA/CAS. TL;DR: a dynamic training method that compresses long CoT reasoning by 40% in response length, while maintaining or even improving accuracy. Just by simple SFT, we achieve more concise and efficient reasoning. #AI #LLMs
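Since the gain reportedly comes from "simple SFT", the core step is presumably building a fine-tuning set whose targets are shortened chains of thought. A minimal data-preparation sketch under that assumption (the selection rule here, keeping correct traces under a length budget, is illustrative rather than the paper's exact pipeline):

```python
def build_short_cot_sft(samples, max_len_ratio=0.6):
    """Collect (prompt, short-CoT) pairs for supervised fine-tuning.

    samples: list of dicts with 'question', 'cot', 'is_correct', and 'orig_len'
    (token count of the original long CoT). Only correct traces no longer than
    max_len_ratio of the original length are kept.
    """
    sft_data = []
    for s in samples:
        if s["is_correct"] and len(s["cot"].split()) <= max_len_ratio * s["orig_len"]:
            sft_data.append({"prompt": s["question"], "target": s["cot"]})
    return sft_data
```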

Jing Xiong (@_june1126)'s Twitter Profile Photo

🔬 The HKU team presents ParallelComp: a training-free technique for efficient context length extrapolation in LLMs—from 8K up to 128K tokens—on a single A100 GPU, with minimal performance loss.

📄 Paper: arxiv.org/abs/2502.14317
💻 Code: github.com/menik1126/Para…
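Training-free extrapolation of this kind generally starts by splitting an over-long input into chunks that fit the model's trained window, processing them in parallel, and then pruning or merging what each chunk contributes. The chunking step is sketched below under those assumptions; ParallelComp's attention- and KV-eviction details are not reproduced here:

```python
def split_into_chunks(token_ids, chunk_len=8192, overlap=256):
    """Split a long token sequence into overlapping chunks that each fit the
    model's trained context window. The chunks can then be encoded in parallel
    and their KV caches pruned/merged (omitted here)."""
    chunks, start = [], 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + chunk_len])
        start += chunk_len - overlap
    return chunks

# Example: a 128K-token input becomes a set of ~8K-token chunks, each within
# the window the base model was actually trained on.
```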
Xiao Liang (@mastervito0601)'s Twitter Profile Photo

🙋‍♂️ Can RL training address model weaknesses without external distillation?

🚀 Please check our latest work on RL for LLM reasoning!

💯 TL;DR: We propose augmenting RL training with synthetic problems targeting model’s reasoning weaknesses.

📊Qwen2.5-32B: 42.9 → SwS-32B: 68.4
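The idea, as described, is to spot where the policy model fails during RL and then synthesize additional training problems concentrated on those weak spots. A minimal sketch of that selection-and-augmentation loop, with hypothetical helpers `solve_rate` and `synthesize_like`:

```python
def augment_weak_categories(problems, solve_rate, synthesize_like,
                            threshold=0.3, n_new=50):
    """Add synthetic problems for categories the model currently solves poorly.

    problems: dict mapping category -> list of existing problems.
    solve_rate(category): hypothetical helper returning the current pass rate.
    synthesize_like(examples, n): hypothetical helper generating n new problems.
    Illustrative sketch only; the full pipeline would also filter/verify the output.
    """
    augmented = dict(problems)
    for category, items in problems.items():
        if solve_rate(category) < threshold:      # weakness detected
            augmented[category] = items + synthesize_like(items, n_new)
    return augmented
```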
Zhijiang Guo (@zhijiangg)'s Twitter Profile Photo

Excited to co-organize the AI for Math Workshop at ICML 2025! We are actively seeking reviewers for the submitted papers, ranging from formal/informal math to scientific reasoning. Join us at: docs.google.com/forms/d/e/1FAI…