Wenhu Chen (@wenhuchen)'s Twitter Profile
Wenhu Chen

@wenhuchen

AI researcher. Interested in reasoning and multimodality. I direct TIGER-Lab. Author of PoT, MMMU, MMLU-Pro, MAmmoTH, CFT, LongRAG, MAP-Neo, YuE, Mocha, SuTI

ID: 727242818452897796

Website: https://wenhuchen.github.io/ · Joined: 02-05-2016 21:06:14

2.2K Tweets

19.19K Followers

638 Following

Happy to collaborate with Raghuveer, Bhuwan, and others on batch mining for VLM2Vec. VLM2Vec is now SoTA on the MMEB benchmark: huggingface.co/spaces/TIGER-L…

Our General-Reasoner paper is coming out on arXiv at arxiv.org/abs/2505.14652
We have retrained our General-Reasoner models and obtained much better performance!

- Our 4B General-Reasoner significantly outperforms NVIDIA's Nemotron-CrossThink-7B.
- Our 14B General-Reasoner

Thanks for sharing!

- Our 4B General-Reasoner significantly outperforms NVIDIA's Nemotron-CrossThink-7B.
- Our 14B General-Reasoner (Qwen3) already achieves 70.3% on MMLU-Pro, 56% on GPQA, 39.9% on SuperGPQA, and 54.4% on TheoremQA. It's one of the most powerful

Veo 3 blew people's minds with its talking-character generation! It's so exciting! But we need an evaluation benchmark for that! Cong just released MochaBench, the benchmark used in our Mocha paper to evaluate talking-character models. GitHub: github.com/congwei1230/Mo… Mocha paper:

Thanks for sharing our paper! We incentivize VLMs to reason in pixel/image space, o3-style. Paper: arxiv.org/abs/2505.15966