Yubo Wang (@yubowang726)'s Twitter Profile
Yubo Wang

@yubowang726

Incoming Ph.D. student in Computer Science @UWaterloo | MSc in CS @ucdavis | BSc in Computer Science @ZJU_China

ID: 1790613752733184000

Joined: 15-05-2024 05:22:53

19 Tweets

63 Followers

28 Following

Yubo Wang (@yubowang726)'s Twitter Profile Photo

🚀 THUDM just released GLM 4. Check out its impressive scores on the MMLU-Pro benchmark: (For more detailed results, visit huggingface.co/spaces/TIGER-L…)
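
For readers who want to reproduce this kind of comparison, here is a minimal sketch of scoring a model on MMLU-Pro via the Hugging Face datasets library. The split and field names ("question", "options", "answer") follow the TIGER-Lab/MMLU-Pro dataset card but should be verified against it, and predict() is a hypothetical stand-in for a real model call.

```python
from datasets import load_dataset

# Load the benchmark; field names are taken from the dataset card.
ds = load_dataset("TIGER-Lab/MMLU-Pro", split="test")

def predict(prompt: str) -> str:
    """Hypothetical model call; should return an option letter like 'A'."""
    return "A"  # stub -- replace with a real LLM request

correct = 0
for row in ds:
    letters = [chr(ord("A") + i) for i in range(len(row["options"]))]
    prompt = row["question"] + "\n" + "\n".join(
        f"{letter}. {opt}" for letter, opt in zip(letters, row["options"])
    )
    correct += predict(prompt) == row["answer"]

print(f"accuracy: {correct / len(ds):.4f}")
```
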
Yubo Wang (@yubowang726)'s Twitter Profile Photo

🎉 Our paper MMLU-Pro has been selected for a spotlight at the 2024 NeurIPS D&B track! Huge thanks to all co-authors at Tiger-AI-Lab for their support and guidance! 🙏 We hope it can help in the evaluation of LLMs! #NeurIPS2024 #MMLUPro #TigerAILab

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

505 realistic tasks encompassing over 8,000 samples from 16 expert annotators to extensively cover the multimodal task space

proj: tiger-ai-lab.github.io/MEGA-Bench/
abs: arxiv.org/abs/2410.10563
Yubo Wang (@yubowang726)'s Twitter Profile Photo

Excited to share our work on Critique Fine-Tuning (CFT), a new paradigm that teaches language models through critique rather than imitation. With just 50K examples and 8 GPU hours of training, we achieve performance comparable to or better than traditional SFT and RL approaches.
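
To make the paradigm concrete, here is a hedged sketch of what a single CFT training example could look like: the model is conditioned on a query plus a candidate (possibly wrong) solution and supervised to emit a critique instead of imitating a gold answer. The prompt template and field names are illustrative assumptions, not the paper's exact format.

```python
def build_cft_example(query: str, candidate: str, critique: str) -> dict:
    """Assemble one critique-supervision example (illustrative format)."""
    prompt = (
        f"Question: {query}\n"
        f"Candidate solution: {candidate}\n"
        "Critique the candidate solution, pointing out any errors:"
    )
    # Standard causal-LM convention: mask the prompt so the loss is
    # computed only on the critique tokens the model must generate.
    return {"prompt": prompt, "completion": " " + critique}

example = build_cft_example(
    query="What is 17 * 24?",
    candidate="17 * 24 = 398",
    critique="Incorrect: 17 * 24 = 408 (17*20 + 17*4 = 340 + 68).",
)
print(example["prompt"])
```
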

Yubo Wang (@yubowang726)'s Twitter Profile Photo

SuperGPQA breaks new ground: 285 disciplines, one massive AI test! It moves beyond basic math & coding to challenge LLMs in agriculture, industry & more. The top model hits only 61.82% - an incredible milestone in mapping AI capabilities!
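
Since the headline number aggregates over 285 disciplines, a per-discipline breakdown is the natural way to read such results. A small illustrative helper follows; the record layout ("discipline", "correct") is a placeholder, not SuperGPQA's actual schema.

```python
from collections import defaultdict

def per_discipline_accuracy(records: list[dict]) -> dict[str, float]:
    """Accuracy per discipline over scored prediction records."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["discipline"]] += 1
        hits[r["discipline"]] += int(r["correct"])
    return {d: hits[d] / totals[d] for d in totals}

records = [
    {"discipline": "agriculture", "correct": True},
    {"discipline": "agriculture", "correct": False},
    {"discipline": "industry", "correct": True},
]
print(per_discipline_accuracy(records))  # {'agriculture': 0.5, 'industry': 1.0}
```
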

Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

🎬 Automated filmmaking is the future: you need dialogue, expressive talking heads, synchronized body motion, and multi-character interactions. 🚀 Today, in collaboration with AI at Meta, we're excited to introduce MoCha: Towards Movie-Grade Talking Character Synthesis 🔊

Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

🚀 Introducing ScholarCopilot: a next-gen AI assistant designed specifically for professional academic writing! We have done a more in-depth evaluation and a human study showing that it significantly outperforms ChatGPT in citation accuracy. Our paper is online now:
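
As a rough illustration of the metric being compared, citation accuracy can be framed as the fraction of generated citations that resolve to real, correct references. The title-matching scheme below is an assumption made for the sketch, not ScholarCopilot's evaluation protocol.

```python
def citation_accuracy(cited_titles: list[str], ground_truth: set[str]) -> float:
    """Fraction of generated citations whose normalized title appears
    in the set of verified reference titles (illustrative metric)."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    truth = {norm(t) for t in ground_truth}
    hits = sum(norm(t) in truth for t in cited_titles)
    return hits / len(cited_titles) if cited_titles else 0.0

print(citation_accuracy(
    ["Attention Is All You Need", "A Made-Up Paper (2030)"],
    {"attention is all you need"},
))  # 0.5
```
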

Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

🔥 How do you build a state-of-the-art Vision-Language Model with direct RL?

We’re excited to introduce VL-Rethinker, a new paradigm for multimodal reasoning trained directly with Reinforcement Learning.
📈 It sets new SOTA on key math+vision benchmarks:
- MathVista: 80.3 → 🥇
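
For context, direct-RL pipelines on verifiable math+vision tasks typically rely on a rule-checkable reward over the rollout's final answer. The sketch below shows that generic idea; it is not VL-Rethinker's actual reward implementation.

```python
import re

def extract_boxed(text: str) -> str | None:
    """Pull the final answer out of a LaTeX \\boxed{...} span, if any."""
    m = re.search(r"\\boxed\{([^}]*)\}", text)
    return m.group(1).strip() if m else None

def reward(rollout: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 iff the boxed answer matches."""
    pred = extract_boxed(rollout)
    return 1.0 if pred is not None and pred == reference.strip() else 0.0

print(reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
```
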
Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

🚀 General-Reasoner: Generalizing LLM Reasoning Across All Domains (Beyond Math)

Most recent RL/R1 works focus on math reasoning, but math-only tuning doesn't generalize to broader reasoning (e.g., performance drops on MMLU-Pro and SuperGPQA). Why are we limited to math reasoning?

1. Existing
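
One generic way to extend such RL beyond math, where answers rarely admit exact string matching, is to score rollouts with an LLM-based verifier that judges answer equivalence. The sketch below illustrates that idea; judge() is a hypothetical stand-in, not the paper's implementation.

```python
def judge(question: str, prediction: str, reference: str) -> bool:
    """Hypothetical verifier call; replace with a real LLM request."""
    prompt = (
        f"Question: {question}\nReference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Do these answers agree? Reply yes or no."
    )
    # ... send `prompt` to a small verifier model here ...
    # Fallback stub so the sketch runs: exact (case-insensitive) match.
    return prediction.strip().lower() == reference.strip().lower()

def reward(question: str, prediction: str, reference: str) -> float:
    """Binary RL reward derived from the verifier's judgment."""
    return 1.0 if judge(question, prediction, reference) else 0.0

print(reward("Capital of France?", "Paris", "paris"))  # 1.0
```
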
Ge Zhang (@gezhang86038849)'s Twitter Profile Photo

[1/5] 
💥 Facing the LLM Scaling Challenge Head-On! 💥

Glad to introduce MGA: Reformulation for Pretraining Data Augmentation! 

The AI world is grappling with data limitations and the performance hit from data repetition. 

We introduce MGA (Massive Genre-Audience)
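
The announcement suggests multiplying each source document into several genre-and-audience rewrites. Below is a minimal sketch of that loop; rewrite() is a hypothetical LLM call and the genre/audience lists are invented for illustration.

```python
from itertools import product

GENRES = ["textbook chapter", "Q&A dialogue", "news explainer"]
AUDIENCES = ["middle schoolers", "domain experts"]

def rewrite(doc: str, genre: str, audience: str) -> str:
    """Hypothetical rewriting call; replace with a real LLM request."""
    return f"[{genre} for {audience}] {doc}"  # placeholder output

def augment(doc: str) -> list[str]:
    """Produce one rewrite per (genre, audience) pair for a document."""
    return [rewrite(doc, g, a) for g, a in product(GENRES, AUDIENCES)]

variants = augment("Photosynthesis converts light energy into chemical energy.")
print(len(variants))  # 6 reformulations of one source document
```
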
Ge Zhang (@gezhang86038849)'s Twitter Profile Photo

[1/n]
🚨 Game On for LLM Reasoning—Meet KORGym! 🎮✨

Ever wondered how to truly assess an LLM’s reasoning ability beyond memorized knowledge? 

Meet our latest breakthrough: KORGym—a dynamic, multi-turn game platform built to reveal the real reasoning skills of language models!
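
A game-based evaluation of this kind boils down to a multi-turn loop: the environment emits feedback, the model chooses the next move, and the score reflects reasoning over turns rather than recall. The toy guessing game below illustrates the loop; it is not an actual KORGym task.

```python
import random

class NumberGuessEnv:
    """Toy multi-turn environment: guess a hidden number in [lo, hi]."""
    def __init__(self, lo: int = 1, hi: int = 64):
        self.lo, self.hi = lo, hi
        self.secret = random.randint(lo, hi)

    def step(self, guess: int) -> str:
        if guess == self.secret:
            return "correct"
        return "higher" if guess < self.secret else "lower"

def agent(lo: int, hi: int) -> int:
    # Binary search stands in for an LLM choosing its next move.
    return (lo + hi) // 2

env = NumberGuessEnv()
lo, hi = env.lo, env.hi
for turn in range(1, 8):  # 7 turns suffice for a 64-value range
    guess = agent(lo, hi)
    feedback = env.step(guess)
    if feedback == "correct":
        print(f"solved in {turn} turns")
        break
    lo, hi = (guess + 1, hi) if feedback == "higher" else (lo, guess - 1)
```
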
Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

Our General-Reasoner paper is now out on arXiv: arxiv.org/abs/2505.14652
We have re-trained our General-Reasoner models and obtained much better performance!

- Our 4B General-Reasoner even beats NVIDIA's Nemotron-CrossThink-7B by a significant margin.
- Our 14B General-Reasoner