Xuehai He (@xuehaih)'s Twitter Profile
Xuehai He

@xuehaih

Vision and language

ID: 1318704802973376514

Link: http://sheehan1230.github.io
Joined: 21-10-2020 00:05:04

92 Tweets

182 Followers

304 Following

Kaiwen Zhou (@kaiwenzhou9)'s Twitter Profile Photo

๐Ÿ›ก๏ธ R1 Safety Paper Alert! ๐Ÿ“ฐ How safe are large reasoning models like R1? What is their safety behavior? Does their enhanced capability introduce greater risks? โ€” We present a comprehensive safety analysis on large reasoning models: ๐Ÿ”ฅ Key Findings: 1๏ธโƒฃOpen-source R1 models lag

๐Ÿ›ก๏ธ R1 Safety Paper Alert! ๐Ÿ“ฐ

How safe are large reasoning models like R1? What is their safety behavior? Does their enhanced capability introduce greater risks? โ€” We present a comprehensive safety analysis on large reasoning models:

๐Ÿ”ฅ Key Findings:
1๏ธโƒฃOpen-source R1 models lag
Qianqi "Jackie" Yan (@qianqi_yan)'s Twitter Profile Photo

New Paper Alert: Multimodal Inconsistency Reasoning (MMIR)! ✨

Ever visited a webpage where the text says "IKEA desk" yet images and descriptions elsewhere show a totally different brand? Or read a slide that shows "50% growth" in the text but the accompanying chart looks flat?
Xin Eric Wang @ ICLR 2025 (@xwang_lk)'s Twitter Profile Photo

๐๐ž๐š๐ญ๐ข๐ง๐  ๐Ž๐ฉ๐ž๐ง๐€๐ˆ ๐ข๐ฌ ๐ง๐จ๐ญ ๐š๐ฌ ๐ก๐š๐ซ๐ ๐š๐ฌ ๐ฒ๐จ๐ฎ ๐ญ๐ก๐ข๐ง๐ค. If you don't believe you can compete, you've already lost. Winning starts with mindset. ๐Ÿš€Introducing ๐‘จ๐’ˆ๐’†๐’๐’• ๐‘บ2, ๐ญ๐ก๐ž ๐ฐ๐จ๐ซ๐ฅ๐'๐ฌ ๐›๐ž๐ฌ๐ญ ๐œ๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐ซ-๐ฎ๐ฌ๐ž ๐š๐ ๐ž๐ง๐ญ, and the second

๐๐ž๐š๐ญ๐ข๐ง๐  ๐Ž๐ฉ๐ž๐ง๐€๐ˆ ๐ข๐ฌ ๐ง๐จ๐ญ ๐š๐ฌ ๐ก๐š๐ซ๐ ๐š๐ฌ ๐ฒ๐จ๐ฎ ๐ญ๐ก๐ข๐ง๐ค. If you don't believe you can compete, you've already lost. Winning starts with mindset.

๐Ÿš€Introducing ๐‘จ๐’ˆ๐’†๐’๐’• ๐‘บ2, ๐ญ๐ก๐ž ๐ฐ๐จ๐ซ๐ฅ๐'๐ฌ ๐›๐ž๐ฌ๐ญ ๐œ๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐ซ-๐ฎ๐ฌ๐ž ๐š๐ ๐ž๐ง๐ญ, and the second
Xin Eric Wang @ ICLR 2025 (@xwang_lk)'s Twitter Profile Photo

Since launching Agent S2, many folks working on GUI/computer-use agents asked for our tech report. Here we go! 🎉 New SOTA on 3 major computer-use benchmarks.

• OSWorld (15 steps): 27.0% 🚀 (+18.9%)
• OSWorld (50 steps): 34.5% 🚀 (+32.7%)
• WindowsAgentArena: 29.8% 🚀
Xuehai He (@xuehaih)'s Twitter Profile Photo

Hello, every ICLR participant! I will be presenting our MMWorld on 4/24, 9:30 am - 11:30 am (SST) ✈️✈️. Come by our poster to chat and discuss the next-generation video understanding model! 😎😎 Looking forward to meeting old and new friends there! #ICLR2025
Simular (@simularai)'s Twitter Profile Photo

Meet ๐—ฆ๐—ถ๐—บ๐˜‚๐—น๐—ฎ๐—ฟ โ€” the first AI agent that browses the Internet "with" you, right on your Mac. It acts locally on macOS and works alongside you. Take over anytime or team up with Simular in real time. See how Simular makes everyday digital life faster, easier, and smarter.

Meet ๐—ฆ๐—ถ๐—บ๐˜‚๐—น๐—ฎ๐—ฟ โ€” the first AI agent that browses the Internet "with" you, right on your Mac.

It acts locally on macOS and works alongside you.

Take over anytime or team up with Simular in real time.

See how Simular makes everyday digital life faster, easier, and smarter.
Jianwei Yang (@jw2yang4ai)'s Twitter Profile Photo

🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at #CVPR2025!
🔗 computer-vision-in-the-wild.github.io/cvpr-2025/

⭐ We have invited a great lineup of speakers: Prof. Kaiming He, Prof. Boqing Gong, Prof. Cordelia Schmid, Prof. Ranjay Krishna, Prof. Saining Xie, Prof.
Yiping Wang (@ypwang61)'s Twitter Profile Photo

We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks!

📝 RLVR with one training example can boost:
- Qwen2.5-Math-1.5B: 36.0% → 73.6%
- Qwen2.5-Math-7B: 51.0% → 79.2%
on MATH500.

📄 Paper: arxiv.org/abs/2504.20571
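The tweet reports results only; as a rough, hypothetical sketch of what "RLVR" (reinforcement learning with verifiable rewards) means in this one-example setting, the reward can be a simple programmatic check of the final answer rather than a learned reward model. The function names and the toy example below are illustrative and not taken from the paper.

```python
import re

def extract_boxed_answer(completion: str):
    """Pull the last \\boxed{...} answer out of a model completion, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary 'verifiable' reward: 1.0 if the final answer matches, else 0.0."""
    answer = extract_boxed_answer(completion)
    return 1.0 if answer == ground_truth else 0.0

# Hypothetical single training example: in 1-shot RLVR, every policy update
# is driven by rollouts of one prompt, each scored by a verifier like this.
example = {"prompt": "Compute 3 + 4. Put the final answer in \\boxed{}.",
           "answer": "7"}
rollout = "3 + 4 = 7, so the final answer is \\boxed{7}."
print(verifiable_reward(rollout, example["answer"]))  # 1.0
```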
Saaket Agashe @ NAACL 2025 (@saa1605)'s Twitter Profile Photo

I'll be presenting our poster:
"LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"
tomorrow at #NAACL2025!

⏰ 11 AM
📍 Hall 3

Drop by to chat about applications of LLMs for Multi-Agent Coordination! #MultiAgentAI #LLMs
Xin Eric Wang @ ICLR 2025 (@xwang_lk)'s Twitter Profile Photo

๐˜๐˜ถ๐˜ฎ๐˜ข๐˜ฏ๐˜ด ๐˜ต๐˜ฉ๐˜ช๐˜ฏ๐˜ฌ ๐˜ง๐˜ญ๐˜ถ๐˜ช๐˜ฅ๐˜ญ๐˜บโ€”๐˜ฏ๐˜ข๐˜ท๐˜ช๐˜จ๐˜ข๐˜ต๐˜ช๐˜ฏ๐˜จ ๐˜ข๐˜ฃ๐˜ด๐˜ต๐˜ณ๐˜ข๐˜ค๐˜ต ๐˜ค๐˜ฐ๐˜ฏ๐˜ค๐˜ฆ๐˜ฑ๐˜ต๐˜ด ๐˜ฆ๐˜ง๐˜ง๐˜ฐ๐˜ณ๐˜ต๐˜ญ๐˜ฆ๐˜ด๐˜ด๐˜ญ๐˜บ, ๐˜ง๐˜ณ๐˜ฆ๐˜ฆ ๐˜ง๐˜ณ๐˜ฐ๐˜ฎ ๐˜ณ๐˜ช๐˜จ๐˜ช๐˜ฅ ๐˜ญ๐˜ช๐˜ฏ๐˜จ๐˜ถ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค ๐˜ฃ๐˜ฐ๐˜ถ๐˜ฏ๐˜ฅ๐˜ข๐˜ณ๐˜ช๐˜ฆ๐˜ด. But current reasoning models remain constrained by discrete tokens, limiting their full

๐˜๐˜ถ๐˜ฎ๐˜ข๐˜ฏ๐˜ด ๐˜ต๐˜ฉ๐˜ช๐˜ฏ๐˜ฌ ๐˜ง๐˜ญ๐˜ถ๐˜ช๐˜ฅ๐˜ญ๐˜บโ€”๐˜ฏ๐˜ข๐˜ท๐˜ช๐˜จ๐˜ข๐˜ต๐˜ช๐˜ฏ๐˜จ ๐˜ข๐˜ฃ๐˜ด๐˜ต๐˜ณ๐˜ข๐˜ค๐˜ต ๐˜ค๐˜ฐ๐˜ฏ๐˜ค๐˜ฆ๐˜ฑ๐˜ต๐˜ด ๐˜ฆ๐˜ง๐˜ง๐˜ฐ๐˜ณ๐˜ต๐˜ญ๐˜ฆ๐˜ด๐˜ด๐˜ญ๐˜บ, ๐˜ง๐˜ณ๐˜ฆ๐˜ฆ ๐˜ง๐˜ณ๐˜ฐ๐˜ฎ ๐˜ณ๐˜ช๐˜จ๐˜ช๐˜ฅ ๐˜ญ๐˜ช๐˜ฏ๐˜จ๐˜ถ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค ๐˜ฃ๐˜ฐ๐˜ถ๐˜ฏ๐˜ฅ๐˜ข๐˜ณ๐˜ช๐˜ฆ๐˜ด. But current reasoning models remain constrained by discrete tokens, limiting their full
Yue Fan (@yfan_ucsc)'s Twitter Profile Photo

Before o3 impressed everyone with 🔥visual reasoning🔥, we already had faith in and were exploring models that can think with images. 🚀

Here's our shot, GRIT: Grounded Reasoning with Images & Texts that trains MLLMs to think while performing visual grounding. It is done via RL
Kaiwen Zhou (@kaiwenzhou9)'s Twitter Profile Photo

๐Ÿ›ก๏ธ Improved Safe for Large Reasoning Models! ๐Ÿง  How can we better align large reasoning models (LRMs) against unseen jailbreaks and harmful prompts? We present SafeKey โ€” a LRM alignment method that helps activate the aha-moment of safety reasoning. ๐Ÿ”ฅ Key Points: 1๏ธโƒฃ Aha-moment

๐Ÿ›ก๏ธ Improved Safe for Large Reasoning Models! ๐Ÿง 

How can we better align large reasoning models (LRMs) against unseen jailbreaks and harmful prompts? We present SafeKey โ€” a LRM alignment method that helps activate the aha-moment of safety reasoning.

๐Ÿ”ฅ Key Points:

1๏ธโƒฃ Aha-moment
Yiping Wang (@ypwang61)'s Twitter Profile Photo

I agree that having a consistent evaluation pipeline and better illustrating the format and non-format gains are important, as we recently updated (x.com/ypwang61/statu…). But I disagree with some points in the blog about 1-shot RLVR.

1. For Deepseek-R1-Distill-Qwen-1.5B, we set
Qianqi "Jackie" Yan (@qianqi_yan)'s Twitter Profile Photo

🚀 New paper out! "Hidden in Plain Sight: Probing Implicit Reasoning in Multimodal Language Models"

Real life is messy:
🔹 "Make a cup of apple juice" - but no apples are in sight
🔹 "Say hi to my friend" - yet two people are in the frame
🔹 "Tell me the brand of the lipstick" -