Zhen Wang (@zhenwang9102)'s Twitter Profile
Zhen Wang

@zhenwang9102

postdoc @UCSanDiego🌴 | llm-reasoners; decentralized arena; PromptAgent; toolkengpt; world model + language model; reasoning; ai4science

ID: 4919390599

Link: https://zhenwang9102.github.io/ | Joined: 17-02-2016 04:29:02

232 Tweets

652 Followers

408 Following

Maitrix.org (@maitrixorg)

🤖Thrilled to introduce _ReasonerAgent_ - A fully open source, ready-to-run agent that does research🧐 in a web browser and answers your queries Use ReasonerAgent to help you: ✈️search for flights, 🛍️compile shopping options, 🗞️research news coverage, etc. 📘Check out more

Huan Sun (OSU) (@hhsun1)

⚖️Towards rigorously benchmarking the progress of agents📏: Wondering whether frontier web agents are genuinely as good as reported? 🤔Are they truly reaching nearly 90% task success rates on real-world tasks and websites? Check out our more comprehensive and rigorous

Boshi Wang (@boshiwang2)

LLMs exhibit the Reversal Curse, a basic generalization failure where they struggle to learn reversible factual associations (e.g., "A is B" -> "B is A"). But why? Our new work uncovers that it's a symptom of the long-standing binding problem in AI, and shows that a model design

Huan Sun (OSU) (@hhsun1)

It's a great honor to give a keynote at the Molecule Maker Lab Institute symposium at UIUC! Many thanks to Prof. Heng Ji and Prof. Jiawei Han for invitation. The symposium’s theme this year is “AI scientist? What would it take?”, which I hold close to heart and made a talk titled “Language

Maitrix.org (@maitrixorg)

Voila! 🤗🔥 Super excited to open-source Voila -- new unified Voice-Language Foundation Models for real-time conversations, audio-in audio-out. Voila enables to build a voice-based character-ai👩‍👩‍👧‍👧 instantly, with over one million voice persona! Voila unified model supports:

Zhiting Hu (@zhitinghu)

I was kidding -- this video was entirely simulated by the _world model_ we're building. 😀 It's mind-blowing how it produces high-fidelity simulations, lasting several minutes, to complete non-trivial tasks. This showcases the potential for infinite data & experience in

Zhiting Hu (@zhitinghu)

A humanoid robot dancing with agility and flair💃 ... in a world _interactively_ simulated by world model Here’s the choreography we told the model to simulate, step by step: 💃Wave both arms and start jumping 👋 💃Dance dance dance‼️ 💃Stand still and put left arm

Huan Sun (OSU) (@hhsun1)

Super excited to get funded by Schmidt Sciences to study computer-use agents (CUAs) under adversarial attacks. Many thanks to the student leads including Zeyi Liao, Jaylen Jones, Linxi Jiang, and amazing co-PIs Yu Su and Zhiqiang Lin. As the capabilities of CUAs improve,

Tianmin Shu (@tianminshu)

🚀 Excited to introduce SimWorld: an embodied simulator for infinite photorealistic world generation 🏙️ populated with diverse agents 🤖 If you are at #CVPR2025, come check out the live demo 👇 Jun 14, 12:00-1:00 pm at JHU booth, ExHall B Jun 15, 10:30 am-12:30 pm, #7, ExHall B

Zhiting Hu (@zhitinghu)

🔥Reinforcement learning for LLM reasoning is emerging—but many questions remain🧐🧐 ❓ Does RL teach new reasoning, or just elicit what’s already in the base LLM? ❓ Do long chains of thought truly emerge from RL? ❓ Most RL work has been focusing on math and coding. But how do

Qiyue Gao (@qiyuegao123)

🤔 Have OpenAI o3, Gemini 2.5, Claude 3.7 formed an internal world model to understand the physical world, or just align pixels with words? We introduce WM-ABench, the first systematic evaluation of VLMs as world models. Using a cognitively-inspired framework, we test 15 SOTA

Zhiting Hu (@zhitinghu)

🚨Do frontier VLMs (o3, Gemini 2.5, Claude 3.5, Qwen…) actually learn an internal world model🌍? Surprisingly, the answer appears to be a hard NO—as revealed by our WM Atomic Benchmark⚛️. Even o3 struggles with the most basic, atomic-level questions: ❌Confuse triangles📐 with

Dynamics Lab (@dynamicslab_ai)

💥💥BANG! Experience the future of gaming with our real-time world model for video games!🕹️🕹️ Not just PLAY—but CREATE! Introducing Mirage, the world’s first AI-native UGC game engine. Now featuring real-time playable demos of two games: 🏙️ GTA-style urban chaos 🏎️ Forza

Eric Xing (@ericxing)

I have been long arguing that a world model is NOT about generating videos, but IS about simulating all possibilities of the world to serve as a sandbox for general-purpose reasoning via thought-experiments. This paper proposes an architecture toward that arxiv.org/abs/2507.05169

LAW Workshop@NeurIPS 2025 (@law2025_neurips)

📢 Thrilled to announce LAW 2025 workshop, Bridging Language, Agent, and World Models, at #NeurIPS2025 this December in San Diego! 🌴🏖️ 🎉 Join us in exploring the exciting intersection of #LLMs, #Agents, #WorldModels! 🧠🤖🌍 🔗 sites.google.com/view/law-2025 #ML #AI #GenerativeAI 1/

Zhen Wang (@zhenwang9102)

Huge thanks to Lambda for sponsoring awards for ALL accepted papers at #LAW2025! #NeurIPS2025 The deadline is approaching fast. Let's build a great program together🤗 ✍️Submit your work here: openreview.net/group?id=NeurI… 🧐Join our program committee: docs.google.com/forms/d/e/1FAI…👇