Martin Ziqiao Ma (@ziqiao_ma) 's Twitter Profile
Martin Ziqiao Ma

@ziqiao_ma

〽️ PhD @UMichCSE | 💼 @IBM @Adobe @Amazon | @ACLMentorship | Weinberg Cogsci Fellow | Cogsci x Multimodality | 💬 Language Grounding & Alignment to 👥 & 👀.

ID: 1194045284621438980

Link: http://ziqiaoma.com/ · Joined: 12-11-2019 00:12:44

786 Tweets

2.2K Followers

1.1K Following

Hokin Deng (@denghokin) 's Twitter Profile Photo


#ICML #cognition #GrowAI We spent 2 years carefully curating every single experiment (e.g., object permanence, the A-not-B task, the visual cliff task) in this dataset (total: 1,503 classic experiments spanning 12 core cognitive concepts).

We spent another year getting 230 MLLMs evaluated.
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo


Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation

"we introduce WM-ABench, a large-scale benchmark comprising 23 fine-grained evaluation dimensions across 6 diverse simulated environments with controlled counterfactual simulations. Through 660
CLS (@chengleisi) 's Twitter Profile Photo


Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Zhiting Hu (@zhitinghu) 's Twitter Profile Photo

🚨Do frontier VLMs (o3, Gemini 2.5, Claude 3.5, Qwen…) actually learn an internal world model🌍? Surprisingly, the answer appears to be a hard NO—as revealed by our WM Atomic Benchmark⚛️. Even o3 struggles with the most basic, atomic-level questions: ❌Confuse triangles📐 with

Martin Ziqiao Ma (@ziqiao_ma) 's Twitter Profile Photo

Excited to share WM-ABench, the first atomic and controlled benchmark of internal world models in VLMs, to appear in #ACL2025 Findings. I'm particularly proud of the cognitively-inspired conceptual framework that grounds our design. If you're curious about how we formalize

Martin Ziqiao Ma (@ziqiao_ma) 's Twitter Profile Photo

Just 5 days ago, I asked Xiang whether I should try using a distilled math reasoning model as the base for a VLM I’m training. He said no. I asked why. He said, “Stay tuned.” And now… here I am, reading this paper with the rest of you.

Martin Ziqiao Ma (@ziqiao_ma) 's Twitter Profile Photo


I know ACL and ICML are around the corner, but the only conference I’m planning to attend this month is #AX2025. 

But yeah, I did launch a job at the expo. 🤪
Martin Ziqiao Ma (@ziqiao_ma) 's Twitter Profile Photo

Our study on pragmatic generation is accepted to #COLM2025! Missed the first COLM last year (no suitable ongoing project at the time😅). Heard it’s a great place to connect with LM folks, excited to join for round two finally.

Eric Xing (@ericxing) 's Twitter Profile Photo

I have been long arguing that a world model is NOT about generating videos, but IS about simulating all possibilities of the world to serve as a sandbox for general-purpose reasoning via thought-experiments. This paper proposes an architecture toward that arxiv.org/abs/2507.05169

Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data

Zhengzhong Tu (@_vztu) 's Twitter Profile Photo


🤨Ever dream of a tool that can magically restore and upscale any (low-res) photo to crystal-clear 4K? 

🔥Introducing "4KAgent: Agentic Any Image to 4K Super-Resolution", the most capable upscaling generalist designed to handle broad image types.
🔗4kagent.github.io
1/🧵
Martin Ziqiao Ma (@ziqiao_ma) 's Twitter Profile Photo


📣 Excited to announce SpaVLE: #NeurIPS2025 Workshop on Space in Vision, Language, and Embodied AI! 

👉 …vision-language-embodied-ai.github.io

🦾Co-organized with an incredible team → Freda Shi · Jiayuan Mao · Jiafei Duan · Manling Li · David Hsu · Parisa Kordjamshidi

🌌 Why Space & SpaVLE?
We
Jiafei Duan (@djiafei) 's Twitter Profile Photo


📣 Excited to announce SpaVLE: #NeurIPS2025 Workshop on Space in Vision, Language, and Embodied AI!

👉…vision-language-embodied-ai.github.io
Martin Ziqiao Ma (@ziqiao_ma) 's Twitter Profile Photo

+1 on this! Mixed-effects models are an underrated tool for behavioral analysis that AI researchers often overlook. Behavioral data are almost never independent: clustering, repeated measures, and hierarchical structures abound. Mixed-effects models account for these
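The non-independence point above can be made concrete with a small simulation (all numbers below are illustrative, not from the tweet): when observations cluster within groups (e.g., repeated trials per participant), the naive i.i.d. formula for the variance of the grand mean is badly overconfident, which is exactly the failure mode mixed-effects models guard against.

```python
import numpy as np

rng = np.random.default_rng(0)
G, n_per = 50, 20        # 50 groups (e.g., participants), 20 trials each
tau, sigma = 1.0, 1.0    # between-group and within-group standard deviations

def simulate_grand_mean():
    # One random intercept per group, then within-group noise per trial.
    intercepts = rng.normal(0.0, tau, size=G)
    obs = intercepts[:, None] + rng.normal(0.0, sigma, size=(G, n_per))
    return obs.mean()

# Monte Carlo estimate of how much the grand mean actually varies.
means = np.array([simulate_grand_mean() for _ in range(5000)])
empirical_var = means.var()

# Naive formula: pretends all G * n_per = 1000 observations are independent.
naive_var = (tau**2 + sigma**2) / (G * n_per)

# Cluster-aware formula: the group-level term tau^2 / G dominates.
correct_var = tau**2 / G + sigma**2 / (G * n_per)

print(f"empirical: {empirical_var:.4f}  naive: {naive_var:.4f}  "
      f"cluster-aware: {correct_var:.4f}")
```

The empirical variance tracks the cluster-aware formula and is roughly an order of magnitude larger than the naive one, so confidence intervals built on the i.i.d. assumption would be far too narrow. In practice one would fit this structure directly, e.g. with `statsmodels`' `mixedlm` in Python or `lme4` in R, rather than correcting by hand.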

Martin Ziqiao Ma (@ziqiao_ma) 's Twitter Profile Photo

Wow, lots of discussion about my paper... sorry I'm a bit late to the party. 😄 You raise an important point about prompt sensitivity. But it's crucial to recognize the asymmetry in the logical implications of positive vs. negative evidence: -> To demonstrate that a system P