Xingyu Fu (@xingyufu2)'s Twitter Profile
Xingyu Fu

@xingyufu2

PhD student @Penn @cogcomp | Focused on Vision+Language | Previously: @MSFTResearch, @AmazonScience | B.S. @UofIllinois

ID: 1305996908264075270

Website: https://zeyofu.github.io/ | Joined: 15-09-2020 22:28:30

116 Tweets

879 Followers

514 Following

Fei Wang (@fwang_nlp)

MuirBench is officially accepted at #ICLR2025! Recent VLMs/MLLMs such as LLaVA-OneVision, MM1.5, and MAmmoTH-VL have demonstrated significant progress on MuirBench. Excited to see how MuirBench continues to drive the innovation of VLMs! #AI #MachineLearning #VLM

Sheng Zhang (@sheng_zh)

MuirBench has been accepted to #ICLR2025! Companies like Apple, TikTok, and Salesforce are already evaluating their LMMs on its multi-image setup - a robust testbed for multimodal reasoning. GenAI needs more benchmarks like this. Kudos to Fei Wang, Xingyu Fu, and team!

Xiaodong Yu (@xiaodong_yu_126)

Check out our new paper on long-context understanding! We use AgenticLU to significantly improve the base model's long-context performance (+14.7% avg. across several datasets) without increasing real inference time!

Yushi Hu (@huyushi98)

Excited to see the image reasoning in o3 and o4-mini!! We introduced this idea a year ago in Visual Sketchpad (visualsketchpad.github.io). Excited to see OpenAI baking this into their model through agentic RL. Great work! And yes, reasoning should be multimodal! Huge shoutout

Weijia Shi (@weijiashi2)

Our previous work showed that creating visual chain-of-thoughts via tool use significantly boosts GPT-4o's visual reasoning performance. Excited to see this idea incorporated into OpenAI's o3 and o4-mini models (openai.com/index/thinking…).

Yu Feng (@anniefeng6)

#ICLR2025 Oral

LLMs often struggle with reliable and consistent decisions under uncertainty, largely because they can't reliably estimate the probability of each choice.

We propose BIRD, a framework that significantly enhances LLM decision making under uncertainty. BIRD

Sayak Paul (@risingsayak)

Embedding a scientific basis in pre-trained T2I models can enhance the realism and consistency of the results.

Cool work in "Science-T2I: Addressing Scientific Illusions in Image Synthesis"

jialuo-li.github.io/Science-T2I-We…

Jialuo Li (@jialuoli1007)

Introducing Science-T2I - towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025]

Paper: arxiv.org/abs/2504.13129
Project: jialuo-li.github.io/Science-T2I-Web
Code: github.com/Jialuo-Li/Scie…
Dataset: huggingface.co/collections/Ji…

Lucas Beyer (bl16) (@giffmana)

This paper is interestingly thought-provoking for me. There is a chance that it's easier to "align t2i model with real physics" in post-training, and let it learn to generate whatever (physically implausible) combinations in pretraining, as opposed to trying hard to come up with

Fei Wang (@fwang_nlp)

Excited to share that our paper, "MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding", will be presented at #ICLR2025!
Date: April 24
Time: 3:00 PM
Location: Hall 3 + Hall 2B #11
MuirBench challenges multimodal LLMs with diverse multi-image

Xingyu Fu (@xingyufu2)

ReFocus: visual reasoning for Tables and Charts with edits. Happy to share that ReFocus has been accepted at #ICML2025. We've open-sourced code and training data: zeyofu.github.io/ReFocus/ ReFocus enables multimodal LMs to better reason on Tables and Charts with visual edits. It also provides

Xingyu Fu (@xingyufu2)

Been wanting to post since March but waited for the graduation photo… Thrilled to finally share that I'll be joining Princeton University as a postdoc at Princeton PLI this August!

Endless thanks to my incredible advisors and mentors from Penn, UW, Cornell, NYU, UCSB, USC,

Mingyuan Wu (@mingyuanwu4)

Research with amazing collaborators Jize Jiang, Meitang Li, and Jingcheng Yang, guided by great advisors and supported by the generous help of talented researchers Bowen Jin, Xingyu Fu, and many open-source contributors (easyr1, verl, vllm, etc.).

Xiang Yue @ICLR2025 (@xiangyue96)

People are racing to push math reasoning performance in #LLMs - but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true?

In our study (arxiv.org/pdf/2507.00432), we

Weijia Shi (@weijiashi2)

Can data owners & LM developers collaborate to build a strong shared model while each retains control of their data? Introducing FlexOlmo, a mixture-of-experts LM enabling:
• Flexible training on your local data without sharing it
• Flexible inference to opt your data in or out

Xingyu Fu (@xingyufu2)

I will be at #ICML2025 next week and present #ReFocus on Tuesday afternoon.
Location: West Exhibition Hall B2-B3 #W-202
Time: Tue 15 Jul, 4:30 p.m.-7:00 p.m. PDT
Happy to chat and connect! Feel free to DM. ReFocus link: huggingface.co/datasets/ReFoc…

Yong Lin (@yong18850571)

(1/4) Introducing Goedel-Prover V2
The strongest open-source theorem prover to date.
#1 on PutnamBench: solves 64 problems, with far less compute.
New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B's 82.4%.
* 8B > 671B: Our 8B