Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile
Jiarui Zhang (Jerry)

@jiaruiz58876329

@USC CS Ph.D. student @CSatUSC | ex-intern @amazon | B.Eng. @Tsinghua_Uni | MLLM | Visual Perception | Reasoning | AI for Science

ID: 1559440854699237376

Link: https://saccharomycetes.github.io/ | Joined: 16-08-2022 07:24:14

47 Tweets

301 Followers

588 Following

Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile Photo

[1/11] Many recent studies have shown that current multimodal LLMs (MLLMs) struggle with low-level visual perception (LLVP) — the ability to precisely describe the fine-grained/geometric details of an image.

How can we do better?

Introducing Euclid, our first study at improving
Shangshang Wang (@upupwang)'s Twitter Profile Photo

šŸ” Diving deep into LLM reasoning? From OpenAI's o-series to DeepSeek R1, from post-training to test-time compute — we break it down into structured spreadsheets. 🧵

šŸ” Diving deep into LLM reasoning?

From OpenAI's o-series to DeepSeek R1, from post-training to test-time compute — we break it down into structured spreadsheets. 🧵
Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile Photo

Multimodal large language models (MLLMs) often struggle with small visual details, but do we need to retrain them to fix this?

In our #ICLR'25 paper, we found that MLLMs already know where to look—even when their final answers are wrong!

Inspired by this, we developed a method
Prateek Chhikara (@pckraftwerk)'s Twitter Profile Photo

Can MLLMs perceive small details as well as large ones? In our recent #ICLR paper, we find their accuracy is size-sensitive, but they know where to look! We propose a training-free visual intervention to boost perception. Paper: arxiv.org/pdf/2502.17422
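The two tweets above describe the finding at a high level: the model's answer may be wrong on small details, yet its attention already points at the right region. Below is a minimal sketch of how one might read out that "where to look" signal, assuming Hugging Face-style attention outputs (one tensor per layer of shape (batch, heads, seq, seq)) and a known span of image-patch tokens in the sequence; the function, indices, and layer choice are illustrative assumptions, not the paper's code.

```python
import torch

def attention_heatmap(attentions, img_start: int, grid_size: int,
                      answer_positions, last_k_layers: int = 4) -> torch.Tensor:
    """Average attention from answer tokens to image-patch tokens, as a 2D grid.

    `attentions`: tuple of per-layer tensors, each (batch, heads, seq, seq),
    e.g. from a Hugging Face model called with output_attentions=True.
    `img_start`: index where the image-patch tokens begin in the sequence.
    """
    num_patches = grid_size * grid_size
    stacked = torch.stack(attentions[-last_k_layers:])          # (layers, B, H, S, S)
    # Attention paid by the answer-token queries to the image-token keys.
    to_image = stacked[..., answer_positions, img_start:img_start + num_patches]
    per_patch = to_image.mean(dim=(0, 1, 2, 3))                 # (num_patches,)
    return per_patch.reshape(grid_size, grid_size)

# Toy example with random weights (2 layers, 8 heads, a 24x24 patch grid).
seq_len, grid = 640, 24
fake = tuple(torch.rand(1, 8, seq_len, seq_len).softmax(-1) for _ in range(2))
heat = attention_heatmap(fake, img_start=35, grid_size=grid,
                         answer_positions=[-1], last_k_layers=2)
print(heat.shape)  # torch.Size([24, 24]); the argmax is roughly "where the model looks"
```

Averaging over the last few layers and over the answer tokens is just one simple aggregation choice among many.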

Saket Aryan (@whysosaket)'s Twitter Profile Photo

Been reading a lot of papers lately, but this one stood out for sure... They used an interesting insight to improve MLLM performance: MLLMs know where to look in the image even when they fail to answer the visual question. Loved it ❤️

Prateek Chhikara (@pckraftwerk)'s Twitter Profile Photo

With #WACV2025 happening now, resharing our #WACV2024 paper: FIRE: Food Image to REcipe Generation! 

FIRE is an AI model that turns food photos into full recipes, including the title, ingredients, and cooking steps. We use BLIP, Vision Transformers, and T5 to make it happen.
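The tweet names BLIP, Vision Transformers, and T5 as the building blocks. As a rough illustration of how such a caption-then-generate pipeline can be wired together (not the paper's exact architecture, and with generic Hugging Face checkpoints standing in for FIRE's fine-tuned components), one could do something like:

```python
# Hedged sketch: BLIP (a ViT image encoder with a text decoder) describes the
# food photo, then T5 turns that description into recipe text. The checkpoints
# and prompt below are placeholders, not FIRE's trained weights.
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          T5Tokenizer, T5ForConditionalGeneration)

blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
t5_tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5 = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

def photo_to_recipe(image_path: str) -> str:
    # 1) Describe the dish with BLIP.
    image = Image.open(image_path).convert("RGB")
    inputs = blip_processor(images=image, return_tensors="pt")
    caption_ids = blip.generate(**inputs, max_new_tokens=30)
    caption = blip_processor.decode(caption_ids[0], skip_special_tokens=True)
    # 2) Ask T5 for a title, ingredients, and steps given the description.
    prompt = f"Write a recipe (title, ingredients, steps) for this dish: {caption}"
    t5_inputs = t5_tokenizer(prompt, return_tensors="pt")
    recipe_ids = t5.generate(**t5_inputs, max_new_tokens=256)
    return t5_tokenizer.decode(recipe_ids[0], skip_special_tokens=True)

# print(photo_to_recipe("pasta.jpg"))
```

In practice the generation components would be fine-tuned on paired food-image/recipe data; the off-the-shelf checkpoints above only show the plumbing.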
Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile Photo

So impressed by OpenAI o3😃

In our #ICLR2025 paper (arxiv.org/pdf/2502.17422), we explored a similar idea on open-source MLLMs, where the visual focus (a crop) is implicitly generated from their attention map. It helps LLaVA-1.5 improve by 20% on V* bench.

Excited to see how
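As a companion to the heatmap sketch earlier on this page, here is one hedged way to turn the attention peak into the visual-focus crop the tweet mentions: crop a window around the most-attended patch and ask the question again on the zoomed view. The `ask` call, the crop fraction, and the patch-to-pixel mapping are illustrative assumptions, not the paper's exact pipeline.

```python
import torch
from PIL import Image

def crop_at_attention_peak(image: Image.Image, heatmap: torch.Tensor,
                           crop_frac: float = 0.5) -> Image.Image:
    """Crop a (crop_frac * width) x (crop_frac * height) window centered on the
    most-attended patch of a (grid, grid) attention heatmap."""
    grid = heatmap.shape[0]
    row, col = divmod(int(torch.argmax(heatmap)), grid)
    # Map the peak patch back to pixel coordinates (patch center).
    cx = (col + 0.5) / grid * image.width
    cy = (row + 0.5) / grid * image.height
    w, h = crop_frac * image.width, crop_frac * image.height
    left = min(max(cx - w / 2, 0), image.width - w)
    top = min(max(cy - h / 2, 0), image.height - h)
    return image.crop((left, top, left + w, top + h))

# Usage sketch (hypothetical `ask(question, image)` MLLM call):
# heat = attention_heatmap(...)                      # from the earlier sketch
# crop = crop_at_attention_peak(Image.open("scene.jpg"), heat)
# answer = ask(question, crop)                       # re-query on the zoomed view
```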
Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile Photo

Presenting two papers at #ICLR2025 in Singapore! Let's chat and connect if you are interested in #Multimodal LLMs and their visual perception and reasoning abilities!

1. MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

Sat 26 Apr 10
Syeda Nahida Akter (@snat02792153)'s Twitter Profile Photo

RL boosts LLM reasoning—but why stop at math & code? 🤔
Meet Nemotron-CrossThink—a method to scale RL-based self-learning across law, physics, social science & more.

🔄 Resulting in a model that reasons broadly, adapts dynamically, & uses 28% fewer tokens for correct answers!
Prateek Chhikara (@pckraftwerk)'s Twitter Profile Photo

Last month, I spoke at the Tianqiao & Chrissy Chen Institute × AGI House Parametric Memory Workshop, where I introduced mem0, an adaptive memory layer for AI agents. I presented real-world examples:

- Personalized Learning: Tracking each student’s mastery to tailor lessons without repetition
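For readers wondering what the personalized-learning example could look like in code, here is a small sketch using mem0's open-source Python client. It follows the library's quickstart-style add/search calls as I understand them, but exact signatures, return shapes, and the required LLM/embedding configuration vary by version, and the student and memory text are made up.

```python
from mem0 import Memory  # open-source mem0 client; default config expects an LLM/embedding provider (e.g. an OpenAI key)

memory = Memory()

# Store what a (hypothetical) student has and has not mastered as long-term memory.
memory.add(
    "Aisha has mastered fractions but still struggles with long division.",
    user_id="student_aisha",
    metadata={"category": "mastery"},
)

# Later, before generating the next lesson, retrieve relevant memories so the
# tutor agent can skip what is already mastered and target weak spots.
results = memory.search(
    "What should the next math lesson focus on?",
    user_id="student_aisha",
)
print(results)  # ranked mastery notes; the return shape depends on the mem0 version
```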