Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile
Jiarui Zhang (Jerry)

@jiaruiz58876329

@USC CS Ph.D. student @CSatUSC | ex-intern @amazon | B.Eng. @Tsinghua_Uni | MLLM | Visual Perception | Reasoning | AI for Science

ID: 1559440854699237376

Link: https://saccharomycetes.github.io/ | Joined: 16-08-2022 07:24:14

47 Tweets

301 Followers

588 Following

Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile Photo

[1/11] Many recent studies have shown that current multimodal LLMs (MLLMs) struggle with low-level visual perception (LLVP) — the ability to precisely describe the fine-grained/geometric details of an image.

How can we do better?

Introducing Euclid, our first study at improving
Shangshang Wang (@upupwang)'s Twitter Profile Photo

šŸ” Diving deep into LLM reasoning? From OpenAI's o-series to DeepSeek R1, from post-training to test-time compute — we break it down into structured spreadsheets. 🧵

šŸ” Diving deep into LLM reasoning?

From OpenAI's o-series to DeepSeek R1, from post-training to test-time compute — we break it down into structured spreadsheets. 🧵
Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile Photo

Multimodal large language models (MLLMs) often struggle with small visual details, but do we need to retrain them to fix this?

In our #ICLR'25 paper, we found that MLLMs already know where to look—even when their final answers are wrong!

Inspired by this, we developed a method
Prateek Chhikara (@pckraftwerk)'s Twitter Profile Photo

Can MLLMs perceive small details as well as large ones? In our recent #ICLR paper, we find their accuracy is size-sensitive, but they know where to look! We propose a training-free visual intervention to boost perception. Paper: arxiv.org/pdf/2502.17422
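The two tweets above describe the finding at a high level: the model's answer may be wrong on small details, yet its attention already points at the right region. Below is a minimal sketch of how one might read out that "where to look" signal, assuming Hugging Face-style attention outputs (one tensor per layer of shape (batch, heads, seq, seq)) and a known span of image-patch tokens in the sequence; the function, indices, and layer choice are illustrative assumptions, not the paper's code.

```python
import torch

def attention_heatmap(attentions, img_start: int, grid_size: int,
                      answer_positions, last_k_layers: int = 4) -> torch.Tensor:
    """Average attention from answer tokens to image-patch tokens, as a 2D grid.

    `attentions`: tuple of per-layer tensors, each (batch, heads, seq, seq),
    e.g. from a Hugging Face model called with output_attentions=True.
    `img_start`: index where the image-patch tokens begin in the sequence.
    """
    num_patches = grid_size * grid_size
    stacked = torch.stack(attentions[-last_k_layers:])          # (layers, B, H, S, S)
    # Attention paid by the answer-token queries to the image-token keys.
    to_image = stacked[..., answer_positions, img_start:img_start + num_patches]
    per_patch = to_image.mean(dim=(0, 1, 2, 3))                 # (num_patches,)
    return per_patch.reshape(grid_size, grid_size)

# Toy example with random weights (2 layers, 8 heads, a 24x24 patch grid).
seq_len, grid = 640, 24
fake = tuple(torch.rand(1, 8, seq_len, seq_len).softmax(-1) for _ in range(2))
heat = attention_heatmap(fake, img_start=35, grid_size=grid,
                         answer_positions=[-1], last_k_layers=2)
print(heat.shape)  # torch.Size([24, 24]); the argmax is roughly "where the model looks"
```

Averaging over the last few layers and over the answer tokens is just one simple aggregation choice among many.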

Saket Aryan (@whysosaket)'s Twitter Profile Photo

Been reading a lot of papers lately, but this one stood out for sure... They used an interesting insight to improve MLLM performance: MLLMs know where to look in the image even when they fail to answer the visual question. Loved it ❤️

Prateek Chhikara (@pckraftwerk)'s Twitter Profile Photo

With #WACV2025 happening now, resharing our #WACV2024 paper: FIRE: Food Image to REcipe Generation! 

FIRE is an AI model that turns food photos into full recipes, including the title, ingredients, and cooking steps. We use BLIP, Vision Transformers, and T5 to make it happen.
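The tweet names BLIP, Vision Transformers, and T5 as the building blocks. As a rough illustration of how such a caption-then-generate pipeline can be wired together (not the paper's exact architecture, and with generic Hugging Face checkpoints standing in for FIRE's fine-tuned components), one could do something like:

```python
# Hedged sketch: BLIP (a ViT image encoder with a text decoder) describes the
# food photo, then T5 turns that description into recipe text. The checkpoints
# and prompt below are placeholders, not FIRE's trained weights.
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          T5Tokenizer, T5ForConditionalGeneration)

blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
t5_tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5 = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

def photo_to_recipe(image_path: str) -> str:
    # 1) Describe the dish with BLIP.
    image = Image.open(image_path).convert("RGB")
    inputs = blip_processor(images=image, return_tensors="pt")
    caption_ids = blip.generate(**inputs, max_new_tokens=30)
    caption = blip_processor.decode(caption_ids[0], skip_special_tokens=True)
    # 2) Ask T5 for a title, ingredients, and steps given the description.
    prompt = f"Write a recipe (title, ingredients, steps) for this dish: {caption}"
    t5_inputs = t5_tokenizer(prompt, return_tensors="pt")
    recipe_ids = t5.generate(**t5_inputs, max_new_tokens=256)
    return t5_tokenizer.decode(recipe_ids[0], skip_special_tokens=True)

# print(photo_to_recipe("pasta.jpg"))
```

In practice the generation components would be fine-tuned on paired food-image/recipe data; the off-the-shelf checkpoints above only show the plumbing.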
Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile Photo

So impressed by OpenAI o3😃

In our #ICLR2025 paper (arxiv.org/pdf/2502.17422), we explored a similar idea on open-source MLLMs, where the visual focus (a crop) is implicitly generated from their attention map. It helps LLaVA-1.5 improve by 20% on V* bench.

Excited to see how
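As a companion to the heatmap sketch earlier on this page, here is one hedged way to turn the attention peak into the visual-focus crop the tweet mentions: crop a window around the most-attended patch and ask the question again on the zoomed view. The `ask` call, the crop fraction, and the patch-to-pixel mapping are illustrative assumptions, not the paper's exact pipeline.

```python
import torch
from PIL import Image

def crop_at_attention_peak(image: Image.Image, heatmap: torch.Tensor,
                           crop_frac: float = 0.5) -> Image.Image:
    """Crop a (crop_frac * width) x (crop_frac * height) window centered on the
    most-attended patch of a (grid, grid) attention heatmap."""
    grid = heatmap.shape[0]
    row, col = divmod(int(torch.argmax(heatmap)), grid)
    # Map the peak patch back to pixel coordinates (patch center).
    cx = (col + 0.5) / grid * image.width
    cy = (row + 0.5) / grid * image.height
    w, h = crop_frac * image.width, crop_frac * image.height
    left = min(max(cx - w / 2, 0), image.width - w)
    top = min(max(cy - h / 2, 0), image.height - h)
    return image.crop((left, top, left + w, top + h))

# Usage sketch (hypothetical `ask(question, image)` MLLM call):
# heat = attention_heatmap(...)                      # from the earlier sketch
# crop = crop_at_attention_peak(Image.open("scene.jpg"), heat)
# answer = ask(question, crop)                       # re-query on the zoomed view
```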
Jiarui Zhang (Jerry) (@jiaruiz58876329)'s Twitter Profile Photo

Presenting two papers at #ICLR2025 in Singapore! Let's chat and connect if you are interested in #Multimodal LLMs and their visual perception and reasoning abilities!

1. MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

Sat 26 Apr 10
Syeda Nahida Akter (@snat02792153)'s Twitter Profile Photo

RL boosts LLM reasoning—but why stop at math & code? 🤔
Meet Nemotron-CrossThink—a method to scale RL-based self-learning across law, physics, social science & more.

🔄 Resulting in a model that reasons broadly, adapts dynamically, & uses 28% fewer tokens for correct answers!
Prateek Chhikara (@pckraftwerk)'s Twitter Profile Photo

Last month, I spoke at the Tianqiao & Chrissy Chen Institute × AGI House Parametric Memory Workshop, where I introduced mem0, an adaptive memory layer for AI agents. I presented real-world examples:

- Personalized Learning: Tracking each student’s mastery to tailor lessons without repetition
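For readers wondering what the personalized-learning example could look like in code, here is a small sketch using mem0's open-source Python client. It follows the library's quickstart-style add/search calls as I understand them, but exact signatures, return shapes, and the required LLM/embedding configuration vary by version, and the student and memory text are made up.

```python
from mem0 import Memory  # open-source mem0 client; default config expects an LLM/embedding provider (e.g. an OpenAI key)

memory = Memory()

# Store what a (hypothetical) student has and has not mastered as long-term memory.
memory.add(
    "Aisha has mastered fractions but still struggles with long division.",
    user_id="student_aisha",
    metadata={"category": "mastery"},
)

# Later, before generating the next lesson, retrieve relevant memories so the
# tutor agent can skip what is already mastered and target weak spots.
results = memory.search(
    "What should the next math lesson focus on?",
    user_id="student_aisha",
)
print(results)  # ranked mastery notes; the return shape depends on the mem0 version
```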