Le Xue (@le_xue01) 's Twitter Profile
Le Xue

@le_xue01

Senior Applied Scientist @ Salesforce AI Research Lab
Large Multimodal Model, Multimodal 3D Vision

ID: 1639381943883292672

Joined: 24-03-2023 21:41:39

73 Tweets

191 Followers

145 Following

Le Xue (@le_xue01) 's Twitter Profile Photo

🍃MINT-1T (arxiv.org/abs/2406.11271), the largest open-source interleaved multimodal dataset, will be presented in NeurIPS today 11 a.m. — 2 p.m. PST at East Exhibit Hall A-C #3604. Drop by and discuss🙌

Affiliations: University of Washington, Salesforce AI Research
Salesforce AI Research (@sfresearch) 's Twitter Profile Photo

🔬🔬🔬Introducing ProVision: A new system for transforming images into verified instruction data for multimodal language models (MLMs) at massive scale! 
Scene graphs + programmatic synthesis generate 10M+ diverse, automated Q&A pairs. Fully verifiable.

Training MLMs? Dive in:
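The ProVision announcement above describes the core recipe: build a scene graph of the image, then run small human-written programs over it so every question-answer pair is computed from the graph and is therefore automatically verifiable. Below is a minimal, hypothetical Python sketch of that idea; the scene graph, templates, and function names are illustrative assumptions, not ProVision's actual generators.

```python
# Illustrative sketch only: programmatic Q&A synthesis from a scene graph.
# Object names, attributes, relations, and templates are all made up here.

# A toy scene graph: objects with attributes, plus subject-relation-object triples.
scene_graph = {
    "objects": {
        "cup_1": {"name": "cup", "attributes": ["red"]},
        "table_1": {"name": "table", "attributes": ["wooden"]},
        "cat_1": {"name": "cat", "attributes": ["black"]},
    },
    "relations": [("cup_1", "on", "table_1"), ("cat_1", "under", "table_1")],
}

def attribute_questions(graph):
    """Template 1: ask about an object's attribute; the answer is read off the graph."""
    for obj in graph["objects"].values():
        for attr in obj["attributes"]:
            yield f"What does the {obj['name']} look like?", attr

def relation_questions(graph):
    """Template 2: ask how two objects are spatially related."""
    objs = graph["objects"]
    for subj, rel, obj in graph["relations"]:
        yield f"Where is the {objs[subj]['name']} relative to the {objs[obj]['name']}?", rel

def counting_questions(graph):
    """Template 3: counting; the answer is computed, so it is verifiable by construction."""
    names = [o["name"] for o in graph["objects"].values()]
    for name in sorted(set(names)):
        yield f"How many {name}s are in the image?", str(names.count(name))

if __name__ == "__main__":
    for template in (attribute_questions, relation_questions, counting_questions):
        for question, answer in template(scene_graph):
            print(f"Q: {question}\nA: {answer}\n")
```

Because answers are derived from the graph rather than sampled from a model, scaling to 10M+ pairs is a matter of running more templates over more scene graphs.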
VentureBeat (@venturebeat) 's Twitter Profile Photo

Breaking the data bottleneck: Salesforce's ProVision speeds multimodal AI training with image scene graphs venturebeat.com/data-infrastru…

Salesforce AI Research (@sfresearch) 's Twitter Profile Photo

We're so excited to see VentureBeat and @MarkTechPost cover ProVision! We're tackling the visual instruction data challenge with scene graphs + human-written programs, already seeing 3-8% improvements across benchmarks. The real win? A more #OpenSourced, reproducible approach

Juan Carlos Niebles (@jcniebles) 's Twitter Profile Photo

More on ProVision: Our instruction data generation system for multimodal language models. 📄arxiv: arxiv.org/abs/2412.07012 💻 GitHub: github.com/JieyuZ2/ProVis… See thread for more discussion ⬇️🧵

Silvio Savarese (@silviocinguetta) 's Twitter Profile Photo

Honored to see VentureBeat highlight Salesforce AI Research's work on ProVision. Open sourcing this framework helps democratize #MultimodalAI development. Proud of the team's innovative approach using scene graphs to generate high-quality visual instruction data at scale. 📷 #AIResearch

Salesforce AI Research (@sfresearch) 's Twitter Profile Photo

🚨🎥🚨🎥🚨 xGen-MM-Vid (BLIP-3-Video) is now available on Hugging Face!

Our compact VLM achieves SOTA performance with just 32 tokens for video understanding. Features explicit temporal encoder + BLIP-3 architecture. Try it out!

🤗32 Token Model: bit.ly/3PBNBBz
🤗128
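The tweet does not spell out how 32 tokens can carry a whole video; the stated ingredients are an explicit temporal encoder on top of the BLIP-3 backbone. Below is a hedged sketch of one common way to realize that (learnable queries cross-attending over all per-frame tokens to emit a fixed 32-token summary); the layer sizes and module names are assumptions, not the released xGen-MM-Vid code.

```python
# Illustrative sketch, not the released xGen-MM-Vid code: a learnable-query
# temporal encoder that pools per-frame visual tokens into a fixed budget of
# 32 video tokens before they reach the language model. Dimensions are made up.
import torch
import torch.nn as nn

class TemporalTokenPooler(nn.Module):
    def __init__(self, dim=768, num_video_tokens=32, num_heads=8):
        super().__init__()
        # One learnable query per output video token.
        self.queries = nn.Parameter(torch.randn(num_video_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_tokens):
        # frame_tokens: (batch, num_frames, tokens_per_frame, dim)
        b, t, n, d = frame_tokens.shape
        kv = frame_tokens.reshape(b, t * n, d)            # flatten time and space
        q = self.queries.unsqueeze(0).expand(b, -1, -1)   # (batch, 32, dim)
        pooled, _ = self.attn(q, kv, kv)                  # cross-attend over all frames
        return self.norm(pooled)                          # (batch, 32, dim)

if __name__ == "__main__":
    # e.g. 8 frames x 729 ViT tokens each -> 32 video tokens
    x = torch.randn(2, 8, 729, 768)
    print(TemporalTokenPooler()(x).shape)  # torch.Size([2, 32, 768])
```

Whatever the exact pooling mechanism, the design point is that the language model only ever sees the 32 pooled tokens, which is what keeps the model compact.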
Honglu Zhou (@zhou_honglu) 's Twitter Profile Photo

Can AI models reason now?? Let's find out!!! Only 2 days left to submit your work to the 4th **Multimodal Algorithmic Reasoning Workshop** at #CVPR 2025. We welcome both published & unpublished papers. Submit now: marworkshop.github.io/cvpr25/
Le Xue (@le_xue01) 's Twitter Profile Photo

Multimodal reasoning is a promising research frontier, especially after DeepSeek R1. Yet it’s distinct from language reasoning, underexplored, and not even well-defined. With 1 week left to submit, join the 4th Multimodal Algorithmic Reasoning Workshop for the latest insights!

Dominick Reilly (@dominickrei_) 's Twitter Profile Photo

⭐️Happy to introduce our #CVPR2025 paper “LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living”! In this paper, we investigate large language-vision models (LLVMs) for understanding activities of daily living videos. 📜 Paper and code: adl-x.github.io

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Salesforce introduces:

BLIP3-o: A Family of Fully Open Unified Multimodal Models—Architecture, Training and Dataset

"we introduce a novel approach that employs a diffusion transformer to generate semantically rich CLIP image features, in contrast to conventional VAE-based
AK (@_akhaliq) 's Twitter Profile Photo

Salesforce just dropped BLIP3-o on Hugging Face

A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Caiming Xiong (@caimingxiong) 's Twitter Profile Photo

Introducing 🔥BLIP3-o🔥 -- A Family of Fully Open Unified Multimodal Models for Both Image Understanding and Image Generation

📊Paper: arxiv.org/pdf/2505.09568
🤗Models and Datasets: huggingface.co/BLIP3o
🧠Code: github.com/JiuhaiChen/BLI…
💻Demo: blip3o.salesforceresearch.ai

We
Jiuhai Chen (@jiuhaic) 's Twitter Profile Photo

🚀 Introducing BLIP3-o: A Family of Fully Open Unified Multimodal Models arxiv.org/pdf/2505.09568
🔓 Attempting to unlock GPT-4o’s image generation.
Open source everything! 
Including 25 million pre-training examples!
Jiuhai Chen (@jiuhaic) 's Twitter Profile Photo

Super excited to attend #CVPR2025 in person! Catch our spotlight talk on BLIP3-o at the Computer Vision in the Wild workshop 👉 computer-vision-in-the-wild.github.io/cvpr-2025/ Also check out Florence-VL at poster #372, Sunday 10:30–12:30

Le Xue (@le_xue01) 's Twitter Profile Photo

I'm attending CVPR2025@Nashville this week. We have a few presentations this year; feel free to drop by and talk with us🙌 BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset (arxiv.org/abs/2505.09568) @ 6.11, 101B, 2:10pm The 4th Workshop on

Artemis Panagopoulou (@artemispng) 's Twitter Profile Photo

🚨 Are visual programs actually reasoning correctly? Spoiler: 40% of the time, they get the right answer for the wrong reason.

Come check out our #CVPR2025 poster (#346) tomorrow — Sunday, June 15th from 10:30am–12:30pm CDT!
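The failure mode this poster examines (a correct final answer supported by incorrect intermediate reasoning) is easy to reproduce in miniature. The toy Python snippet below is purely illustrative, with a mocked detector, and is not taken from the paper.

```python
# Toy illustration (not from the paper): a visual program can return the right
# final answer even though its intermediate reasoning is wrong. Detector is mocked.

# Ground truth for the imaginary image: it contains one red cup and one blue cup.
def detect(label):
    """Mock detector flaw: it misses the real red cup and mislabels the blue cup as red."""
    detections = {
        "cup": [{"id": "cup_blue", "predicted_color": "red"}],  # true color is blue
    }
    return detections.get(label, [])

def is_there_a_red_cup():
    cups = detect("cup")
    red_cups = [c for c in cups if c["predicted_color"] == "red"]
    return ("yes" if red_cups else "no"), red_cups

answer, evidence = is_there_a_red_cup()
print("final answer :", answer)    # "yes" -- matches ground truth, scores as correct
print("evidence used:", evidence)  # but the supporting detection is the wrong object
```

Scoring only the final answer would mark this program correct; inspecting the trace shows the evidence is the wrong object, which is the kind of mismatch the work measures.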