Le Xue (@le_xue01) 's Twitter Profile
Le Xue

@le_xue01

Senior Applied Scientist @ Salesforce AI Research Lab
Large Multimodal Model, Multimodal 3D Vision

ID: 1639381943883292672

Joined: 24-03-2023 21:41:39

73 Tweets

191 Followers

145 Following

Le Xue (@le_xue01) 's Twitter Profile Photo

🍃MINT-1T (arxiv.org/abs/2406.11271), the largest open-source interleaved multimodal dataset, will be presented in NeurIPS today 11 a.m. — 2 p.m. PST at East Exhibit Hall A-C #3604. Drop by and discuss🙌

Affiliations: University of Washington, Salesforce AI Research
Salesforce AI Research (@sfresearch) 's Twitter Profile Photo

🔬🔬🔬Introducing ProVision: A new system for transforming images into verified instruction data for multimodal language models (MLMs) at massive scale! 
Scene graphs + programmatic synthesis generate 10M+ diverse, automated Q&A pairs. Fully verifiable.

Training MLMs? Dive in:
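The ProVision announcement above describes the core recipe: build a scene graph of the image, then run small human-written programs over it so every question-answer pair is computed from the graph and is therefore automatically verifiable. Below is a minimal, hypothetical Python sketch of that idea; the scene graph, templates, and function names are illustrative assumptions, not ProVision's actual generators.

```python
# Illustrative sketch only: programmatic Q&A synthesis from a scene graph.
# Object names, attributes, relations, and templates are all made up here.

# A toy scene graph: objects with attributes, plus subject-relation-object triples.
scene_graph = {
    "objects": {
        "cup_1": {"name": "cup", "attributes": ["red"]},
        "table_1": {"name": "table", "attributes": ["wooden"]},
        "cat_1": {"name": "cat", "attributes": ["black"]},
    },
    "relations": [("cup_1", "on", "table_1"), ("cat_1", "under", "table_1")],
}

def attribute_questions(graph):
    """Template 1: ask about an object's attribute; the answer is read off the graph."""
    for obj in graph["objects"].values():
        for attr in obj["attributes"]:
            yield f"What does the {obj['name']} look like?", attr

def relation_questions(graph):
    """Template 2: ask how two objects are spatially related."""
    objs = graph["objects"]
    for subj, rel, obj in graph["relations"]:
        yield f"Where is the {objs[subj]['name']} relative to the {objs[obj]['name']}?", rel

def counting_questions(graph):
    """Template 3: counting; the answer is computed, so it is verifiable by construction."""
    names = [o["name"] for o in graph["objects"].values()]
    for name in sorted(set(names)):
        yield f"How many {name}s are in the image?", str(names.count(name))

if __name__ == "__main__":
    for template in (attribute_questions, relation_questions, counting_questions):
        for question, answer in template(scene_graph):
            print(f"Q: {question}\nA: {answer}\n")
```

Because answers are derived from the graph rather than sampled from a model, scaling to 10M+ pairs is a matter of running more templates over more scene graphs.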
VentureBeat (@venturebeat) 's Twitter Profile Photo

Breaking the data bottleneck: Salesforce's ProVision speeds multimodal AI training with image scene graphs venturebeat.com/data-infrastru…

Salesforce AI Research (@sfresearch) 's Twitter Profile Photo

We're so excited to see VentureBeat and @MarkTechPost cover ProVision! We're tackling the visual instruction data challenge with scene graphs + human-written programs, already seeing 3-8% improvements across benchmarks. The real win? A more #OpenSourced, reproducible approach

Juan Carlos Niebles (@jcniebles) 's Twitter Profile Photo

More on ProVision: Our instruction data generation system for multimodal language models. 📄arxiv: arxiv.org/abs/2412.07012 💻 GitHub: github.com/JieyuZ2/ProVis… See thread for more discussion ⬇️🧵

Silvio Savarese (@silviocinguetta) 's Twitter Profile Photo

Honored to see VentureBeat highlight Salesforce AI Research's work on ProVision. Open sourcing this framework helps democratize #MultimodalAI development. Proud of the team's innovative approach using scene graphs to generate high-quality visual instruction data at scale. 📷 #AIResearch

Salesforce AI Research (@sfresearch) 's Twitter Profile Photo

🚨🎥🚨🎥🚨 xGen-MM-Vid (BLIP-3-Video) is now available on Hugging Face!

Our compact VLM achieves SOTA performance with just 32 tokens for video understanding. Features explicit temporal encoder + BLIP-3 architecture. Try it out!

🤗32 Token Model: bit.ly/3PBNBBz
🤗128
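The tweet does not spell out how 32 tokens can carry a whole video; the stated ingredients are an explicit temporal encoder on top of the BLIP-3 backbone. Below is a hedged sketch of one common way to realize that (learnable queries cross-attending over all per-frame tokens to emit a fixed 32-token summary); the layer sizes and module names are assumptions, not the released xGen-MM-Vid code.

```python
# Illustrative sketch, not the released xGen-MM-Vid code: a learnable-query
# temporal encoder that pools per-frame visual tokens into a fixed budget of
# 32 video tokens before they reach the language model. Dimensions are made up.
import torch
import torch.nn as nn

class TemporalTokenPooler(nn.Module):
    def __init__(self, dim=768, num_video_tokens=32, num_heads=8):
        super().__init__()
        # One learnable query per output video token.
        self.queries = nn.Parameter(torch.randn(num_video_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_tokens):
        # frame_tokens: (batch, num_frames, tokens_per_frame, dim)
        b, t, n, d = frame_tokens.shape
        kv = frame_tokens.reshape(b, t * n, d)            # flatten time and space
        q = self.queries.unsqueeze(0).expand(b, -1, -1)   # (batch, 32, dim)
        pooled, _ = self.attn(q, kv, kv)                  # cross-attend over all frames
        return self.norm(pooled)                          # (batch, 32, dim)

if __name__ == "__main__":
    # e.g. 8 frames x 729 ViT tokens each -> 32 video tokens
    x = torch.randn(2, 8, 729, 768)
    print(TemporalTokenPooler()(x).shape)  # torch.Size([2, 32, 768])
```

Whatever the exact pooling mechanism, the design point is that the language model only ever sees the 32 pooled tokens, which is what keeps the model compact.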
Honglu Zhou (@zhou_honglu) 's Twitter Profile Photo

Can AI models reason now?? Let's find out!!! Only 2 days left to submit your work to the 4th **Multimodal Algorithmic Reasoning Workshop** at #CVPR 2025. We welcome both published & unpublished papers. Submit now: marworkshop.github.io/cvpr25/
Le Xue (@le_xue01) 's Twitter Profile Photo

Multimodal reasoning is a promising research frontier, especially after DeepSeek R1. Yet it’s distinct from language reasoning, underexplored, and not even well-defined. With 1 week left to submit, join the 4th Multimodal Algorithmic Reasoning Workshop for the latest insights!

Dominick Reilly (@dominickrei_) 's Twitter Profile Photo

⭐️Happy to introduce our #CVPR2025 paper “LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living”! In this paper, we investigate large language-vision models (LLVMs) for understanding activities of daily living videos. 📜 Paper and code: adl-x.github.io

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Salesforce introduces:

BLIP3-o: A Family of Fully Open Unified Multimodal Models—Architecture, Training and Dataset

"we introduce a novel approach that employs a diffusion transformer to generate semantically rich CLIP image features, in contrast to conventional VAE-based
AK (@_akhaliq) 's Twitter Profile Photo

Salesforce just dropped BLIP3-o on Hugging Face

A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Caiming Xiong (@caimingxiong) 's Twitter Profile Photo

Introducing 🔥BLIP3-o🔥 -- A Family of Fully Open Unified Multimodal Models for Both Image Understanding and Image Generation

📊Paper: arxiv.org/pdf/2505.09568
🤗Models and Datasets: huggingface.co/BLIP3o
🧠Code: github.com/JiuhaiChen/BLI…
💻Demo: blip3o.salesforceresearch.ai

We
Jiuhai Chen (@jiuhaic) 's Twitter Profile Photo

🚀 Introducing BLIP3-o: A Family of Fully Open Unified Multimodal Models arxiv.org/pdf/2505.09568
🔓 Attempting to unlock GPT-4o’s image generation.
Open source everything! 
Including 25 million pre-training examples!
Jiuhai Chen (@jiuhaic) 's Twitter Profile Photo

Super excited to attend #CVPR2025 in person! Catch our spotlight talk on BLIP3-o at the Computer Vision in the Wild workshop 👉 computer-vision-in-the-wild.github.io/cvpr-2025/ Also check out Florence-VL at poster #372, Sunday 10:30–12:30

Le Xue (@le_xue01) 's Twitter Profile Photo

I'm attending CVPR2025@Nashville this week. We have a few presentations this year; feel free to drop by and talk with us🙌 BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset (arxiv.org/abs/2505.09568) @ 6.11, 101B, 2:10pm The 4th Workshop on

Artemis Panagopoulou (@artemispng) 's Twitter Profile Photo

🚨 Are visual programs actually reasoning correctly? Spoiler: 40% of the time, they get the right answer for the wrong reason.

Come check out our #CVPR2025 poster (#346) tomorrow — Sunday, June 15th from 10:30am–12:30pm CDT!
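The failure mode this poster examines (a correct final answer supported by incorrect intermediate reasoning) is easy to reproduce in miniature. The toy Python snippet below is purely illustrative, with a mocked detector, and is not taken from the paper.

```python
# Toy illustration (not from the paper): a visual program can return the right
# final answer even though its intermediate reasoning is wrong. Detector is mocked.

# Ground truth for the imaginary image: it contains one red cup and one blue cup.
def detect(label):
    """Mock detector flaw: it misses the real red cup and mislabels the blue cup as red."""
    detections = {
        "cup": [{"id": "cup_blue", "predicted_color": "red"}],  # true color is blue
    }
    return detections.get(label, [])

def is_there_a_red_cup():
    cups = detect("cup")
    red_cups = [c for c in cups if c["predicted_color"] == "red"]
    return ("yes" if red_cups else "no"), red_cups

answer, evidence = is_there_a_red_cup()
print("final answer :", answer)    # "yes" -- matches ground truth, scores as correct
print("evidence used:", evidence)  # but the supporting detection is the wrong object
```

Scoring only the final answer would mark this program correct; inspecting the trace shows the evidence is the wrong object, which is the kind of mismatch the work measures.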