
Le Xue
@le_xue01
Senior Applied Scientist @ Salesforce AI Research Lab
Large Multimodal Model, Multimodal 3D Vision
ID: 1639381943883292672
24-03-2023 21:41:39
73 Tweets
191 Followers
145 Following





We're so excited to see VentureBeat and @MarkTechPost cover ProVision! We're tackling the visual instruction data challenge with scene graphs + human-written programs, already seeing 3-8% improvements across benchmarks. The real win? A more #OpenSourced, reproducible approach


Honored to see VentureBeat highlight Salesforce AI Research's work on ProVision. Open sourcing this framework helps democratize #MultimodalAI development. Proud of the team's innovative approach using scene graphs to generate high-quality visual instruction data at scale. 📷 #AIResearch

🚨🎥🚨🎥🚨 xGen-MM-Vid (BLIP-3-Video) is now available on Hugging Face! Our compact VLM achieves SOTA performance with just 32 tokens for video understanding. Features explicit temporal encoder + BLIP-3 architecture. Try it out! 🤗32 Token Model: bit.ly/3PBNBBz 🤗128


Thrilled to share our compact but powerful video-language models on Hugging Face! arXiv: arxiv.org/abs/2410.16267 Great work from the video team at Salesforce AI Research 🚀🚀









