Wenbo Hu@ICLR🇸🇬 (@gordonhu608)'s Twitter Profile
Wenbo Hu@ICLR🇸🇬

@gordonhu608

CS PhD Student @UCLA | Vision-Language & Embodied AI | B.S @UCSanDiego

ID: 1648471776681394176

Link: http://gordonhu608.github.io · Joined: 18-04-2023 23:41:27

250 Tweets

302 Followers

440 Following

Yining Hong (@yining_hong):

🎬Meet SlowFast-VGen: an action-conditioned long video generation system that learns like a human brain!
🧠Slow learning builds the world model, while fast learning captures memories - enabling incredibly long, consistent videos that respond to your actions in real-time.
Manling Li (@manlingli_):

[NeurIPS D&B Oral] Embodied Agent Interface: Benchmarking LLMs for Embodied Agents
A single line of code to evaluate your model!
🌟Standardize Goal Specifications: LTL
🌟Standardize Modules and Interfaces: 4 modules, 438 tasks, 1475 goals
🌟Standardize Fine-grained Metrics: 18

Xueqing Wu (@xueqing_w):

Can VLMs improve 𝘁𝗵𝗲𝗺𝘀𝗲𝗹𝘃𝗲𝘀💪? We propose🔥𝗩𝗜𝗦𝗖𝗢, a benchmark to evaluate VLMs’ 𝗰𝗿𝗶𝘁𝗶𝗾𝘂𝗲 and 𝗰𝗼𝗿𝗿𝗲𝗰𝘁𝗶𝗼𝗻 capabilities, towards the higher goal of VLMs autonomous self-improvement.

🌐Project: visco-benchmark.github.io
📄Paper: arxiv.org/abs/2412.02172
Xiaolong Wang (@xiaolonw):

In a collaboration between #NVIDIA and #UCSD, we built NaVILA, the foundational navigation VLA for humanoids and quadrupeds. This is enabled by a 2-level framework, a direction I am pushing a lot these days:
1⃣ A VLA that outputs mid-level actions, like "turn left 15 degrees".
2⃣ A

Yu Yang (@yuyang_i):

1/ I'll be at #NeurIPS2024 presenting our work SmallToLarge (S2L): Data-efficient Fine-tuning of LLMs! 🚀

What’s S2L? It’s a scalable data selection method that trains a small proxy model to guide fine-tuning for larger models, reducing costs while preserving performance. 👇
Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

I'll be at #NeurIPS in Vancouver between 12/9 and 12/13, presenting this work on Thursday, 4:30pm - 7:30pm, at East Exhibit Hall A-C #3509. Old and new friends are welcome to come chat about multimodal AI research and more! My DM is open :)

Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

Had an incredible experience at #NeurIPS2024 ! It was fantastic to connect with so many people interested in our work and to gain valuable insights and inspiration for the future of multimodal research.

I’m deeply grateful for the opportunity to present our work with my amazing
Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

Excited to share that MRAG-Bench has been accepted at #ICLR2025 🇸🇬. An image corpus is a rich source of information, and extracting knowledge from it can often be more advantageous than extracting it from a text corpus. We study how MLLMs can utilize vision-centric multimodal knowledge. More in our

Yihe Deng (@yihe__deng):

🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. 

By iterative integration of SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%
uclanlp (@uclanlp):

📣 For this week’s NLP Seminar, we are thrilled to host Zhe Gan to give a talk titled
“How to Build Your Multimodal LLMs: From Pre-training to Post-training and Agents”!

🗓️ 4/11 Fri 2pm PT
Registration: forms.gle/TNXfBZJiMJjL18…
Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

Excited to be at #ICLR2025 🇸🇬 between 4/24 and 4/28, sharing this work on multimodal RAG. I'll be presenting on Saturday 4/26, 3pm - 5:30pm, at Hall 3 + Hall 2B #108. I'm also happy to chat about multimodal models, 3D vision-language, and embodied AI in general with old

Zongyu Lin (@zy27962986):

Introducing 😶‍🌫️DreamGen, a pioneering approach to neural trajectories + robotics at the NVIDIA GEAR lab. We’re among the first to show how large-scale synthetic data can significantly improve a robot’s ability to generalize to new actions and environments. If you’re interested,

Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

This work will be given as an oral presentation at the #CVPR2025 Foundation Models Meet Embodied Agents workshop (Wed 10am, 6/11). Please join us to hear Yining Hong present our work.