Wenbo Hu@ICLR🇸🇬 (@gordonhu608)'s Twitter Profile
Wenbo Hu@ICLR🇸🇬

@gordonhu608

CS PhD Student @UCLA | Vision-Language & Embodied AI | B.S @UCSanDiego

ID: 1648471776681394176

Link: http://gordonhu608.github.io · Joined: 18-04-2023 23:41:27

250 Tweets

302 Followers

440 Following

Yining Hong (@yining_hong):

🎬Meet SlowFast-VGen: an action-conditioned long video generation system that learns like a human brain!
🧠Slow learning builds the world model, while fast learning captures memories - enabling incredibly long, consistent videos that respond to your actions in real-time.
Manling Li (@manlingli_):

[NeurIPS D&B Oral] Embodied Agent Interface: Benchmarking LLMs for Embodied Agents
A single line of code to evaluate your model!
🌟Standardize Goal Specifications: LTL
🌟Standardize Modules and Interfaces: 4 modules, 438 tasks, 1475 goals
🌟Standardize Fine-grained Metrics: 18

Xueqing Wu (@xueqing_w):

Can VLMs improve 𝘁𝗵𝗲𝗺𝘀𝗲𝗹𝘃𝗲𝘀💪? We propose🔥𝗩𝗜𝗦𝗖𝗢, a benchmark to evaluate VLMs’ 𝗰𝗿𝗶𝘁𝗶𝗾𝘂𝗲 and 𝗰𝗼𝗿𝗿𝗲𝗰𝘁𝗶𝗼𝗻 capabilities, towards the higher goal of VLMs autonomous self-improvement.

🌐Project: visco-benchmark.github.io
📄Paper: arxiv.org/abs/2412.02172
Xiaolong Wang (@xiaolonw):

In a collaboration between #NVIDIA and #UCSD, we built NaVILA, the foundational navigation VLA for humanoids and quadrupeds. This is enabled by a 2-level framework, a direction I am pushing a lot these days:
1⃣ A VLA that outputs mid-level actions, like "turn left 15 degrees".
2⃣ A

Yu Yang (@yuyang_i):

1/ I'll be at #NeurIPS2024 presenting our work SmallToLarge (S2L): Data-efficient Fine-tuning of LLMs! 🚀

What’s S2L? It’s a scalable data selection method that trains a small proxy model to guide fine-tuning for larger models, reducing costs while preserving performance. 👇
Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

I'll be at #NeurIPS in Vancouver between 12/9 and 12/13, presenting this work on Thursday, 4:30pm - 7:30pm, at East Exhibit Hall A-C #3509. Old and new friends are welcome to come chat about multimodal AI research and more! My DM is open :)

Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

Had an incredible experience at #NeurIPS2024 ! It was fantastic to connect with so many people interested in our work and to gain valuable insights and inspiration for the future of multimodal research.

I’m deeply grateful for the opportunity to present our work with my amazing
Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

Excited to share that MRAG-Bench has been accepted at #ICLR2025 🇸🇬. An image corpus is a rich source of information, and extracting knowledge from it can often be more advantageous than extracting it from a text corpus. We study how MLLMs can utilize vision-centric multimodal knowledge. More in our

Yihe Deng (@yihe__deng):

🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. 

By iterative integration of SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%
uclanlp (@uclanlp):

📣 For this week’s NLP Seminar, we are thrilled to host Zhe Gan to give a talk titled
“How to Build Your Multimodal LLMs: From Pre-training to Post-training and Agents”!

🗓️ 4/11 Fri 2pm PT
Registration: forms.gle/TNXfBZJiMJjL18…
Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

Excited to be at #ICLR2025 🇸🇬 between 4/24 and 4/28, sharing this work on multimodal RAG. I'll be presenting on Saturday 4/26, 3pm - 5:30pm, at Hall 3 + Hall 2B #108. I'm also happy to chat about multimodal models, 3D vision-language, and embodied AI in general with old

Zongyu Lin (@zy27962986):

Introducing 😶‍🌫️DreamGen, a pioneering approach to neural trajectories + robotics at the NVIDIA GEAR lab. We’re among the first to show how large-scale synthetic data can significantly improve a robot’s ability to generalize to new actions and environments. If you’re interested,

Wenbo Hu@ICLR🇸🇬 (@gordonhu608):

This work will be given as an oral presentation at the #CVPR2025 Foundation Models Meet Embodied Agents workshop (Wed 10am, 6/11). Please join us to hear Yining Hong present our work.