Jay Karhade (@jaykarhade) 's Twitter Profile
Jay Karhade

@jaykarhade

PhD Robotics @CMU_Robotics, Computer Vision, Robotics.

ID: 1567636996993998852

Link: https://jaykarhade.github.io/ · Joined: 07-09-2022 22:12:58

127 Tweets

355 Followers

388 Following

Andrew Davison (@ajddavison) 's Twitter Profile Photo

All researchers should fight against this. Every week I try to persuade my students that top papers often have few quantitative results. With work that's new, important, and clearly qualitatively different (zero to one!), you don't need quantitative results. Demos not tables!

Kyle Sargent (@kylesargentai) 's Twitter Profile Photo

Modern generative models of images and videos rely on tokenizers. Can we build a state-of-the-art discrete image tokenizer with a diffusion autoencoder? Yes! I’m excited to share FlowMo, with Kyle Hsu, Justin Johnson, Fei-Fei Li, Jiajun Wu. A thread 🧵:

Hong-Xing "Koven" Yu (@koven_yu) 's Twitter Profile Photo

🔥Spatial intelligence requires world generation, and now we have the first comprehensive evaluation benchmark📏 for it! Introducing WorldScore: Unifying evaluation for 3D, 4D, and video models on world generation! 🧵1/7 Web: haoyi-duan.github.io/WorldScore/ arxiv: arxiv.org/abs/2504.00983

Khiem Vuong (@kvuongdev) 's Twitter Profile Photo

[1/6] Recent models like DUSt3R generalize well across viewpoints, but performance drops on aerial-ground pairs. At #CVPR2025, we propose AerialMegaDepth (aerial-megadepth.github.io), a hybrid dataset combining mesh renderings with real ground images (MegaDepth) to bridge this gap.

Zhiqiu Lin (@zhiqiulin) 's Twitter Profile Photo

Fresh GPT‑o3 results on our vision‑centric #NaturalBench (NeurIPS’24) benchmark! 🎯 Its new visual chain‑of‑thought—by “zooming in” on details—cracks questions that still stump GPT‑4o. Yet vision reasoning isn’t solved: o3 can still hallucinate even after a full minute of

Ishan Khatri ✈️ ICLR'25 (@i_ikhatri) 's Twitter Profile Photo

Just over a month left to submit to this year's Argoverse 2 challenges! Returning from previous years are our motion forecasting and lidar scene flow challenges. And NEW this year, with a $10k prize pool, is our Scenario Mining challenge! 🧵👇

Chris Rockwell (@_crockwell) 's Twitter Profile Photo

Ever wish YouTube had 3D labels? 🚀Introducing🎥DynPose-100K🎥, an Internet-scale collection of diverse videos annotated with camera pose! Applications include camera-controlled video generation🤩and learned dynamic pose estimation😯 Download: huggingface.co/datasets/nvidi…
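
For anyone who wants to poke at the release programmatically, here is a minimal loading sketch using the Hugging Face datasets library. The repo id below is a placeholder guess, since the tweet's link is truncated ("…nvidi…"); substitute the real dataset path.

from datasets import load_dataset

# Hypothetical repo id -- the tweet's link is cut off
# ("huggingface.co/datasets/nvidi…"); replace with the actual path.
ds = load_dataset("nvidia/DynPose-100K", split="train", streaming=True)

# Inspect the first record's fields (e.g., video ids, camera poses).
first = next(iter(ds))
print(first.keys())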

Jay Karhade (@jaykarhade) 's Twitter Profile Photo

Super cool project to have been involved in! Camera motion understanding is far from solved — even top SLAM/SfM and VLM models struggle in the wild. CameraBench pushes the frontier with high-quality annotations and a cinematographer-designed taxonomy. VLMs 🤝 SfM next? 😉

Chuang Gan (@gan_chuang) 's Twitter Profile Photo

What a fun collaboration with Zhiqiu on this summer internship project! Understanding camera motion in videos is extremely challenging, and this CameraBench will be critically important for both video captioning and video generation!

Hanwen Jiang (@hanwenjiang1) 's Twitter Profile Photo

Supervised learning has held 3D vision back for too long. Meet RayZer — a self-supervised 3D model trained with zero 3D labels: ❌ No supervision of camera & geometry ✅ Just RGB images. And the wild part? RayZer outperforms supervised methods (as 3D labels from COLMAP are noisy).

Justin Johnson (@jcjohnss) 's Twitter Profile Photo

Compute increases in the last ~decade are insane. The B200 is 1000x faster than the K40 that was state-of-the-art when I started my PhD. We used to train on 1 GPU; now 10K+ is common. Combining these gives a speedup of 10 million since 2013. This explosion led to modern AI.

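A quick back-of-envelope check of that claim, as a sketch in plain Python that just multiplies the tweet's own figures:

# Tweet's stated numbers: ~1000x per-GPU gain (K40 -> B200) and
# ~10,000x from scaling out (1 GPU in 2013 -> 10K+ GPUs now).
per_gpu = 1_000
scale_out = 10_000
print(f"{per_gpu * scale_out:,}x")  # 10,000,000x, i.e. ~10 million
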
Akash Sharma (@akashshrm02) 's Twitter Profile Photo

Last week I passed my thesis proposal, and I'm now officially a Ph.D. candidate! My proposed thesis, "Self-supervised perception for tactile dexterity," will explore ways to improve dexterous manipulation using tactile representations. Thanks to my committee and everyone who supported me!

Akash Sharma (@akashshrm02) 's Twitter Profile Photo

Robots need touch for human-like hands to reach the goal of general manipulation. However, approaches today either don't use tactile sensing or use a separate architecture per tactile task. Can one model improve many tactile tasks? 🌟Introducing Sparsh-skin: tinyurl.com/y935wz5c 1/6

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

Excited to share our work: Maximizing Confidence Alone Improves Reasoning. Humans rely on confidence to learn when answer keys aren’t available (e.g., taking an exam). Surprisingly, LLMs can also learn w/o ground-truth answers, simply by reinforcing high-confidence answers via RL!
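
A minimal sketch of the core idea as the tweet describes it: use the model's own confidence as the RL reward. Negative token entropy is used here as an illustrative confidence measure; the function name, tensor shapes, and exact objective are assumptions, not the paper's code.

import torch
import torch.nn.functional as F

def confidence_reward(logits: torch.Tensor) -> torch.Tensor:
    # logits: (seq_len, vocab_size) for one sampled answer.
    # Reward = mean negative entropy of the token distributions, so
    # more confident generations score higher -- an illustrative
    # stand-in for "reinforcing high-confidence answers".
    log_p = F.log_softmax(logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1)  # per-token entropy
    return -entropy.mean()

# Usage: score sampled answers, then feed the rewards to any RL
# method (e.g., policy gradient) -- no ground-truth answers needed.
reward = confidence_reward(torch.randn(12, 32000))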

Fei-Fei Li (@drfeifei) 's Twitter Profile Photo

Check out this shiny new, fast and dynamic web renderer for 3D Gaussian Splats! The things one could do are just mind boggling! So proud of the World Labs team that made this happen, and we are making this open source for everyone!

Jay Karhade (@jaykarhade) 's Twitter Profile Photo

UFM is a step toward solving the top 3 problems of computer vision: Correspondence, Correspondence and Correspondence 🙃 Exciting collab led by Yuchen Zhang! 1 year in the making, and lots of engineering and insights uncovered!

Zhenjun Zhao (@zhenjun_zhao) 's Twitter Profile Photo

UFM: A Simple Path towards Unified Dense Correspondence with Flow

Yuchen Zhang, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade, Shreyas Jha, Yaoyu HU, Deva Ramanan, Sebastian Scherer, Wenshan Wang

tl;dr: transformer-based architecture using covisibility

Haoyu Xiong (@haoyu_xiong_) 's Twitter Profile Photo

Your bimanual manipulators might need a Robot Neck 🤖🦒 Introducing Vision in Action: Learning Active Perception from Human Demonstrations ViA learns task-specific, active perceptual strategies—such as searching, tracking, and focusing—directly from human demos, enabling robust