Ang Cao (@angcao3) Twitter Tweets • TwiCopy

Andrew Owens

a year ago

At #CVPR2024: Tactile-augmented Radiance Fields! We probe a scene with a touch sensor and localize each sample within a NeRF. We use diffusion to estimate the tactile signals for the points we didn't touch. x.com/_YimingDou/sta… w/ Yiming Dou, Antonio Loquercio, Fengyu Yang, Yi Liu

Likes: 153, Replies: 4, Retweets: 24

Ziyang Chen

@czyangchen

a year ago

These spectrograms look like images, but can also be played as a sound! We call these images that sound. How do we make them? Look and listen below to find out, and to see more examples!

Likes: 169, Replies: 1, Retweets: 41

Ayush Shrivastava

@ayshrv

a year ago

We present Global Matching Random Walks, a simple self-supervised approach to the Tracking Any Point (TAP) problem, accepted to #ECCV2024. We train a global matching transformer to find cycle-consistent tracks through video via contrastive random walks (CRW).

Likes: 85, Replies: 1, Retweets: 23

Zichen Wang

@zichen2501

10 months ago

Just released the code at github.com/zichenwang01/r…. See you all in Tokyo!

Likes: 152, Replies: 0, Retweets: 16

Linyi Jin

@jin_linyi

9 months ago

Introducing 👀Stereo4D👀 A method for mining 4D from internet stereo videos. It enables large-scale, high-quality, dynamic, *metric* 3D reconstructions, with camera poses and long-term 3D motion trajectories. We used Stereo4D to make a dataset of over 100k real-world 4D scenes.

Likes: 524, Replies: 13, Retweets: 102

AK

@_akhaliq

8 months ago

UnCommon Objects in 3D

Likes: 58, Replies: 3, Retweets: 9

David Novotny

@davnov134

8 months ago

We are releasing uCO3D! Built to supercharge 3D GenAI and digital-twin models, this evolution of CO3D features more and higher-quality object videos from 1k categories, 3D Gaussian Splats, and streamlined OSS tools. 💻Data&code: github.com/facebookresear… 📄Paper:

Likes: 136, Replies: 1, Retweets: 24

Ang Cao

@angcao3

6 months ago

Check out Fast3r, an amazing work from Jianing “Jed” Yang @ CVPR, which supports 1000+ images and is orders of magnitude faster than Dust3r!

Likes: 5, Replies: 0, Retweets: 0

Jianyuan Wang

@jianyuan_wang

6 months ago

Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for: ✅ Camera Pose Estimation ✅ Multi-view Depth Estimation ✅ Dense

Likes: 3.3K, Replies: 21, Retweets: 195

Ang Cao

@angcao3

6 months ago

This is so amazing

Likes: 4, Replies: 0, Retweets: 0

Chris Rockwell

@_crockwell

5 months ago

Ever wish YouTube had 3D labels? 🚀Introducing🎥DynPose-100K🎥, an Internet-scale collection of diverse videos annotated with camera pose! Applications include camera-controlled video generation🤩and learned dynamic pose estimation😯 Download: huggingface.co/datasets/nvidi…

Likes: 177, Replies: 2, Retweets: 39

Ang Cao

@angcao3

2 months ago

Can we train a 3D-language multimodal Transformer using 2D VLMs and rendering loss? Sasha (Alexander) Sax will present our new #icml25 paper on Wednesday at 2pm in Hall B2-B3 W200. Please come and check it out! Project Page: liftgs.github.io

Likes: 133, Replies: 0, Retweets: 21

Tiange Luo

@tiangeluo

2 months ago

Introducing Visual Test-time Scaling for GUI Agent Grounding (ICCV'25, completed prior to the release of OpenAI-O3). When "thinking with images", the key challenge is designing the action in pixel space. We can zoom into regions of varying sizes and shapes, apply image

Likes: 54, Replies: 2, Retweets: 10