Gene Chou (@gene_ch0u)'s Twitter Profile
Gene Chou

@gene_ch0u

CS PhD student @Cornell; previously Princeton '22

ID: 1582075953169285138

Website: http://genechou.com · Joined: 17-10-2022 18:28:12

48 Tweets

232 Followers

240 Following

Andi Marafioti (@andimarafioti)'s Twitter Profile Photo

I came up with a technique for dynamic token selection in Vision-Language Models. Instead of wasting compute on every part of an image, this method adapts the number of tokens based on the complexity of each region. Here’s an example of how it works: 👇

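A minimal sketch of that general idea (not the author's exact method), assuming a simple pixel-variance proxy for region complexity and a hypothetical `select_tokens` helper: flat regions contribute few tokens, detailed regions contribute more. In a real VLM the complexity score would come from a learned module rather than raw variance, but the compute saving comes from the same top-k selection.

```python
# Hypothetical sketch (not the author's exact method): score each image patch by a
# simple complexity proxy (pixel variance) and keep only the top-k most complex
# patches as visual tokens for the VLM.
import torch

def select_tokens(image: torch.Tensor, patch: int = 16, keep_ratio: float = 0.25):
    """image: (C, H, W). Returns (kept patch tokens, their indices)."""
    C, H, W = image.shape
    # Split into non-overlapping patches and flatten each: (num_patches, C*patch*patch)
    patches = image.unfold(1, patch, patch).unfold(2, patch, patch)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, C * patch * patch)
    complexity = patches.var(dim=1)                  # flat regions score low
    k = max(1, int(keep_ratio * patches.shape[0]))
    keep = complexity.topk(k).indices                # spend tokens on detailed regions
    return patches[keep], keep

tokens, idx = select_tokens(torch.rand(3, 224, 224))
print(tokens.shape)  # torch.Size([49, 768]) instead of 196 full-resolution tokens
```
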
Bingyi Kang (@bingyikang)'s Twitter Profile Photo

Thrilled to introduce Video Depth Anything to support depth estimation for super-long videos (over 5 minutes). 👉 It enjoys all the benefits of #DepthAnything: high-quality, fast, robust, etc. Project page: videodepthanything.github.io

Qianqian Wang (@qianqianwang5)'s Twitter Profile Photo

Introducing CUT3R! An online 3D reasoning framework for many 3D tasks directly from just RGB. For static or dynamic scenes. Video or image collections, all in one!

youming.deng (@denghilbert)'s Twitter Profile Photo

How can we use wide-FOV cameras for reconstruction? We propose self-calibrating Gaussian Splatting, which jointly optimizes camera parameters, lens distortion, and 3D Gaussian representations to reconstruct directly from a set of wide-angle captures. Page: denghilbert.github.io/self-cali/
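
A rough sketch of the joint-optimization idea, with hypothetical parameter names and a placeholder differentiable renderer (the actual splatting and distortion model is more involved): the point is simply that intrinsics, distortion coefficients, and Gaussian parameters all sit in the same optimizer and are driven by the same photometric loss.

```python
# Rough sketch of the joint optimization (placeholder renderer, hypothetical names):
# intrinsics, lens distortion, and Gaussian parameters are all trained by one
# photometric loss on the wide-angle captures.
import torch

N = 10_000
means     = torch.nn.Parameter(torch.randn(N, 3))
colors    = torch.nn.Parameter(torch.rand(N, 3))
opacities = torch.nn.Parameter(torch.zeros(N, 1))
focal     = torch.nn.Parameter(torch.tensor(500.0))   # simplified intrinsics
distort   = torch.nn.Parameter(torch.zeros(4))        # radial/tangential coefficients

opt = torch.optim.Adam([means, colors, opacities, focal, distort], lr=1e-3)

def render(means, colors, opacities, focal, distort, H=64, W=64):
    # Stand-in for a differentiable splatting renderer that applies the learned
    # distortion when projecting Gaussians onto the wide-angle image plane.
    scale = torch.sigmoid(opacities).mean() + 0.0 * (focal + distort.sum() + means.sum())
    return scale * colors.mean() * torch.ones(H, W, 3)

target = torch.rand(64, 64, 3)                         # one wide-angle capture
for step in range(100):
    loss = (render(means, colors, opacities, focal, distort) - target).abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()       # one loss updates everything jointly
```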

Yuncong Yang (@yuncongyy)'s Twitter Profile Photo

Excited to introduce 3D-Mem! Spatial Intelligence simply isn’t possible without robust 3D Scene Memory. That’s why we developed 3D-Mem, an effective framework for lifelong exploration and reasoning. Thrilled to share that it’s been accepted to #CVPR2025!

Ning Yu (@realningyu)'s Twitter Profile Photo

The first project I led at Netflix Eyeline Studios is headed to #CVPR2025 with 5,5,4 review scores: 🌊Go-with-the-Flow🌊 warps noise for effortless motion control in video diffusion — no pipeline changes, same compute. Direct camera/object motion, transfer movement between…

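A minimal sketch of the noise-warping idea, assuming a given per-frame optical flow and a hypothetical `warp_noise` helper (illustrative, not the paper's exact algorithm): warping the previous frame's noise along the flow makes the diffusion noise move with the scene, which is what provides motion control without changing the sampling pipeline.

```python
# Minimal sketch of flow-based noise warping (illustrative, not the paper's exact algorithm):
# instead of sampling independent noise per frame, warp the previous frame's noise along
# the optical flow so the diffusion noise moves with the scene.
import torch
import torch.nn.functional as F

def warp_noise(noise: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """noise: (1, C, H, W); flow: (1, 2, H, W) in pixels (dx, dy). Returns warped noise."""
    _, _, H, W = noise.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float() + flow[0].permute(1, 2, 0)  # (H, W, 2)
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1      # normalize to [-1, 1] for grid_sample
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(noise, grid.unsqueeze(0), align_corners=True)

noise_t  = torch.randn(1, 4, 64, 64)                    # latent noise for frame t
flow     = torch.zeros(1, 2, 64, 64); flow[:, 0] = 3.0  # e.g. 3-px horizontal motion
noise_t1 = warp_noise(noise_t, flow)                    # correlated noise for frame t+1
```
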
David Fan (@davidjfan)'s Twitter Profile Photo

Can visual SSL match CLIP on VQA? Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params) which is trained purely on web images – without any language supervision.

Karan Dalal (@karansdalal)'s Twitter Profile Photo

Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by…
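
A greatly simplified sketch of what a TTT layer does, with hypothetical names (the layers in the paper are more elaborate): the layer's hidden state is itself a tiny linear model whose weights take a gradient step on a self-supervised reconstruction loss for every incoming token, which keeps long-context cost linear.

```python
# Greatly simplified sketch of a Test-Time Training (TTT) layer (illustrative only):
# the hidden state is a tiny linear model W that takes one gradient step on a
# self-supervised reconstruction loss for every token it sees.
import torch

class TTTLinear(torch.nn.Module):
    def __init__(self, dim: int, inner_lr: float = 0.1):
        super().__init__()
        self.dim, self.inner_lr = dim, inner_lr
        self.key   = torch.nn.Linear(dim, dim)    # "corrupted" view of the token
        self.value = torch.nn.Linear(dim, dim)    # reconstruction target
        self.query = torch.nn.Linear(dim, dim)    # read-out probe

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W = x.new_zeros(self.dim, self.dim)        # fast weights, updated at test time
        outs = []
        for t in range(x.shape[0]):                # x: (seq_len, dim), processed causally
            k, v, q = self.key(x[t]), self.value(x[t]), self.query(x[t])
            err = W @ k - v                        # inner loss: reconstruct v from k
            W = W - self.inner_lr * torch.outer(err, k)   # one inner gradient step
            outs.append(W @ q)                     # output uses the freshly updated W
        return torch.stack(outs)

layer = TTTLinear(dim=32)
y = layer(torch.randn(128, 32))                    # long contexts cost O(T), not O(T^2)
```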

Xichen Pan (@xichen_pan)'s Twitter Profile Photo

We find training unified multimodal understanding and generation models is so easy, you do not need to tune MLLMs at all. The MLLM's knowledge/reasoning/in-context learning can be transferred from multimodal understanding (text output) to generation (pixel output) even when it is FROZEN!

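A hypothetical sketch of that recipe with placeholder module names (not the authors' code): the MLLM is frozen, and only a small connector that maps its hidden states into the generator's conditioning space receives gradients. Because the MLLM's weights never change, whatever knowledge and in-context behavior it already has is what conditions the pixel-output branch.

```python
# Hypothetical sketch of the frozen-MLLM recipe (placeholder modules, not the authors' code):
# only a small connector between the frozen MLLM and the image generator is trained.
import torch

class FrozenMLLMToGenerator(torch.nn.Module):
    def __init__(self, mllm: torch.nn.Module, generator: torch.nn.Module,
                 dim_mllm: int = 4096, dim_gen: int = 1024):
        super().__init__()
        self.mllm, self.generator = mllm, generator
        for p in self.mllm.parameters():
            p.requires_grad_(False)                # the MLLM stays FROZEN
        self.connector = torch.nn.Sequential(      # the only trainable piece
            torch.nn.Linear(dim_mllm, dim_gen),
            torch.nn.GELU(),
            torch.nn.Linear(dim_gen, dim_gen),
        )

    def forward(self, tokens):
        with torch.no_grad():
            h = self.mllm(tokens)                  # (B, T, dim_mllm) hidden states
        cond = self.connector(h)                   # map understanding features into the
        return self.generator(cond)                # generator's conditioning space
```
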
Jon Barron (@jon_barron)'s Twitter Profile Photo

Here's my 3DV talk, in chapters:

1) Intro / NeRF boilerplate.
2) Recent reconstruction work.
3) Recent generative work.
4) Radiance fields as a field.
5) Why generative video has bitter-lessoned 3D.
6) Why generative video hasn't bitter-lessoned 3D.

5 & 6 are my favorites.
Gordon Wetzstein (@gordonwetzstein)'s Twitter Profile Photo

Most video models
🤯 forget the past
🐌 slow down over time
🔁 rely on bidirectional (not causal) attention

Our state-space video world models (SSM)
🧠 remember across hundreds of frames
⚡️ generate at constant speed
⏩ are fully causal, enabling real-time rollout

1/3
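
A toy sketch of why a state-space formulation gives constant speed and causality, with hypothetical names (not the authors' architecture): a fixed-size recurrent state is updated once per frame, so per-frame compute and memory do not grow with video length, unlike attention over all past frames.

```python
# Toy sketch of the constant-cost, causal recurrence behind a state-space video model
# (illustrative; not the authors' architecture). A fixed-size state h is updated once per
# frame latent, so per-frame compute and memory stay constant for arbitrarily long videos.
import torch

class FrameSSM(torch.nn.Module):
    def __init__(self, latent_dim: int, state_dim: int = 256):
        super().__init__()
        self.state_dim = state_dim
        self.A = torch.nn.Linear(state_dim, state_dim, bias=False)   # state transition
        self.B = torch.nn.Linear(latent_dim, state_dim, bias=False)  # input projection
        self.C = torch.nn.Linear(state_dim, latent_dim)              # readout

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        """frames: (T, latent_dim). Returns a causal prediction for each next frame."""
        h = frames.new_zeros(self.state_dim)       # the entire memory of the past
        preds = []
        for x_t in frames:                         # strictly causal: only past frames seen
            h = torch.tanh(self.A(h) + self.B(x_t))
            preds.append(self.C(h))
        return torch.stack(preds)

model = FrameSSM(latent_dim=64)
out = model(torch.randn(300, 64))                  # hundreds of frames, same cost per step
```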

Xun Huang (@xunhuang1995)'s Twitter Profile Photo

Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
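
An illustrative sketch of the "simulate inference during training" idea, with a dummy model standing in for the real autoregressive diffusion transformer: each frame is generated conditioned on the model's own earlier outputs (which a KV cache would hold at inference), rather than on ground-truth frames, and the training loss is computed on that rollout.

```python
# Illustrative sketch of simulating inference during training (not the paper's code;
# DummyFrameModel is a stand-in for the autoregressive video diffusion transformer).
# Each frame is generated conditioned on the model's OWN earlier outputs, as at
# inference, rather than on ground-truth frames.
import torch

def self_forcing_rollout(model, noise_frames: torch.Tensor) -> torch.Tensor:
    """noise_frames: (T, dim). Rolls the model out on its own generations."""
    generated = []
    for t in range(noise_frames.shape[0]):
        context = torch.stack(generated) if generated else noise_frames[:0]
        generated.append(model(context, noise_frames[t]))   # autoregressive step
    return torch.stack(generated)

class DummyFrameModel(torch.nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)
    def forward(self, context, noise):
        ctx = context.mean(dim=0) if context.shape[0] > 0 else torch.zeros_like(noise)
        return self.proj(noise + ctx)              # condition on previously generated frames

rollout = self_forcing_rollout(DummyFrameModel(), torch.randn(8, 16))
# Training losses are then computed on `rollout` itself, closing the train/inference gap.
```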

Gene Chou (@gene_ch0u)'s Twitter Profile Photo

I'll be presenting our work with Kai Zhang at #cvpr2025. We finetune video models to be 3D consistent without any 3D supervision! Feel free to stop by our poster or reach out to chat: Sunday, Jun 15, 4-6pm, ExHall D, poster #168. cvpr.thecvf.com/virtual/2025/p…

Sukjun (June) Hwang (@sukjun_hwang)'s Twitter Profile Photo

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
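
A toy sketch of the dynamic-chunking idea, far simpler than H-Net itself and with hypothetical names: a learned scorer predicts chunk boundaries over raw bytes, and the byte embeddings within each chunk are pooled into a single vector, so the units the main model operates on are discovered from data rather than fixed by a tokenizer.

```python
# Toy sketch of dynamic chunking (far simpler than H-Net, names are hypothetical):
# a learned scorer predicts chunk boundaries over raw bytes, and byte embeddings
# within each chunk are mean-pooled into one vector, so the "tokens" are discovered.
import torch

class DynamicChunker(torch.nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.embed = torch.nn.Embedding(256, dim)          # raw bytes, no tokenizer
        self.boundary = torch.nn.Linear(dim, 1)            # boundary score per byte

    def forward(self, byte_ids: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
        x = self.embed(byte_ids)                            # (T, dim)
        p = torch.sigmoid(self.boundary(x)).squeeze(-1)     # (T,) boundary probabilities
        chunks, current = [], []
        for t in range(byte_ids.shape[0]):
            current.append(x[t])
            if p[t] > threshold or t == byte_ids.shape[0] - 1:
                chunks.append(torch.stack(current).mean(dim=0))  # pool one chunk
                current = []
        return torch.stack(chunks)                          # data-dependent units

chunker = DynamicChunker()
byte_ids = torch.tensor(list("hello world".encode("utf-8")))
units = chunker(byte_ids)    # the main model now operates on these discovered units
```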