Shengqu Cai (@prime_cai) 's Twitter Profile
Shengqu Cai

@prime_cai

CS PhD student @Stanford

ID: 842765835102044160

Link: https://primecai.github.io/ · Joined: 17-03-2017 15:53:27

160 Tweets

861 Followers

395 Following

Dreaming Tulpa 🥓👑 (@dreamingtulpa) 's Twitter Profile Photo

Goodbye LoRA (Part 17) 👋 Diffusion Self-Distillation can generate high-quality images of specific subjects in new settings by preserving identity. Also supports relighting 👌

Yitong Deng (@yitong_deng) 's Twitter Profile Photo

[ICLR 2025] We introduce a new method for noise warping that reduces the time cost of the existing SOTA by 42x and the memory cost by 7x, without compromising quality. Our method can “paste” noise textures onto screen-space or 3D objects to enhance temporal or multi-view consistency.

Ceyuan Yang (@ceyuany) 's Twitter Profile Photo

We propose Long Context Tuning (LCT) for scene-level video generation to bridge the gap between current single-shot generation and real-world narrative video productions. Homepage: guoyww.github.io/projects/long-… Report: arxiv.org/abs/2503.10589

Yuwei Guo (@guoywguo) 's Twitter Profile Photo

Towards scene-level video generation! See our latest work: Long Context Tuning for Video Generation Homepage: guoyww.github.io/projects/long-… Report: arxiv.org/pdf/2503.10589

Heather Cooper (@hbcoop_) 's Twitter Profile Photo

Introducing Diffusion Self-Distillation (DSD): A new method from Shengqu Cai and Stanford University researchers that fine-tunes text-to-image models to enable "identity-preserving generation" for characters, objects and scenes with a single input image. It works for any style,

Ceyuan Yang (@ceyuany) 's Twitter Profile Photo

Check out our latest work, CameraCtrl II. By carefully collecting and processing data and introducing as little inductive bias as possible, we let users explore the generated world with appealing dynamics and consistency. Together with extension and distillation, CameraCtrl

Yang Zheng (@yang_zheng18) 's Twitter Profile Photo

Can we reconstruct relightable human hair appearance from real-world visual observations? We introduce GroomLight, a hybrid inverse rendering method for relightable human hair appearance modeling. syntec-research.github.io/GroomLight/

Ian Huang (@ianhuang3d) 's Twitter Profile Photo

🏡Building realistic 3D scenes just got smarter! Introducing our #CVPR2025 work, 🔥FirePlace, a framework that enables Multimodal LLMs to automatically generate realistic and geometrically valid placements for objects into complex 3D scenes. How does it work?🧵👇

Qingqing Zhao (@qingqing_zhao_) 's Twitter Profile Photo

Introducing CoT-VLA – Visual Chain-of-Thought reasoning for Robot Foundation Models! 🤖 By leveraging next-frame prediction as visual chain-of-thought reasoning, CoT-VLA uses future prediction to guide action generation and unlock large-scale video data for training. #CVPR2025

Hansheng Chen (@hanshengch) 's Twitter Profile Photo

Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow) github.com/lakonik/gmflow GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.

jianhao (@jianhao75895505) 's Twitter Profile Photo

🥳Excited to share our latest work, WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments, accepted to #CVPR2025 🌐 We present a robust monocular RGB SLAM system that uses uncertainty-aware tracking and mapping to handle dynamic scenes.

Ceyuan Yang (@ceyuany) 's Twitter Profile Photo

Glad to share Seaweed-7B, a cost-effective foundation model for video generation. Our tech report highlights the key designs that significantly improve compute efficiency and performance given limited resources, achieving quality comparable to other industry-level models. To

Liyuan Zhu (@liyuan_zz) 's Twitter Profile Photo

🔔 [SIGGRAPH ’25] Want to redesign your apartment and control the style of every piece of furniture? (Virtual try-on for 3D scenes.) 🎨 Introducing ReStyle3D, a method that transforms your apartment into the design style you want! #stylization #interiordesign

Yunzhi Zhang (@zhang_yunzhi) 's Twitter Profile Photo

(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.