Shengqu Cai (@prime_cai) 's Twitter Profile
Shengqu Cai

@prime_cai

CS PhD student @Stanford

ID: 842765835102044160

Link: https://primecai.github.io/ · Joined: 17-03-2017 15:53:27

160 Tweets

861 Followers

395 Following

Dreaming Tulpa 🥓👑 (@dreamingtulpa) 's Twitter Profile Photo

Goodbye LoRA (Part 17) 👋 Diffusion Self-Distillation can generate high-quality images of specific subjects in new settings by preserving identity. Also supports relighting 👌

Yitong Deng (@yitong_deng) 's Twitter Profile Photo

[ICLR 2025] We introduce a new method for noise warping that reduces the time cost of the existing SOTA by 42x and the memory cost by 7x, without compromising quality. Our method can “paste” noise textures onto screen-space or 3D objects to enhance temporal or multi-view consistency.

Ceyuan Yang (@ceyuany) 's Twitter Profile Photo

We propose Long Context Tuning (LCT) for scene-level video generation to bridge the gap between current single-shot generation and real-world narrative video productions. Homepage: guoyww.github.io/projects/long-… Report: arxiv.org/abs/2503.10589

Yuwei Guo (@guoywguo) 's Twitter Profile Photo

Towards scene-level video generation! See our latest work: Long Context Tuning for Video Generation Homepage: guoyww.github.io/projects/long-… Report: arxiv.org/pdf/2503.10589

Heather Cooper (@hbcoop_) 's Twitter Profile Photo

Introducing Diffusion Self-Distillation (DSD): A new method from Shengqu Cai and Stanford University researchers that fine-tunes text-to-image models to enable "identity-preserving generation" for characters, objects and scenes with a single input image. It works for any style,

Ceyuan Yang (@ceyuany) 's Twitter Profile Photo

Check out our latest work, CameraCtrl II. By carefully collecting and processing data and introducing as little inductive bias as possible, we let users explore the generated world with appealing dynamics and consistency. Together with extension and distillation, CameraCtrl

Yang Zheng (@yang_zheng18) 's Twitter Profile Photo

Can we reconstruct relightable human hair appearance from real-world visual observations? We introduce GroomLight, a hybrid inverse rendering method for relightable human hair appearance modeling. syntec-research.github.io/GroomLight/

Ian Huang (@ianhuang3d) 's Twitter Profile Photo

🏡Building realistic 3D scenes just got smarter! Introducing our #CVPR2025 work, 🔥FirePlace, a framework that enables Multimodal LLMs to automatically generate realistic and geometrically valid placements for objects into complex 3D scenes. How does it work?🧵👇

Qingqing Zhao (@qingqing_zhao_) 's Twitter Profile Photo

Introducing CoT-VLA – Visual Chain-of-Thought reasoning for Robot Foundation Models! 🤖 By leveraging next-frame prediction as visual chain-of-thought reasoning, CoT-VLA uses future prediction to guide action generation and unlock large-scale video data for training. #CVPR2025

Hansheng Chen (@hanshengch) 's Twitter Profile Photo

Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow) github.com/lakonik/gmflow GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.

jianhao (@jianhao75895505) 's Twitter Profile Photo

🥳Excited to share our latest work, WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments, accepted to #CVPR2025 🌐 We present a robust monocular RGB SLAM system that uses uncertainty-aware tracking and mapping to handle dynamic scenes.

Ceyuan Yang (@ceyuany) 's Twitter Profile Photo

Glad to share Seaweed-7B, a cost-effective foundation model for video generation. Our tech report highlights the key designs that significantly improve compute efficiency and performance given limited resources, achieving quality comparable to other industry-level models. To

Liyuan Zhu (@liyuan_zz) 's Twitter Profile Photo

🔔 [SIGGRAPH ’25] Want to redesign your apartment and control the style of every piece of furniture? (Virtual try-on for 3D scenes.) 🎨 Introducing ReStyle3D, a method that transforms your apartment into the design style you want! #stylization #interiordesign

Yunzhi Zhang (@zhang_yunzhi) 's Twitter Profile Photo

(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.