Jiteng Mu (@jitengmu)'s Twitter Profile
Jiteng Mu

@jitengmu

Ph.D. @UCSD; previously M.S. @JohnsHopkins; Intern @Nvidia @Adobe

ID: 973212600946307073

Link: https://jitengmu.github.io · Joined: 12-03-2018 15:02:20

65 Tweets

694 Followers

608 Following

Minguk_Kang (@minguk_kang)'s Twitter Profile Photo

We're excited to introduce our new 1-step image generator, Diffusion2GAN, at #ECCV2024, which enables ODE-preserving 1k image generation in just 0.16 seconds! Check out our #ECCV2024 paper mingukkang.github.io/Diffusion2GAN/ and stop by poster #181 (Wed Oct 2, 10:30-12:30 CEST) if you're…
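
The "ODE-preserving" one-step generation described above suggests distilling the teacher diffusion model's deterministic ODE mapping into a single-step generator. Below is a minimal, hypothetical sketch of that recipe: regress the one-step generator onto (noise, ODE-output) pairs and add an adversarial term. `teacher_ode_solve`, `G`, and `D` are placeholder names, and plain MSE stands in for the paper's perceptual regression loss.

```python
# Hypothetical sketch of one-step ODE distillation in the spirit of
# Diffusion2GAN. `teacher_ode_solve`, `G`, and `D` are placeholders,
# not the authors' actual code.
import torch
import torch.nn.functional as F

def distill_step(G, D, teacher_ode_solve, opt_g, opt_d, batch_size, device):
    z = torch.randn(batch_size, 4, 64, 64, device=device)  # latent noise (shape illustrative)
    with torch.no_grad():
        x_teacher = teacher_ode_solve(z)   # deterministic ODE target for this z

    # Generator: match the teacher's noise-to-image mapping (ODE-preserving)
    # and fool the discriminator for sharper samples.
    x_fake = G(z)
    loss_g = F.mse_loss(x_fake, x_teacher) - D(x_fake).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Discriminator: hinge loss on teacher outputs vs. one-step samples.
    loss_d = (F.relu(1 - D(x_teacher)).mean() +
              F.relu(1 + D(x_fake.detach())).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```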

Jiteng Mu (@jitengmu)'s Twitter Profile Photo

Precise spatial image editing with diffusion models? We will be presenting #ECCV2024 Editable Image Elements (Thu Oct 3, 16:30-18:30 CEST, poster #262). Please come check out our poster and say hi😃! w/ Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park.

An-Chieh Cheng (@anjjei)'s Twitter Profile Photo

Human videos are scaling up vision-language navigation! 🚀 It has been a rewarding journey to tame the VLM into the VLA model and enable it to learn from real-world human touring videos🚶🏽. Watching robots move through all kinds of places, following our instructions, and doing…

Xiaolong Wang (@xiaolonw)'s Twitter Profile Photo

Autoregressive models are gaining momentum in image generation, but what about image editing? Given an image and a language description, we train one Autoregressive Transformer to do ANY EDITING.
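
A minimal sketch of what "one autoregressive transformer for any edit" could look like: tokenize the source image, append the text tokens, and train with next-token prediction on the target (edited) image's tokens. The shared vocabulary and all module names below are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of autoregressive image editing via next-token
# prediction over [source image tokens | text tokens | edited image tokens].
import torch
import torch.nn as nn

class ARImageEditor(nn.Module):
    def __init__(self, vocab=16384, dim=512, layers=8, heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)  # shared token codebook (assumption)
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, src_tokens, text_tokens, tgt_tokens):
        # Sequence: [source image tokens | text tokens | target image tokens]
        seq = torch.cat([src_tokens, text_tokens, tgt_tokens], dim=1)
        h = self.embed(seq)
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1)).to(h.device)
        h = self.backbone(h, mask=mask, is_causal=True)
        start = src_tokens.size(1) + text_tokens.size(1)
        logits = self.head(h[:, start - 1:-1])  # predict only the edit region
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tgt_tokens.reshape(-1))
```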

Yinbo Chen (@yinbochen)'s Twitter Profile Photo

Introducing “Diffusion Autoencoders are Scalable Image Tokenizers” (DiTo). We show that with proper designs and scaling up, diffusion autoencoders (a single L2 loss) can outperform the GAN-LPIPS tokenizers (hybrid losses) used in current SOTA generative models. (1/4)
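
A hedged sketch of the idea: an encoder compresses the image into latents, and a conditional denoiser is trained with the standard denoising L2 loss alone; there are no GAN or LPIPS terms. The flow-matching-style corruption and the `encoder`/`denoiser` interfaces below are assumptions, and the paper's exact schedule and parameterization may differ.

```python
# Sketch of a diffusion-autoencoder tokenizer trained with a single L2
# (denoising) loss, in the spirit of DiTo. `encoder` and `denoiser` are
# illustrative stand-ins.
import torch
import torch.nn.functional as F

def dito_style_loss(encoder, denoiser, x):
    z = encoder(x)                              # image -> latent "tokens"
    t = torch.rand(x.size(0), device=x.device)  # random diffusion time in [0, 1)
    noise = torch.randn_like(x)
    t_ = t.view(-1, 1, 1, 1)
    x_t = (1 - t_) * x + t_ * noise             # linearly corrupted image
    v_pred = denoiser(x_t, t, z)                # denoiser conditioned on latents
    target = noise - x                          # velocity target
    return F.mse_loss(v_pred, target)           # the single L2 loss
```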

Isabella Liu (@isabella__liu)'s Twitter Profile Photo

🐅 Want to rig your favorite meme character? Try “RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets”! ✨RigAnything is a transformer-based model that sequentially generates skeletons without predefined templates. It creates high-quality skeletons for…
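
One plausible reading of "sequentially generates skeletons": a causal transformer consumes a global shape feature plus the joints generated so far, then predicts the next joint's 3D position and its parent. The sketch below is illustrative only; the heads, dimensions, and names are assumptions, not RigAnything's code.

```python
# Illustrative guess at template-free autoregressive rigging.
import torch
import torch.nn as nn

class ARRigger(nn.Module):
    def __init__(self, dim=256, heads=8, layers=6, max_joints=64):
        super().__init__()
        self.joint_in = nn.Linear(3, dim)              # embed joint positions
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, layers)
        self.pos_head = nn.Linear(dim, 3)              # next joint's xyz
        self.parent_head = nn.Linear(dim, max_joints)  # parent index logits

    def forward(self, joints, shape_feat):
        # joints: [B, N, 3] generated so far; shape_feat: [B, 1, dim]
        h = torch.cat([shape_feat, self.joint_in(joints)], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(h.size(1)).to(h.device)
        h = self.backbone(h, mask=mask, is_causal=True)
        last = h[:, -1]                                # state after the last joint
        return self.pos_head(last), self.parent_head(last)
```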

Yuzhe Qin (@qinyuzhe)'s Twitter Profile Photo

Meet our first general-purpose robot at Dexmate: dexmate.ai/vega. Adjustable height from 0.66m to 2.2m: compact enough for an SUV, tall enough to reach those impossibly high shelves. Powerful dual arms (15 lbs payload each) and omni-directional mobility for ultimate…

Michaël Gharbi (@m_gharbi)'s Twitter Profile Photo

Today's visual generative models are mere stochastic parrots of imagery, much like early language models, which could only statistically mimic short sentences with little reasoning. In contrast, modern large language models (LLMs) can comprehend long documents, keep track of…

Taesung Park (@taesung)'s Twitter Profile Photo

Excited to come out of stealth at Reve! Today's text-to-image/video models, in contrast to LLMs, lack logic. Images seem plausible initially but fall apart under scrutiny: painting techniques don't match, props don't carry meaning, and compositions lack intention. (1/4)

Xiaolong Wang (@xiaolonw)'s Twitter Profile Photo

Test-Time Training (TTT) now works on video! And not just a 5-second video: we can generate a full 1-min video! The TTT module is an RNN module that provides an explicit and efficient memory mechanism. It models the hidden state of an RNN with a machine learning model, which is updated…
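
A minimal sketch of the TTT idea the tweet describes: the recurrent "hidden state" is the weight matrix of a tiny inner model, updated by one gradient step of a self-supervised reconstruction loss per token. Projections, mini-batch updates, and the video-specific design are omitted; this is a simplified illustration, not the authors' implementation.

```python
# TTT-style scan: the hidden state is the weight matrix W of a tiny
# linear inner model, updated by gradient descent at every step.
import torch

def ttt_linear_scan(tokens, lr=0.1):
    B, T, D = tokens.shape
    W = torch.zeros(B, D, D, device=tokens.device)     # hidden state = weights
    outs = []
    for t in range(T):
        x = tokens[:, t]                               # [B, D]
        err = torch.bmm(x.unsqueeze(1), W).squeeze(1) - x  # x @ W should reconstruct x
        # Inner-loop step on 0.5 * ||x @ W - x||^2: grad_W = x^T (x @ W - x)
        W = W - lr * torch.bmm(x.unsqueeze(2), err.unsqueeze(1))
        outs.append(torch.bmm(x.unsqueeze(1), W).squeeze(1))
    return torch.stack(outs, dim=1)                    # [B, T, D]

# e.g. ttt_linear_scan(torch.randn(2, 16, 32)) -> tensor of shape [2, 16, 32]
```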

Jiteng Mu (@jitengmu)'s Twitter Profile Photo

🥳 EditAR code is released! Feel free to check it out. 👉 Presenting EditAR at #CVPR2025! (Friday afternoon, Jun 13, 4:00pm-6:00pm, Hall D #242) Code: github.com/JitengMu/EditAR Project: jitengmu.github.io/EditAR

Jiteng Mu (@jitengmu)'s Twitter Profile Photo

It's challenging, but so rewarding! Thank you, Xiaolong Wang, 🥰 for being a steady source of support and mentorship. I am especially grateful for the freedom you gave me to follow my curiosity. I also feel lucky to have shared this journey with such an inspiring group of labmates!

Jianglong Ye (@jianglong_ye)'s Twitter Profile Photo

How to generate billion-scale manipulation demonstrations easily? Let us leverage generative models! 🤖✨ We introduce Dex1B, a framework that generates 1 BILLION diverse dexterous hand demonstrations for both grasping 🖐️ and articulation 💻 tasks using a simple C-VAE model.
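
For context, a minimal conditional VAE (C-VAE) of the kind the tweet mentions: the condition c might encode the object and scene, x a hand-pose demonstration, and sampling z from the prior yields new demonstrations. All dimensions and layers below are illustrative assumptions, not Dex1B's actual architecture.

```python
# Minimal C-VAE sketch: encode (x, c) to a latent, decode (z, c) back to x.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    def __init__(self, x_dim=30, c_dim=128, z_dim=16, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))  # -> (mu, logvar)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))
        self.z_dim = z_dim

    def forward(self, x, c):
        mu, logvar = self.enc(torch.cat([x, c], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
        recon = self.dec(torch.cat([z, c], -1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return F.mse_loss(recon, x) + 1e-3 * kl                 # weighted ELBO

    @torch.no_grad()
    def sample(self, c):
        z = torch.randn(c.size(0), self.z_dim, device=c.device)
        return self.dec(torch.cat([z, c], -1))                  # new demonstration
```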