Jiteng Mu (@jitengmu)'s Twitter Profile
Jiteng Mu

@jitengmu

Ph.D. @UCSD; previously M.S. @JohnsHopkins; Intern @Nvidia @Adobe

ID: 973212600946307073

Link: https://jitengmu.github.io · Joined: 12-03-2018 15:02:20

65 Tweets

694 Followers

608 Following

Minguk_Kang (@minguk_kang)'s Twitter Profile Photo

We're excited to introduce our new 1-step image generator, Diffusion2GAN, at #ECCV2024, which enables ODE-preserving 1k image generation in just 0.16 seconds! Check out our #ECCV2024 paper mingukkang.github.io/Diffusion2GAN/ and stop by poster #181 (Wed Oct 2, 10:30-12:30 CEST) if you're…
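
The "ODE-preserving" one-step generation described above suggests distilling the teacher diffusion model's deterministic ODE mapping into a single-step generator. Below is a minimal, hypothetical sketch of that recipe: regress the one-step generator onto (noise, ODE-output) pairs and add an adversarial term. `teacher_ode_solve`, `G`, and `D` are placeholder names, and plain MSE stands in for the paper's perceptual regression loss.

```python
# Hypothetical sketch of one-step ODE distillation in the spirit of
# Diffusion2GAN. `teacher_ode_solve`, `G`, and `D` are placeholders,
# not the authors' actual code.
import torch
import torch.nn.functional as F

def distill_step(G, D, teacher_ode_solve, opt_g, opt_d, batch_size, device):
    z = torch.randn(batch_size, 4, 64, 64, device=device)  # latent noise (shape illustrative)
    with torch.no_grad():
        x_teacher = teacher_ode_solve(z)   # deterministic ODE target for this z

    # Generator: match the teacher's noise-to-image mapping (ODE-preserving)
    # and fool the discriminator for sharper samples.
    x_fake = G(z)
    loss_g = F.mse_loss(x_fake, x_teacher) - D(x_fake).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Discriminator: hinge loss on teacher outputs vs. one-step samples.
    loss_d = (F.relu(1 - D(x_teacher)).mean() +
              F.relu(1 + D(x_fake.detach())).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```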

Jiteng Mu (@jitengmu)'s Twitter Profile Photo

Precise spatial image editing with diffusion models? We will be presenting #ECCV2024 Editable Image Elements (Thu Oct 3, 16:30-18:30 CEST, poster #262). Please come check out our poster and say hi😃! w/ Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park.

An-Chieh Cheng (@anjjei)'s Twitter Profile Photo

Human videos are scaling up vision-language navigation! 🚀 It has been a rewarding journey to tame the VLM into the VLA model and enable it to learn from real-world human touring videos🚶🏽. Watching robots move through all kinds of places, following our instructions, and doing…

Xiaolong Wang (@xiaolonw)'s Twitter Profile Photo

Autoregressive models are gaining momentum in image generation, but what about image editing? Given an image and a language description, we train one Autoregressive Transformer to do ANY EDITING.
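
A minimal sketch of what "one autoregressive transformer for any edit" could look like: tokenize the source image, append the text tokens, and train with next-token prediction on the target (edited) image's tokens. The shared vocabulary and all module names below are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of autoregressive image editing via next-token
# prediction over [source image tokens | text tokens | edited image tokens].
import torch
import torch.nn as nn

class ARImageEditor(nn.Module):
    def __init__(self, vocab=16384, dim=512, layers=8, heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)  # shared token codebook (assumption)
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, src_tokens, text_tokens, tgt_tokens):
        # Sequence: [source image tokens | text tokens | target image tokens]
        seq = torch.cat([src_tokens, text_tokens, tgt_tokens], dim=1)
        h = self.embed(seq)
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1)).to(h.device)
        h = self.backbone(h, mask=mask, is_causal=True)
        start = src_tokens.size(1) + text_tokens.size(1)
        logits = self.head(h[:, start - 1:-1])  # predict only the edit region
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tgt_tokens.reshape(-1))
```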

Yinbo Chen (@yinbochen)'s Twitter Profile Photo

Introducing “Diffusion Autoencoders are Scalable Image Tokenizers” (DiTo). We show that with proper designs and scaling up, diffusion autoencoders (a single L2 loss) can outperform the GAN-LPIPS tokenizers (hybrid losses) used in current SOTA generative models. (1/4)
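
A hedged sketch of the idea: an encoder compresses the image into latents, and a conditional denoiser is trained with the standard denoising L2 loss alone; there are no GAN or LPIPS terms. The flow-matching-style corruption and the `encoder`/`denoiser` interfaces below are assumptions, and the paper's exact schedule and parameterization may differ.

```python
# Sketch of a diffusion-autoencoder tokenizer trained with a single L2
# (denoising) loss, in the spirit of DiTo. `encoder` and `denoiser` are
# illustrative stand-ins.
import torch
import torch.nn.functional as F

def dito_style_loss(encoder, denoiser, x):
    z = encoder(x)                              # image -> latent "tokens"
    t = torch.rand(x.size(0), device=x.device)  # random diffusion time in [0, 1)
    noise = torch.randn_like(x)
    t_ = t.view(-1, 1, 1, 1)
    x_t = (1 - t_) * x + t_ * noise             # linearly corrupted image
    v_pred = denoiser(x_t, t, z)                # denoiser conditioned on latents
    target = noise - x                          # velocity target
    return F.mse_loss(v_pred, target)           # the single L2 loss
```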

Isabella Liu (@isabella__liu)'s Twitter Profile Photo

🐅 Want to rig your favorite meme character? Try “RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets”! ✨RigAnything is a transformer-based model that sequentially generates skeletons without predefined templates. It creates high-quality skeletons for…
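
One plausible reading of "sequentially generates skeletons": a causal transformer consumes a global shape feature plus the joints generated so far, then predicts the next joint's 3D position and its parent. The sketch below is illustrative only; the heads, dimensions, and names are assumptions, not RigAnything's code.

```python
# Illustrative guess at template-free autoregressive rigging.
import torch
import torch.nn as nn

class ARRigger(nn.Module):
    def __init__(self, dim=256, heads=8, layers=6, max_joints=64):
        super().__init__()
        self.joint_in = nn.Linear(3, dim)              # embed joint positions
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, layers)
        self.pos_head = nn.Linear(dim, 3)              # next joint's xyz
        self.parent_head = nn.Linear(dim, max_joints)  # parent index logits

    def forward(self, joints, shape_feat):
        # joints: [B, N, 3] generated so far; shape_feat: [B, 1, dim]
        h = torch.cat([shape_feat, self.joint_in(joints)], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(h.size(1)).to(h.device)
        h = self.backbone(h, mask=mask, is_causal=True)
        last = h[:, -1]                                # state after the last joint
        return self.pos_head(last), self.parent_head(last)
```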

Yuzhe Qin (@qinyuzhe)'s Twitter Profile Photo

Meet our first general-purpose robot at Dexmate: dexmate.ai/vega. Adjustable height from 0.66m to 2.2m: compact enough for an SUV, tall enough to reach those impossibly high shelves. Powerful dual arms (15 lbs payload each) and omni-directional mobility for ultimate…

Michaël Gharbi (@m_gharbi)'s Twitter Profile Photo

Today's visual generative models are mere stochastic parrots of imagery, much like early language models, which could only statistically mimic short sentences with little reasoning. In contrast, modern large language models (LLMs) can comprehend long documents, keep track of…

Taesung Park (@taesung)'s Twitter Profile Photo

Excited to come out of stealth at Reve! Today's text-to-image/video models, in contrast to LLMs, lack logic. Images seem plausible initially but fall apart under scrutiny: painting techniques don't match, props don't carry meaning, and compositions lack intention. (1/4)

Xiaolong Wang (@xiaolonw)'s Twitter Profile Photo

Test-Time Training (TTT) now works on video! And not just a 5-second video: we can generate a full 1-min video! The TTT module is an RNN module that provides an explicit and efficient memory mechanism. It models the hidden state of an RNN with a machine learning model, which is updated…
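
A minimal sketch of the TTT idea the tweet describes: the recurrent "hidden state" is the weight matrix of a tiny inner model, updated by one gradient step of a self-supervised reconstruction loss per token. Projections, mini-batch updates, and the video-specific design are omitted; this is a simplified illustration, not the authors' implementation.

```python
# TTT-style scan: the hidden state is the weight matrix W of a tiny
# linear inner model, updated by gradient descent at every step.
import torch

def ttt_linear_scan(tokens, lr=0.1):
    B, T, D = tokens.shape
    W = torch.zeros(B, D, D, device=tokens.device)     # hidden state = weights
    outs = []
    for t in range(T):
        x = tokens[:, t]                               # [B, D]
        err = torch.bmm(x.unsqueeze(1), W).squeeze(1) - x  # x @ W should reconstruct x
        # Inner-loop step on 0.5 * ||x @ W - x||^2: grad_W = x^T (x @ W - x)
        W = W - lr * torch.bmm(x.unsqueeze(2), err.unsqueeze(1))
        outs.append(torch.bmm(x.unsqueeze(1), W).squeeze(1))
    return torch.stack(outs, dim=1)                    # [B, T, D]

# e.g. ttt_linear_scan(torch.randn(2, 16, 32)) -> tensor of shape [2, 16, 32]
```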

Jiteng Mu (@jitengmu)'s Twitter Profile Photo

🥳 EditAR code is released! Feel free to check it out. 👉 Presenting EditAR at #CVPR2025! (Friday afternoon, Jun 13, 4:00pm-6:00pm, Hall D #242) Code: github.com/JitengMu/EditAR Project: jitengmu.github.io/EditAR

Jiteng Mu (@jitengmu)'s Twitter Profile Photo

It's challenging, but so rewarding! Thank you, Xiaolong Wang, 🥰 for being a steady source of support and mentorship. I am especially grateful for the freedom you gave me to follow my curiosity. I also feel lucky to have shared this journey with such an inspiring group of labmates!

Jianglong Ye (@jianglong_ye)'s Twitter Profile Photo

How to generate billion-scale manipulation demonstrations easily? Let us leverage generative models! 🤖✨ We introduce Dex1B, a framework that generates 1 BILLION diverse dexterous hand demonstrations for both grasping 🖐️ and articulation 💻 tasks using a simple C-VAE model.
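
For context, a minimal conditional VAE (C-VAE) of the kind the tweet mentions: the condition c might encode the object and scene, x a hand-pose demonstration, and sampling z from the prior yields new demonstrations. All dimensions and layers below are illustrative assumptions, not Dex1B's actual architecture.

```python
# Minimal C-VAE sketch: encode (x, c) to a latent, decode (z, c) back to x.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    def __init__(self, x_dim=30, c_dim=128, z_dim=16, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))  # -> (mu, logvar)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))
        self.z_dim = z_dim

    def forward(self, x, c):
        mu, logvar = self.enc(torch.cat([x, c], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
        recon = self.dec(torch.cat([z, c], -1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return F.mse_loss(recon, x) + 1e-3 * kl                 # weighted ELBO

    @torch.no_grad()
    def sample(self, c):
        z = torch.randn(c.size(0), self.z_dim, device=c.device)
        return self.dec(torch.cat([z, c], -1))                  # new demonstration
```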