Jun-Yan Zhu (@junyanz89)'s Twitter Profile
Jun-Yan Zhu

@junyanz89

Assistant Professor at Generative Intelligence Lab @CMU_Robotics @CarnegieMellon. Understanding and creating pixels.

ID: 852066463208931328

Link: https://www.cs.cmu.edu/~junyanz/ | Joined: 12-04-2017 07:50:50

325 Tweets

10.1K Followers

655 Following

Gaurav Parmar (@gauravtparmar)'s Twitter Profile Photo

[1/4] Ever wondered what it would be like to use images—rather than text—to generate object and background compositions? We introduce VisualComposer, a method for compositional image generation with object-level visual prompts.

Kfir Aberman (@abermankfir)'s Twitter Profile Photo

Text prompts have shaped how we compose images with foundation models. But what if we could simply inject Visual Prompts instead? We introduce 🌟Visual Composer🌟 which achieves high-fidelity compositions of subjects and backgrounds with visual prompts! snap-research.github.io/visual-compose…

Tsai-Shien Chen (@tsaishien_chen)'s Twitter Profile Photo

Introducing ⚗️ Video Alchemist, our new video model supporting:
👪 Multi-subject open-set personalization
🏞️ Foreground & background personalization
🚀 Without the need for inference-time tuning
snap-research.github.io/open-set-video…
[Results] 1. Sora girl rides a dinosaur on a savanna 🧵👇

Muyang Li (@lmxyy1999)'s Twitter Profile Photo

🚀 In my last project, I developed a simple interactive WebUI tool, #VisCompare, to compare images/videos side-by-side across different models and methods, as in the video.
🌟 It's now open-source at github.com/mit-han-lab/Vi…!
🙌 Hope it can benefit the community—feedback and…
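
VisCompare's own code lives in the repo above; purely as an illustration of the side-by-side idea, here is a minimal Gradio sketch. The method names, results folder layout, and file naming below are hypothetical, not VisCompare's API.

```python
# Not VisCompare's actual code: a minimal Gradio sketch of the same idea,
# showing outputs from two methods side by side for the same input name.
# The directory layout results/<method>/<image>.png is an assumption.
import os
import gradio as gr

METHODS = ["method_a", "method_b"]   # hypothetical method names
RESULT_DIR = "results"               # hypothetical output folder

def load_pair(image_name: str, left: str, right: str):
    """Return the same image as rendered by two different methods."""
    left_path = os.path.join(RESULT_DIR, left, image_name)
    right_path = os.path.join(RESULT_DIR, right, image_name)
    return left_path, right_path

with gr.Blocks(title="Side-by-side comparison (sketch)") as demo:
    name = gr.Textbox(label="Image file name", value="example.png")
    with gr.Row():
        left = gr.Dropdown(METHODS, value=METHODS[0], label="Left method")
        right = gr.Dropdown(METHODS, value=METHODS[1], label="Right method")
    with gr.Row():
        img_l = gr.Image(label="Left output")
        img_r = gr.Image(label="Right output")
    gr.Button("Compare").click(load_pair, [name, left, right], [img_l, img_r])

demo.launch()
```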

Joy Hsu (@joycjhsu)'s Twitter Profile Photo

Excited to bring back the 2nd Workshop on Visual Concepts at #CVPR2025, this time with a call for papers! We welcome submissions on the following topics. See our website for more info: sites.google.com/stanford.edu/w… Join us & a fantastic lineup of speakers in Tennessee!

Nupur Kumari (@nupurkmr9)'s Twitter Profile Photo

Can we generate a training dataset of the same object in different contexts for customization? Check out our work SynCD, which uses Objaverse assets and shared attention in text-to-image models to do exactly that. cs.cmu.edu/~syncd-project/ w/ Xi Yin, Jun-Yan Zhu, Ishan Misra, and Samaneh Azadi
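
SynCD's actual pipeline is on the project page above; the snippet below is only a rough PyTorch sketch of the shared-attention idea it builds on: each image in a batch attends to keys/values gathered from every image in the batch, which encourages the depicted object to stay consistent across contexts. Tensor shapes and names are illustrative.

```python
# Illustrative only, not the SynCD implementation: a single "shared attention"
# step in which every image in the batch attends to keys/values pooled from
# the whole batch.
import torch
import torch.nn.functional as F

def shared_attention(q, k, v):
    """
    q, k, v: (batch, heads, tokens, dim) per-image attention projections.
    Keys/values from all images are concatenated along the token axis,
    so each image can "see" the object as it appears in every other image.
    """
    b = k.shape[0]
    k_all = torch.cat(k.unbind(0), dim=1)          # (heads, batch*tokens, dim)
    v_all = torch.cat(v.unbind(0), dim=1)
    k_shared = k_all.unsqueeze(0).expand(b, -1, -1, -1)
    v_shared = v_all.unsqueeze(0).expand(b, -1, -1, -1)
    return F.scaled_dot_product_attention(q, k_shared, v_shared)

q = torch.randn(4, 8, 256, 64)   # 4 images of the same object, 8 heads
k = torch.randn(4, 8, 256, 64)
v = torch.randn(4, 8, 256, 64)
out = shared_attention(q, k, v)  # (4, 8, 256, 64)
```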

Jia-Bin Huang (@jbhuang0604)'s Twitter Profile Photo

Holy Crap!! The journal extension of Expressive Image Generation with Rich Text has been accepted to IJCV! This extension expands the capability of rich text by enabling hyperlinks, texture fill, semantic image editing, and a new benchmark (yay, table with numbers)! Congrats

Nupur Kumari (@nupurkmr9)'s Twitter Profile Photo

Check out our @gradio demo based on Black Forest Labs's FLUX model!! We fine-tune the model using our generated dataset to achieve tuning-free customization on new reference objects. huggingface.co/spaces/nupurkm…
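
For context, the demo builds on the FLUX model family. Below is a minimal sketch of plain text-to-image generation with the base FLUX.1-dev checkpoint via diffusers; it does not load their fine-tuned customization weights, and the sampling settings are just reasonable defaults rather than the demo's configuration.

```python
# A minimal sketch of running the base FLUX.1-dev model with diffusers.
# Swapping in the fine-tuned customization checkpoint from the demo above
# would be an extra, model-specific step that is not shown here.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # fits on smaller GPUs at the cost of speed

image = pipe(
    prompt="a photo of a ceramic mug on a wooden desk",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=28,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_sample.png")
```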

Tinghui Zhou (@tinghuizhou)'s Twitter Profile Photo

We shared some early work towards a multi-modal and multi-task 3D foundation model at Roblox. The first release is a discrete shape tokenizer compatible with autoregressive modeling for text-to-shape. More to come soon. GitHub: github.com/Roblox/cube arXiv: arxiv.org/abs/2503.15475
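
Cube's tokenizer itself is in the repo above. As a toy illustration of what a discrete tokenizer does in general, the sketch below vector-quantizes continuous features into codebook indices that an autoregressive model could predict token by token; it is not Roblox's architecture.

```python
# Toy sketch of the general idea behind a discrete tokenizer (not Roblox Cube's
# architecture): map continuous shape features to nearest-codebook indices,
# producing a token sequence an autoregressive model can learn to predict.
import torch
import torch.nn as nn

class ToyVectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, features: torch.Tensor):
        # features: (num_patches, dim) continuous embeddings of one shape
        dists = torch.cdist(features, self.codebook.weight)  # (patches, codes)
        token_ids = dists.argmin(dim=-1)                      # discrete tokens
        quantized = self.codebook(token_ids)                  # decoder input
        return token_ids, quantized

vq = ToyVectorQuantizer()
feats = torch.randn(256, 64)      # e.g. 256 latent patches of one shape
tokens, recon = vq(feats)
print(tokens.shape)               # torch.Size([256]): an AR-ready sequence
```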

Michaël Gharbi (@m_gharbi)'s Twitter Profile Photo

Today's visual generative models are mere stochastic parrots of imagery, much like early language models, which could only statistically mimic short sentences with little reasoning. In contrast, modern large language models (LLMs) can comprehend long documents, keep track of…

Taesung Park (@taesung)'s Twitter Profile Photo

Excited to come out of stealth at Reve! Today's text-to-image/video models, in contrast to LLMs, lack logic. Images seem plausible initially but fall apart under scrutiny: painting techniques don't match, props don't carry meaning, and compositions lack intention. (1/4)

Nicole Feng (@nicolefeng_)'s Twitter Profile Photo

I've updated my blog post to walk through the remaining technical details of our Surface Winding Numbers algorithm: the calculus behind the algorithm is now explained in a bit more detail. The post, paper, code, etc. are all here: nzfeng.github.io/research/WNoDS…

Jun-Yan Zhu (@junyanz89)'s Twitter Profile Photo

Hi there, Phillip Isola and I wrote a short article (500 words) on Generative Modeling for the Open Encyclopedia of Cognitive Science. We briefly discuss the basic concepts of generative models and their applications. Don't miss out on Phillip Isola's hand-drawn cats in Figure 1!

Muyang Li (@lmxyy1999)'s Twitter Profile Photo

🚀 How to run 12B FLUX.1 on your local laptop with 2-3× speedup? Come check out our #SVDQuant (#ICLR2025 Spotlight) poster session! 🎉
🗓️ When: Friday, Apr 25, 10–12:30 (Singapore time)
📍 Where: Hall 3 + Hall 2B, Poster 169
📌 Poster: tinyurl.com/poster-svdquant
🎮 Demo:
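
As background on what SVDQuant does: it keeps a small low-rank branch of each weight in 16-bit and quantizes only the residual to 4-bit, so outliers do not blow up the quantization range. A toy sketch of that decomposition (per-tensor, no activation smoothing, purely illustrative and not the released kernels):

```python
# Toy illustration of the decomposition behind SVDQuant, not the Nunchaku
# kernels: keep a low-rank branch of the weight in high precision and
# 4-bit-quantize only the residual, which has a smaller dynamic range.
import torch

def lowrank_plus_int4(W: torch.Tensor, rank: int = 16):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]  # 16-bit low-rank branch
    R = W - L                                           # residual to quantize
    scale = R.abs().max() / 7.0                         # symmetric int4 range [-8, 7]
    R_q = torch.clamp((R / scale).round(), -8, 7)
    return L, R_q, scale

W = torch.randn(512, 512)
L, R_q, scale = lowrank_plus_int4(W)
W_hat = L + R_q * scale                                 # dequantized weight
print((W - W_hat).abs().mean())                         # small reconstruction error
```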

Percy Liang (@percyliang)'s Twitter Profile Photo

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team: Tatsunori Hashimoto, Marcel Rød, Neil Band, and Rohith Kuditipudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

Muyang Li (@lmxyy1999)'s Twitter Profile Photo

🚀 #Nunchaku now supports FLUX.1-Kontext-dev!
Edit images with just one sentence — style transfer, face swap, and more — now 2–3× faster and using 1/4 VRAM.
✅ Works with ComfyUI & Diffusers
🔗 Demo: svdquant.mit.edu/kontext/
📂 Code: github.com/mit-han-lab/nu…
🤗 4-bit #SVDQuant
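
The Nunchaku-specific loading code lives in the repo above and is not reproduced here. Purely as a reference point, a sketch of FLUX.1-Kontext image editing through the plain diffusers pipeline (full precision, assuming the FluxKontextPipeline interface in recent diffusers releases) looks like:

```python
# Minimal sketch of one-sentence image editing with FLUX.1-Kontext via plain
# diffusers, in full precision. The Nunchaku/SVDQuant 4-bit path swaps a
# quantized transformer into a pipeline like this; that part is not shown.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

source = load_image("input.png")   # the image you want to edit
edited = pipe(
    image=source,
    prompt="turn the photo into a watercolor painting",
    guidance_scale=2.5,
).images[0]
edited.save("edited.png")
```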
