Jun-Yan Zhu (@junyanz89)'s Twitter Profile
Jun-Yan Zhu

@junyanz89

Assistant Professor at Generative Intelligence Lab @CMU_Robotics @CarnegieMellon. Understanding and creating pixels.

ID: 852066463208931328

Link: https://www.cs.cmu.edu/~junyanz/ | Joined: 12-04-2017 07:50:50

325 Tweets

10.1K Followers

655 Following

Gaurav Parmar (@gauravtparmar)'s Twitter Profile Photo

[1/4] Ever wondered what it would be like to use images—rather than text—to generate object and background compositions? We introduce VisualComposer, a method for compositional image generation with object-level visual prompts.

Kfir Aberman (@abermankfir)'s Twitter Profile Photo

Text prompts have shaped how we compose images with foundation models. But what if we could simply inject Visual Prompts instead? We introduce 🌟Visual Composer🌟 which achieves high-fidelity compositions of subjects and backgrounds with visual prompts! snap-research.github.io/visual-compose…

Tsai-Shien Chen (@tsaishien_chen)'s Twitter Profile Photo

Introducing ⚗️ Video Alchemist, our new video model supporting:
👪 Multi-subject open-set personalization
🏞️ Foreground & background personalization
🚀 Without the need for inference-time tuning
snap-research.github.io/open-set-video…
[Results] 1. Sora girl rides a dinosaur on a savanna 🧵👇

Muyang Li (@lmxyy1999)'s Twitter Profile Photo

🚀 In my last project, I developed a simple interactive WebUI tool, #VisCompare, to compare images/videos side-by-side across different models and methods, as in the video.
🌟 It's now open-source at github.com/mit-han-lab/Vi…!
🙌 Hope it can benefit the community—feedback and…
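
VisCompare's own code lives in the repo above; purely as an illustration of the side-by-side idea, here is a minimal Gradio sketch. The method names, results folder layout, and file naming below are hypothetical, not VisCompare's API.

```python
# Not VisCompare's actual code: a minimal Gradio sketch of the same idea,
# showing outputs from two methods side by side for the same input name.
# The directory layout results/<method>/<image>.png is an assumption.
import os
import gradio as gr

METHODS = ["method_a", "method_b"]   # hypothetical method names
RESULT_DIR = "results"               # hypothetical output folder

def load_pair(image_name: str, left: str, right: str):
    """Return the same image as rendered by two different methods."""
    left_path = os.path.join(RESULT_DIR, left, image_name)
    right_path = os.path.join(RESULT_DIR, right, image_name)
    return left_path, right_path

with gr.Blocks(title="Side-by-side comparison (sketch)") as demo:
    name = gr.Textbox(label="Image file name", value="example.png")
    with gr.Row():
        left = gr.Dropdown(METHODS, value=METHODS[0], label="Left method")
        right = gr.Dropdown(METHODS, value=METHODS[1], label="Right method")
    with gr.Row():
        img_l = gr.Image(label="Left output")
        img_r = gr.Image(label="Right output")
    gr.Button("Compare").click(load_pair, [name, left, right], [img_l, img_r])

demo.launch()
```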

Joy Hsu (@joycjhsu)'s Twitter Profile Photo

Excited to bring back the 2nd Workshop on Visual Concepts at #CVPR2025, this time with a call for papers! We welcome submissions on the following topics. See our website for more info: sites.google.com/stanford.edu/w… Join us & a fantastic lineup of speakers in Tennessee!

Nupur Kumari (@nupurkmr9)'s Twitter Profile Photo

Can we generate a training dataset of the same object in different contexts for customization? Check out our work SynCD, which uses Objaverse assets and shared attention in text-to-image models to do exactly that. cs.cmu.edu/~syncd-project/ w/ Xi Yin, Jun-Yan Zhu, Ishan Misra, and Samaneh Azadi
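
SynCD's actual pipeline is on the project page above; the snippet below is only a rough PyTorch sketch of the shared-attention idea it builds on: each image in a batch attends to keys/values gathered from every image in the batch, which encourages the depicted object to stay consistent across contexts. Tensor shapes and names are illustrative.

```python
# Illustrative only, not the SynCD implementation: a single "shared attention"
# step in which every image in the batch attends to keys/values pooled from
# the whole batch.
import torch
import torch.nn.functional as F

def shared_attention(q, k, v):
    """
    q, k, v: (batch, heads, tokens, dim) per-image attention projections.
    Keys/values from all images are concatenated along the token axis,
    so each image can "see" the object as it appears in every other image.
    """
    b = k.shape[0]
    k_all = torch.cat(k.unbind(0), dim=1)          # (heads, batch*tokens, dim)
    v_all = torch.cat(v.unbind(0), dim=1)
    k_shared = k_all.unsqueeze(0).expand(b, -1, -1, -1)
    v_shared = v_all.unsqueeze(0).expand(b, -1, -1, -1)
    return F.scaled_dot_product_attention(q, k_shared, v_shared)

q = torch.randn(4, 8, 256, 64)   # 4 images of the same object, 8 heads
k = torch.randn(4, 8, 256, 64)
v = torch.randn(4, 8, 256, 64)
out = shared_attention(q, k, v)  # (4, 8, 256, 64)
```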

Jia-Bin Huang (@jbhuang0604)'s Twitter Profile Photo

Holy Crap!! The journal extension of Expressive Image Generation with Rich Text has been accepted to IJCV! This extension expands the capability of rich text by enabling hyperlinks, texture fill, semantic image editing, and a new benchmark (yay, table with numbers)! Congrats

Nupur Kumari (@nupurkmr9)'s Twitter Profile Photo

Check out our @gradio demo based on Black Forest Labs's FLUX model!! We fine-tune the model using our generated dataset to achieve tuning-free customization on new reference objects. huggingface.co/spaces/nupurkm…
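
For context, the demo builds on the FLUX model family. Below is a minimal sketch of plain text-to-image generation with the base FLUX.1-dev checkpoint via diffusers; it does not load their fine-tuned customization weights, and the sampling settings are just reasonable defaults rather than the demo's configuration.

```python
# A minimal sketch of running the base FLUX.1-dev model with diffusers.
# Swapping in the fine-tuned customization checkpoint from the demo above
# would be an extra, model-specific step that is not shown here.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # fits on smaller GPUs at the cost of speed

image = pipe(
    prompt="a photo of a ceramic mug on a wooden desk",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=28,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_sample.png")
```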

Tinghui Zhou (@tinghuizhou)'s Twitter Profile Photo

We shared some early work towards a multi-modal and multi-task 3D foundation model at Roblox. The first release is a discrete shape tokenizer compatible with autoregressive modeling for text-to-shape. More to come soon. GitHub: github.com/Roblox/cube arXiv: arxiv.org/abs/2503.15475
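
Cube's tokenizer itself is in the repo above. As a toy illustration of what a discrete tokenizer does in general, the sketch below vector-quantizes continuous features into codebook indices that an autoregressive model could predict token by token; it is not Roblox's architecture.

```python
# Toy sketch of the general idea behind a discrete tokenizer (not Roblox Cube's
# architecture): map continuous shape features to nearest-codebook indices,
# producing a token sequence an autoregressive model can learn to predict.
import torch
import torch.nn as nn

class ToyVectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, features: torch.Tensor):
        # features: (num_patches, dim) continuous embeddings of one shape
        dists = torch.cdist(features, self.codebook.weight)  # (patches, codes)
        token_ids = dists.argmin(dim=-1)                      # discrete tokens
        quantized = self.codebook(token_ids)                  # decoder input
        return token_ids, quantized

vq = ToyVectorQuantizer()
feats = torch.randn(256, 64)      # e.g. 256 latent patches of one shape
tokens, recon = vq(feats)
print(tokens.shape)               # torch.Size([256]): an AR-ready sequence
```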

Michaël Gharbi (@m_gharbi)'s Twitter Profile Photo

Today's visual generative models are mere stochastic parrots of imagery, much like early language models, which could only statistically mimic short sentences with little reasoning. In contrast, modern large language models (LLMs) can comprehend long documents, keep track of…

Taesung Park (@taesung)'s Twitter Profile Photo

Excited to come out of stealth at Reve! Today's text-to-image/video models, in contrast to LLMs, lack logic. Images seem plausible initially but fall apart under scrutiny: painting techniques don't match, props don't carry meaning, and compositions lack intention. (1/4)

Nicole Feng (@nicolefeng_)'s Twitter Profile Photo

I've updated my blog post to walk through the remaining technical details of our Surface Winding Numbers algorithm: the calculus behind the algorithm is now explained in a bit more detail. The post, paper, code, etc. are all here: nzfeng.github.io/research/WNoDS…

Jun-Yan Zhu (@junyanz89)'s Twitter Profile Photo

Hi there, Phillip Isola and I wrote a short article (500 words) on Generative Modeling for the Open Encyclopedia of Cognitive Science. We briefly discuss the basic concepts of generative models and their applications. Don't miss out on Phillip Isola's hand-drawn cats in Figure 1!

Muyang Li (@lmxyy1999)'s Twitter Profile Photo

🚀 How to run 12B FLUX.1 on your local laptop with 2-3× speedup? Come check out our #SVDQuant (#ICLR2025 Spotlight) poster session! 🎉
🗓️ When: Friday, Apr 25, 10–12:30 (Singapore time)
📍 Where: Hall 3 + Hall 2B, Poster 169
📌 Poster: tinyurl.com/poster-svdquant
🎮 Demo:
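
As background on what SVDQuant does: it keeps a small low-rank branch of each weight in 16-bit and quantizes only the residual to 4-bit, so outliers do not blow up the quantization range. A toy sketch of that decomposition (per-tensor, no activation smoothing, purely illustrative and not the released kernels):

```python
# Toy illustration of the decomposition behind SVDQuant, not the Nunchaku
# kernels: keep a low-rank branch of the weight in high precision and
# 4-bit-quantize only the residual, which has a smaller dynamic range.
import torch

def lowrank_plus_int4(W: torch.Tensor, rank: int = 16):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]  # 16-bit low-rank branch
    R = W - L                                           # residual to quantize
    scale = R.abs().max() / 7.0                         # symmetric int4 range [-8, 7]
    R_q = torch.clamp((R / scale).round(), -8, 7)
    return L, R_q, scale

W = torch.randn(512, 512)
L, R_q, scale = lowrank_plus_int4(W)
W_hat = L + R_q * scale                                 # dequantized weight
print((W - W_hat).abs().mean())                         # small reconstruction error
```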

Percy Liang (@percyliang)'s Twitter Profile Photo

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team: Tatsunori Hashimoto, Marcel Rød, Neil Band, and Rohith Kuditipudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

Muyang Li (@lmxyy1999)'s Twitter Profile Photo

🚀 #Nunchaku now supports FLUX.1-Kontext-dev!
Edit images with just one sentence — style transfer, face swap, and more — now 2–3× faster and using 1/4 VRAM.
✅ Works with ComfyUI & Diffusers
🔗 Demo: svdquant.mit.edu/kontext/
📂 Code: github.com/mit-han-lab/nu…
🤗 4-bit #SVDQuant
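
The Nunchaku-specific loading code lives in the repo above and is not reproduced here. Purely as a reference point, a sketch of FLUX.1-Kontext image editing through the plain diffusers pipeline (full precision, assuming the FluxKontextPipeline interface in recent diffusers releases) looks like:

```python
# Minimal sketch of one-sentence image editing with FLUX.1-Kontext via plain
# diffusers, in full precision. The Nunchaku/SVDQuant 4-bit path swaps a
# quantized transformer into a pipeline like this; that part is not shown.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

source = load_image("input.png")   # the image you want to edit
edited = pipe(
    image=source,
    prompt="turn the photo into a watercolor painting",
    guidance_scale=2.5,
).images[0]
edited.save("edited.png")
```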
