Stephanie Fu (@xkungfu) 's Twitter Profile
Stephanie Fu

@xkungfu

PhD student @berkeley_ai | studying computer vision and intelligence | Previous: CS + Music @MIT

ID: 4870958294

Link: https://stephanie-fu.github.io/

Joined: 07-02-2016 00:06:00

330 Tweets

576 Followers

228 Following

Yael Vinker 🎗 (@yvinker) 's Twitter Profile Photo

Excited to introduce SketchAgent! 👩‍🎨 We leverage the prior of pretrained multimodal LLMs for language-driven, sequential sketch generation and human-agent collaborative sketching! ✨ Try our fun interface here: github.com/yael-vinker/Sk…

Amir Bar (@_amirbar) 's Twitter Profile Photo

Happy to share our new work on Navigation World Models! 🔥🔥 Navigation is a fundamental skill of agents with visual-motor capabilities. We train a single World Model across multiple environments and diverse agent data. w/ Gaoyue Zhou, Danny Tran, trevordarrell and Yann LeCun.

tyler bonnen (@tylerraye) 's Twitter Profile Photo

i'm in vancouver for #NeurIPS2024 presenting our 3D shape inference benchmark tomorrow! stop by poster #1210 at 4:30 on friday if you're interested and if you'd like to talk about neuro-ai, human cognition, or suggest nearby hikes, feel free to reach out!

Daniel Geng (@dangengdg) 's Twitter Profile Photo

I'll be presenting "Images that Sound" today at #NeurIPS2024! East Exhibit Hall A-C #2710. Come say hi to me and Andrew Owens :) (Ziyang Chen sadly could not make it, but will be there in spirit :') )

Jiao Sun (@sunjiao123sun_) 's Twitter Profile Photo

Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference, NeurIPS Conference. We have ethical reviews for authors, but missed it for invited speakers? 😡

Stephanie Fu (@xkungfu) 's Twitter Profile Photo

unfortunately kansas has snapped out of its identity crisis as a subtropical region and has stopped selling pomegranates en masse this year. RIP to my post-NeurIPS pomegranate binges (2020-2023) gone but not forgotten

Shobhita Sundaram (@shobsund) 's Twitter Profile Photo

Personal vision tasks, like detecting *your mug*, are hard; they’re data scarce and fine-grained. In our new paper, we show you can adapt general-purpose vision models to these tasks from just three photos! 📝: arxiv.org/abs/2412.16156 💻: github.com/ssundaram21/pe… (1/n)

Akarsh Kumar (@akarshkumar0101) 's Twitter Profile Photo

Very excited to share ASAL! Artificial Life aims to recreate natural evolution, but is severely bottlenecked by hand-designed simulations. We propose using CLIP to automatically discover the interesting ALife simulations!

Siqi Chen (@blader) 's Twitter Profile Photo

if you could press a button that cures your child’s brain tumor in exchange for ending your life immediately, every parent would hesitate for zero seconds before fighting to be the first to press it. the cruelest thing is that no such button exists. but there is always a move 👇

Ritwik Gupta 🇺🇦 (@ritwik_g) 's Twitter Profile Photo

Do LLMs understand probability distributions? Can they serve as effective simulators of probability? No! However, in our latest paper we show that, via in-context learning, LLMs update their broken priors in a manner akin to Bayesian updating. 📝 arxiv.org/abs/2503.04722

James Burgess (at ICLR 2025) (@jmhb0) 's Twitter Profile Photo

🚨 Large video-language models like LLaVA-Video can do single-video tasks. But can they compare videos? Imagine you’re learning a sports skill like kicking: can an AI tell how your kick differs from an expert video? 🚀 Introducing "Video Action Differencing" (VidDiff), ICLR 2025 🧵

Baifeng (@baifeng_shi) 's Twitter Profile Photo

Next-gen vision pre-trained models shouldn’t be short-sighted. Humans can easily perceive 10K x 10K resolution. But today’s top vision models, like SigLIP and DINOv2, are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage. Today, we

Tsung-Han (Patrick) Wu @ ICLR’25 (@tsunghan_wu) 's Twitter Profile Photo

Thrilled to share the first-ever search leaderboard with lmarena.ai! It's so fun to see how models behave differently: OpenAI loves news (but not YT), Perplexity favors YouTube, and Gemini (Google DeepMind) leans on blogs/forums. More insights: blog.lmarena.ai/blog/2025/sear…

Ritwik Gupta 🇺🇦 (@ritwik_g) 's Twitter Profile Photo

Ever wondered if the way we feed image patches to vision models is the best way? The standard row-by-row scan isn't always optimal! Modern long-sequence transformers can be surprisingly sensitive to patch order. We developed REOrder to find better, task-specific patch sequences.

David Chan (@_dmchan) 's Twitter Profile Photo

🚀 Call for Papers! 🚀 Excited to help organize the 4th Workshop on What is Next in Multimodal Foundation Models? at ICCV in Honolulu, Hawai'i 🌺 Submit work on vision, language, audio & more! 🗓️ Deadline: July 1, 2025 🔗 sites.google.com/view/mmfm4thwo… #MMFM4 #ICCV2025 #AI #multimodal

Amil Dravid (@_amildravid) 's Twitter Profile Photo

Artifacts in your attention maps? Forgot to train with registers? Use 𝙩𝙚𝙨𝙩-𝙩𝙞𝙢𝙚 𝙧𝙚𝙜𝙞𝙨𝙩𝙚𝙧𝙨! We find that a sparse set of activations sets artifact positions. We can shift them anywhere ("Shifted"), even outside the image into an untrained token. Clean maps, no retrain.

Yutong Bai (@yutongbai1002) 's Twitter Profile Photo

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space, not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to

Konpat Ta Preechakul (@phizaz) 's Twitter Profile Photo

Some problems can’t be rushed: they can only be done step by step, no matter how many people or processors you throw at them. We’ve scaled AI by making everything bigger and more parallel: Our models are parallel. Our scaling is parallel. Our GPUs are parallel. But what if the