Stephanie Fu (@xkungfu) Twitter Tweets • TwiCopy

Yael Vinker🎗

a year ago

Excited to introduce SketchAgent!👩‍🎨 We leverage the prior of pretrained multimodal LLMs for language-driven, sequential sketch generation and human-agent collaborative sketching! ✨ Try our fun interface here: github.com/yael-vinker/Sk…

Likes: 222 | Replies: 4 | Reposts: 47

Amir Bar

@_amirbar

a year ago

Happy to share our new work on Navigation World Models! 🔥🔥 Navigation is a fundamental skill of agents with visual-motor capabilities. We train a single World Model across multiple environments and diverse agent data. w/ Gaoyue Zhou, Danny Tran, trevordarrell and Yann LeCun.

Likes: 275 | Replies: 5 | Reposts: 60

Stephanie Fu

@xkungfu

a year ago

Come by our poster #1302 Thursday morning at #NeurIPS2024 to chat!

Likes: 16 | Replies: 0 | Reposts: 1

tyler bonnen

@tylerraye

a year ago

i'm in vancouver for #NeurIPS2024 presenting our 3D shape inference benchmark tomorrow! stop by poster #1210 at 4:30 on friday if you're interested and if you'd like to talk about neuro-ai, human cognition, or suggest nearby hikes, feel free to reach out!

Likes: 64 | Replies: 0 | Reposts: 6

Daniel Geng

@dangengdg

a year ago

I'll be presenting "Images that Sound" today at #NeurIPS2024! East Exhibit Hall A-C #2710. Come say hi to me and Andrew Owens :) (Ziyang Chen sadly could not make it, but will be there in spirit :') )

Likes: 55 | Replies: 0 | Reposts: 8

Jiao Sun

@sunjiao123sun_

a year ago

Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference, NeurIPS Conference. We have ethical reviews for authors, but missed it for invited speakers? 😡

Likes: 3,3K | Replies: 184 | Reposts: 837

Stephanie Fu

@xkungfu

a year ago

unfortunately kansas has snapped out of its identity crisis as a subtropical region and has stopped selling pomegranates en masse this year. RIP to my post-NeurIPS pomegranate binges (2020-2023) gone but not forgotten

Likes: 17 | Replies: 0 | Reposts: 0

Shobhita Sundaram

@shobsund

10 months ago

Personal vision tasks, like detecting *your mug*, are hard; they’re data scarce and fine-grained. In our new paper, we show you can adapt general-purpose vision models to these tasks from just three photos! 📝: arxiv.org/abs/2412.16156 💻: github.com/ssundaram21/pe… (1/n)

Likes: 322 | Replies: 5 | Reposts: 62

Akarsh Kumar

@akarshkumar0101

10 months ago

Very excited to share ASAL! Artificial Life aims to recreate natural evolution, but is severely bottlenecked by hand-designed simulations. We propose using CLIP to automatically discover the interesting ALife simulations!

Likes: 235 | Replies: 7 | Reposts: 28

Siqi Chen

@blader

10 months ago

if you could press a button that cures your child’s brain tumor in exchange for ending your life immediately, every parent would hesitate for zero seconds before fighting to be the first to press it. the cruelest thing is that no such button exists. but there is always a move 👇

Likes: 20,20K | Replies: 585 | Reposts: 2,2K

Ritwik Gupta 🇺🇦

@ritwik_g

8 months ago

Do LLMs understand probability distributions? Can they serve as effective simulators of probability? No! However, in our latest paper we show that, via in-context learning, LLMs update their broken priors in a manner akin to Bayesian updating. 📝 arxiv.org/abs/2503.04722

Likes: 163 | Replies: 5 | Reposts: 33

James Burgess (at ICLR 2025)

@jmhb0

8 months ago

🚨 Large video-language models like LLaVA-Video can do single-video tasks. But can they compare videos? Imagine you’re learning a sports skill like kicking: can an AI tell how your kick differs from an expert video? 🚀 Introducing "Video Action Differencing" (VidDiff), ICLR 2025 🧵

Likes: 57 | Replies: 7 | Reposts: 51

Baifeng

@baifeng_shi

7 months ago

Next-gen vision pre-trained models shouldn’t be short-sighted. Humans can easily perceive 10K x 10K resolution. But today’s top vision models—like SigLIP and DINOv2—are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage. Today, we

Likes: 971 | Replies: 27 | Reposts: 151

Tsung-Han (Patrick) Wu @ ICLR’25

@tsunghan_wu

7 months ago

Thrilled to share the first-ever search leaderboard with lmarena.ai! It's so fun to see how models behave differently — OpenAI loves news (but not YT), Perplexity favors YouTube, and Gemini (Google DeepMind) leans on blogs/forums. More insights: blog.lmarena.ai/blog/2025/sear…

Likes: 14 | Replies: 0 | Reposts: 3

Ritwik Gupta 🇺🇦

@ritwik_g

5 months ago

Ever wondered if the way we feed image patches to vision models is the best way? The standard row-by-row scan isn't always optimal! Modern long-sequence transformers can be surprisingly sensitive to patch order. We developed REOrder to find better, task-specific patch sequences.

Likes: 59 | Replies: 2 | Reposts: 11

Netanel Yakir Tamir

@netanel_tamir

5 months ago

What Makes for a Good Stereoscopic Image? Stereoscopic images, or stereo images, consist of two slightly horizontally shifted views, which when viewed together create the perception of a 3D scene. 1/n

Likes: 9 | Replies: 1 | Reposts: 3

David Chan

@_dmchan

5 months ago

🚀 Call for Papers! 🚀 Excited to help organize the 4th Workshop on What is Next in Multimodal Foundation Models? at ICCV in Honolulu, Hawai'i 🌺 Submit work on vision, language, audio & more! 🗓️ Deadline: July 1, 2025 🔗 sites.google.com/view/mmfm4thwo… #MMFM4 #ICCV2025 #AI #multimodal

Likes: 28 | Replies: 1 | Reposts: 7

Amil Dravid

@_amildravid

5 months ago

Artifacts in your attention maps? Forgot to train with registers? Use 𝙩𝙚𝙨𝙩-𝙩𝙞𝙢𝙚 𝙧𝙚𝙜𝙞𝙨𝙩𝙚𝙧𝙨! We find that a sparse set of activations sets the artifact positions. We can shift them anywhere ("Shifted"), even outside the image into an untrained token. Clean maps, no retrain.

Likes: 325 | Replies: 4 | Reposts: 62

Yutong Bai

@yutongbai1002

4 months ago

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to

Likes: 283 | Replies: 17 | Reposts: 74

Konpat Ta Preechakul

@phizaz

4 months ago

Some problems can’t be rushed—they can only be done step by step, no matter how many people or processors you throw at them. We’ve scaled AI by making everything bigger and more parallel: Our models are parallel. Our scaling is parallel. Our GPUs are parallel. But what if the

Likes: 359 | Replies: 17 | Reposts: 63