Sanjay Subramanian (@sanjayssub) 's Twitter Profile
Sanjay Subramanian

@sanjayssub

Building/analyzing NLP and vision models. PhD student @berkeley_ai. Formerly: @allen_ai, @penn

ID: 1176913670057545729

Website: https://people.eecs.berkeley.edu/~sanjayss/ · Joined: 25-09-2019 17:37:49

254 Tweets

889 Followers

560 Following

Lea Müller (@leamue27) 's Twitter Profile Photo

- Humans and Structure from Motion - We jointly reconstruct 3D humans, scene point cloud, and cameras from images captured with sparse uncalibrated cameras. ✨Enjoy reading & happy holidays✨ Project page: muelea.github.io/hsfm

Jiaxin Ge (@aomaru_21490) 's Twitter Profile Photo

Introducing "AutoPresent: Designing Structured Visuals From Scratch". We employ code generation to create structured, high-quality presentation slides from scratch! 📄 arxiv.org/abs/2501.00912 🤗 huggingface.co/spaces/JiaxinG… 🔗 github.com/para-lost/Auto… Berkeley AI Research Language Technologies Institute | @CarnegieMellon

Eve Fleisig (@enfleisig) 's Twitter Profile Photo

How does model calibration stand up against humans? We ran live competitions, comparing model and human calibration, to create GRACE: a new fine-grained calibration benchmark grounded in human performance. What we found was unexpected! 🧵 📄arxiv.org/pdf/2502.19684
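
The tweet doesn't spell out GRACE's scoring, so as background here is the standard expected calibration error (ECE), the usual way to quantify how well stated confidence tracks actual accuracy. This is a reference point, not the benchmark's own metric.

```python
# Standard expected calibration error: bin predictions by stated confidence,
# then compare average confidence to empirical accuracy within each bin.
import numpy as np

def expected_calibration_error(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    """confidences in [0, 1]; correct is 0/1 per prediction."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(confidences[mask].mean() - correct[mask].mean())
        ece += (mask.sum() / n) * gap
    return float(ece)

# Example: an overconfident model on five answers.
conf = np.array([0.9, 0.8, 0.95, 0.6, 0.7])
hit = np.array([1, 0, 1, 1, 0])
print(expected_calibration_error(conf, hit))
```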

Zineng Tang (@zinengtang) 's Twitter Profile Photo

We are thrilled to announce TULIP! 🌷 tulip-berkeley.github.io A state-of-the-art vision-language encoder coupled with a generative model for stronger representation learning.

Baifeng (@baifeng_shi) 's Twitter Profile Photo

Next-gen vision pre-trained models shouldn’t be short-sighted. Humans can easily perceive 10K x 10K resolution. But today’s top vision models—like SigLIP and DINOv2—are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage. Today, we

Jiayi Pan (@jiayi_pirate) 's Twitter Profile Photo

We explore a new dimension in scaling reasoning models with Adaptive Parallel Reasoning (APR): LMs learn to orchestrate both serial & parallel compute end-to-end via supervised training + RL, with better efficiency and scalability than long CoT on Countdown. 🧵 arxiv.org/abs/2504.15466

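As a rough illustration of the serial-vs-parallel orchestration being described (not the APR training recipe itself), here is a hypothetical controller in which the model either answers directly or emits sub-queries that are rolled out concurrently. `call_llm` and the JSON protocol are placeholder assumptions.

```python
# Hypothetical orchestration sketch: the LM may answer directly or spawn
# parallel sub-rollouts whose results are folded back into the context.
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def orchestrated_rollout(call_llm: Callable[[str], str], problem: str, depth: int = 2) -> str:
    """`call_llm` is a placeholder LM call returning JSON:
    either {"answer": ...} or {"spawn": [subquery, ...]}."""
    reply = json.loads(call_llm(problem))
    if depth == 0 or "answer" in reply:
        return reply.get("answer", "")
    # Parallel branch: run the spawned sub-queries concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda q: orchestrated_rollout(call_llm, q, depth - 1),
            reply["spawn"],
        ))
    # Serial continuation: merge parallel results and ask for a final answer.
    merged = problem + "\nSub-results:\n" + "\n".join(results)
    return json.loads(call_llm(merged)).get("answer", "")
```
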
Nicholas Tomlin (@nickatomlin) 's Twitter Profile Photo

The long-term goal of AI is to build models that can handle arbitrary tasks, not just ones they’ve been trained on. We hope our new *benchmark generator* can help measure progress toward this vision

Ruiqi Zhong (@zhongruiqi) 's Twitter Profile Photo

Last day of PhD! I pioneered using LLMs to explain datasets & models. It's used by the interpretability team at OpenAI and the societal impacts team at Anthropic. Tutorial here. It's a great direction & someone should carry the torch :) Thesis available, if you wanna read my acknowledgement section =P

Ritwik Gupta 🇺🇦 (@ritwik_g) 's Twitter Profile Photo

Ever wondered if the way we feed image patches to vision models is the best way? The standard row-by-row scan isn't always optimal! Modern long-sequence transformers can be surprisingly sensitive to patch order. We developed REOrder to find better, task-specific patch sequences.
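
For context on the knob being tuned here, a minimal sketch of patch serialization: extract patches in the default raster (row-by-row) order, then apply an alternative permutation before the tokens enter a sequence model. This only illustrates that the ordering is a free choice; it is not the REOrder search procedure.

```python
# Patchify an image in raster order, then reorder the patch sequence with an
# arbitrary permutation (here column-major) before feeding a sequence model.
import torch

def patchify(img: torch.Tensor, p: int) -> torch.Tensor:
    """img: (C, H, W) with H, W divisible by p -> (num_patches, C*p*p) in raster order."""
    C, H, W = img.shape
    patches = img.unfold(1, p, p).unfold(2, p, p)   # (C, H//p, W//p, p, p)
    patches = patches.permute(1, 2, 0, 3, 4)        # (H//p, W//p, C, p, p)
    return patches.reshape(-1, C * p * p)

def reorder(patches: torch.Tensor, perm: torch.Tensor) -> torch.Tensor:
    """Apply a task-specific patch permutation (column-major, a space-filling
    curve, a learned order, ...) instead of the raster default."""
    return patches[perm]

img = torch.randn(3, 224, 224)
tokens = patchify(img, 16)                          # (196, 768), row-by-row
grid = 224 // 16
col_major = torch.arange(grid * grid).reshape(grid, grid).t().reshape(-1)
tokens_cm = reorder(tokens, col_major)              # column-by-column scan
```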

Yutong Bai (@yutongbai1002) 's Twitter Profile Photo

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to

Jessy Lin (@realjessylin) 's Twitter Profile Photo

User simulators bridge RL with real-world interaction // jessylin.com/2025/07/10/use… How do we get the RL paradigm to work on tasks beyond math & code? Instead of designing datasets, RL requires designing environments. Given that most non-trivial real-world tasks involve

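A dependency-free sketch of the general pattern the post describes: wrap a simulated user (a placeholder LLM callable here) in an RL-style environment so an agent policy can be trained on multi-turn interaction. The interface and reward placement are assumptions for illustration, not the post's code.

```python
# Hypothetical RL-style environment around a simulated user.
from typing import Callable, List, Tuple

class SimulatedUserEnv:
    def __init__(self,
                 user_llm: Callable[[List[str]], str],   # next user message given dialogue so far
                 judge: Callable[[List[str]], float],    # scores the finished dialogue
                 max_turns: int = 8):
        self.user_llm = user_llm
        self.judge = judge
        self.max_turns = max_turns

    def reset(self) -> str:
        self.dialogue: List[str] = []
        first_msg = self.user_llm(self.dialogue)
        self.dialogue.append("user: " + first_msg)
        return first_msg

    def step(self, agent_reply: str) -> Tuple[str, float, bool]:
        """Agent acts; the simulated user responds; reward arrives at episode end."""
        self.dialogue.append("agent: " + agent_reply)
        if len(self.dialogue) >= 2 * self.max_turns:
            return "", self.judge(self.dialogue), True
        user_msg = self.user_llm(self.dialogue)
        self.dialogue.append("user: " + user_msg)
        return user_msg, 0.0, False
```
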
Baifeng (@baifeng_shi) 's Twitter Profile Photo

Understanding a video involves both short-range and long-range understanding. Short-range understanding is more about "motion" and requires system-1 perception. Long-range understanding is more system-2, and requires memory, reasoning, etc. Both have huge room for improvement.

Ruilong Li (@ruilong_li) 's Twitter Profile Photo

For everyone interested in precise 📷camera control📷 in transformers [e.g., video / world models]: stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: liruilong.cn/prope/

Lakshya A Agrawal (@lakshyaaagrawal) 's Twitter Profile Photo

How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵

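To make "reflective prompt optimization" concrete, here is a hypothetical minimal loop: run a few rollouts, ask the LLM to reflect on what worked and what didn't, and let it rewrite the prompt. `call_llm` and `score` are placeholders you would supply; this is not the GEPA algorithm, just the shape of the idea.

```python
# Hypothetical reflective prompt-search loop (not the GEPA implementation).
from typing import Callable, List, Tuple

def reflective_prompt_search(
    call_llm: Callable[[str], str],        # prompt -> model output
    score: Callable[[str, dict], float],   # (output, task example) -> reward in [0, 1]
    tasks: List[dict],                     # small batch of task examples with an "input" field
    seed_prompt: str,
    rounds: int = 5,
) -> Tuple[str, float]:
    best_prompt, best_avg = seed_prompt, -1.0
    prompt = seed_prompt
    for _ in range(rounds):
        # 1) A handful of rollouts (far fewer than RL-style training would need).
        transcripts = []
        for task in tasks:
            output = call_llm(prompt + "\n\n" + task["input"])
            transcripts.append((task["input"], output, score(output, task)))
        avg = sum(r for _, _, r in transcripts) / len(transcripts)
        if avg > best_avg:
            best_prompt, best_avg = prompt, avg
        # 2) Reflection: ask the LLM what worked and what didn't, then rewrite the prompt.
        reflection_request = (
            "You are improving an instruction prompt.\n"
            f"Current prompt:\n{prompt}\n\n"
            "Rollouts as (input, output, reward):\n"
            + "\n".join(f"- {i!r} -> {o!r} (reward={r:.2f})" for i, o, r in transcripts)
            + "\n\nExplain what went wrong, then output an improved prompt only."
        )
        prompt = call_llm(reflection_request)
    return best_prompt, best_avg
```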