Sanjay Subramanian (@sanjayssub)'s Twitter Profile
Sanjay Subramanian

@sanjayssub

Building/analyzing NLP and vision models. PhD student @berkeley_ai. Formerly: @allen_ai, @penn

ID: 1176913670057545729

Link: https://people.eecs.berkeley.edu/~sanjayss/ · Joined: 25-09-2019 17:37:49

254 Tweets

889 Followers

560 Following

Lea Müller (@leamue27)'s Twitter Profile Photo

- Humans and Structure from Motion -

We jointly reconstruct 3D humans, scene point cloud, and cameras from images captured with sparse uncalibrated cameras.

✨Enjoy reading & happy holidays✨

Project page: muelea.github.io/hsfm
Jiaxin Ge (@aomaru_21490)'s Twitter Profile Photo

Introducing "AutoPresent: Designing Structured Visuals From Scratch". We employ code generation to create structured, high-quality presentation slides from scratch! 📄 arxiv.org/abs/2501.00912 🤗 huggingface.co/spaces/JiaxinG… 🔗 github.com/para-lost/Auto… Berkeley AI Research Language Technologies Institute | @CarnegieMellon

Eve Fleisig (@enfleisig)'s Twitter Profile Photo

How does model calibration stand up against humans?

We ran live competitions, comparing model and human calibration, to create GRACE: a new fine-grained calibration benchmark grounded in human performance. What we found was unexpected! 🧵

📄 arxiv.org/pdf/2502.19684
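
GRACE's own metrics are finer-grained and grounded in human competition data (see the paper), but the textbook quantity behind "calibration" is worth keeping in mind; a quick sketch of expected calibration error (ECE):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Textbook ECE: bin predictions by confidence, then compare each bin's
    mean confidence with its empirical accuracy (GRACE's own metrics are
    finer-grained; this is just the standard quantity)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# A model that answers with 0.9 confidence but is right half the time
# contributes a large gap in the top bin.
print(expected_calibration_error([0.9, 0.9, 0.6, 0.3], [1, 0, 1, 0]))
```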

Zineng Tang (@zinengtang)'s Twitter Profile Photo

We are thrilled to announce TULIP!

🌷 tulip-berkeley.github.io

A state-of-the-art vision-language encoder coupled with a generative model for stronger representation learning.
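
TULIP's actual objectives are on the project page; as a generic sketch of what coupling a contrastive vision-language encoder with a generative objective can look like (a standard CLIP-style loss plus a reconstruction term, both assumptions here):

```python
# Generic sketch, NOT TULIP's actual losses: a CLIP-style contrastive term
# plus a generative reconstruction term on the same representation.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Matched image-text pairs attract; mismatched pairs repel."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(len(logits))
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

def joint_loss(img_emb, txt_emb, recon, target, alpha=0.5):
    """Contrastive representation learning + generative reconstruction."""
    return contrastive_loss(img_emb, txt_emb) + alpha * F.mse_loss(recon, target)

img_emb, txt_emb = torch.randn(8, 512), torch.randn(8, 512)
recon, target = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
print(joint_loss(img_emb, txt_emb, recon, target))
```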
Baifeng (@baifeng_shi)'s Twitter Profile Photo

Next-gen vision pre-trained models shouldn’t be short-sighted.

Humans can easily perceive 10K x 10K resolution. But today’s top vision models—like SigLIP and DINOv2—are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage.

Today, we…
Kush Hari (@kushtimusprime)'s Twitter Profile Photo

NeRFs and Gaussian Splats excel at static 3D modeling, but robots work in dynamic, unpredictable environments.

POGS (Persistent Object Gaussian Splats) combines semantic, visual, and grouping features that can be queried with language and spatially updated as environments change.

Jiayi Pan (@jiayi_pirate)'s Twitter Profile Photo

We explore a new dimension in scaling reasoning models: Adaptive Parallel Reasoning.

APR lets LMs learn to orchestrate both serial & parallel compute E2E via supervised training + RL — w/ better efficiency and scalability than long CoT on Countdown

🧵 arxiv.org/abs/2504.15466
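
What "orchestrating serial and parallel compute" can look like at inference time, as a hedged sketch (hypothetical SPAWN convention and stub model; the actual training recipe and interface are in the paper): the serial trace may emit a spawn op listing subqueries, which are rolled out in parallel and merged back.

```python
# Minimal sketch of APR-style inference orchestration (hypothetical
# interface; see the paper for the real one). The serial trace may emit
# "SPAWN: q1 | q2"; subqueries are rolled out in parallel threads and
# their answers merged back into the parent's context.
from concurrent.futures import ThreadPoolExecutor

def generate(prompt: str) -> str:
    # Stub standing in for the language model; spawns once at the root.
    if prompt == "root task":
        return "SPAWN: subgoal A | subgoal B"
    return f"answer({prompt})"

def solve(prompt: str, depth: int = 0, max_depth: int = 2) -> str:
    trace = generate(prompt)
    if depth >= max_depth or not trace.startswith("SPAWN:"):
        return trace  # purely serial path
    subqueries = [q.strip() for q in trace[len("SPAWN:"):].split("|")]
    # Child rollouts run in parallel: latency ~ one rollout, not len(subqueries)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda q: solve(q, depth + 1), subqueries))
    # Merge the children's answers back into the parent's serial context
    return generate(prompt + " given " + "; ".join(results))

print(solve("root task"))
```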
Nicholas Tomlin (@nickatomlin)'s Twitter Profile Photo

The long-term goal of AI is to build models that can handle arbitrary tasks, not just ones they've been trained on. We hope our new *benchmark generator* can help measure progress toward this vision.
Ruiqi Zhong (@zhongruiqi)'s Twitter Profile Photo

Last day of PhD!

I pioneered using LLMs to explain datasets & models. It's used by the interpretability team at OpenAI and the societal impacts team at Anthropic.

Tutorial here. It's a great direction & someone should carry the torch :)

Thesis available, if you wanna read my acknowledgement section =P
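
For readers who haven't seen this line of work: the recipe, roughly, and with a hypothetical llm() callable rather than any real API, is to have an LLM propose natural-language hypotheses about how two corpora differ, then validate each hypothesis sample-by-sample.

```python
# Rough sketch of the explain-datasets recipe (hypothetical llm() callable,
# not any specific API): propose hypotheses about how corpus A differs
# from corpus B, then validate each one per sample.
def propose_hypotheses(llm, samples_a, samples_b, k=5):
    prompt = ("Corpus A:\n" + "\n".join(samples_a)
              + "\n\nCorpus B:\n" + "\n".join(samples_b)
              + f"\n\nList {k} ways A differs from B, one per line.")
    return [h for h in llm(prompt).splitlines() if h.strip()]

def validity(llm, hypothesis, corpus_a, corpus_b):
    """Score = P(hypothesis holds | A) - P(holds | B); near 1 means the
    hypothesis cleanly separates the two corpora."""
    def rate(corpus):
        votes = [llm(f'Does this text "{hypothesis}"? yes/no:\n{t}')
                 for t in corpus]
        return sum(v.strip().lower().startswith("yes") for v in votes) / len(votes)
    return rate(corpus_a) - rate(corpus_b)
```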
Ritwik Gupta 🇺🇦 (@ritwik_g)'s Twitter Profile Photo

Ever wondered if the way we feed image patches to vision models is the best way? The standard row-by-row scan isn't always optimal!

Modern long-sequence transformers can be surprisingly sensitive to patch order. We developed REOrder to find better, task-specific patch sequences.
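
The mechanism being searched over is just a permutation of the patch sequence; a small PyTorch sketch (REOrder's actual search procedure is in the paper, and the column-major scan below is only an example alternative ordering):

```python
# Sketch of the mechanism REOrder operates on: ViT-style patchify, then
# reorder the patch sequence before the transformer sees it.
import torch

def patchify(img: torch.Tensor, p: int = 16) -> torch.Tensor:
    """(C, H, W) -> (num_patches, C*p*p), in raster (row-by-row) order."""
    c, h, w = img.shape
    patches = img.unfold(1, p, p).unfold(2, p, p)   # (C, H/p, W/p, p, p)
    patches = patches.permute(1, 2, 0, 3, 4)        # (H/p, W/p, C, p, p)
    return patches.reshape(-1, c * p * p)           # (N, C*p*p)

img = torch.randn(3, 224, 224)
tokens = patchify(img)  # raster order: 196 tokens of dim 768

# Any permutation yields the same set of tokens in a different sequence;
# e.g. a column-major scan instead of the default row-major one.
n = int(tokens.shape[0] ** 0.5)
col_major = torch.arange(n * n).reshape(n, n).t().reshape(-1)
reordered = tokens[col_major]
print(tokens.shape, reordered.shape)  # torch.Size([196, 768]) twice
```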

Yutong Bai (@yutongbai1002)'s Twitter Profile Photo

What would a World Model look like if we start from a real embodied agent acting in the real world?

It has to have:
1) A real, physically grounded and complex action space—not just abstract control signals.
2) Diverse, real-life scenarios and activities.

Or in short: It has to…

Jessy Lin (@realjessylin)'s Twitter Profile Photo

User simulators bridge RL with real-world interaction // jessylin.com/2025/07/10/use…

How do we get the RL paradigm to work on tasks beyond math & code? Instead of designing datasets, RL requires designing environments. Given that most non-trivial real-world tasks involve…
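
One way to picture the environment-design point (hypothetical interface, not the post's actual code): wrap an LLM that role-plays a goal-directed user into a gym-style loop, with reward coming from goal completion.

```python
# Hedged sketch of a user-simulator RL environment (hypothetical names;
# see the post for the real design): an LLM role-plays the user, the
# policy being trained is the assistant, and the reward signal is whether
# the simulated user's hidden goal ends up satisfied.
class UserSimEnv:
    def __init__(self, user_llm, goal: str, max_turns: int = 8):
        self.user_llm = user_llm  # any chat-completion callable
        self.goal = goal
        self.max_turns = max_turns

    def reset(self):
        self.history = []
        # Simulated user opens the conversation, conditioned on a hidden goal
        opening = self.user_llm(f"You are a user whose goal is: {self.goal}. "
                                "Write your first message to the assistant.")
        self.history.append(("user", opening))
        return opening

    def step(self, assistant_msg: str):
        self.history.append(("assistant", assistant_msg))
        transcript = "\n".join(f"{r}: {m}" for r, m in self.history)
        reply = self.user_llm(f"Goal: {self.goal}\n{transcript}\n"
                              "Reply as the user. Say DONE if satisfied.")
        self.history.append(("user", reply))
        done = "DONE" in reply or len(self.history) >= 2 * self.max_turns
        reward = 1.0 if "DONE" in reply else 0.0  # sparse terminal reward
        return reply, reward, done
```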
Baifeng (@baifeng_shi)'s Twitter Profile Photo

Understanding a video involves both short-range and long-range understanding.

Short-range understanding is more about "motion" and requires system-1 perception. Long-range understanding is more system-2, and requires memory, reasoning, etc.

Both have huge room for improvement.

Ruilong Li (@ruilong_li)'s Twitter Profile Photo

For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world model etc]

Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! 

Paper &amp; code: liruilong.cn/prope/
Lakshya A Agrawal (@lakshyaaagrawal)'s Twitter Profile Photo

How does prompt optimization compare to RL algos like GRPO?

GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't.

Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
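
A hedged sketch of what a reflective optimizer loop can look like (hypothetical helpers and loop structure; GEPA's actual algorithm is in the paper): run a handful of rollouts, have an LLM diagnose the failures in natural language, and edit the prompt accordingly, with no gradient or RL updates.

```python
# Sketch of reflective prompt optimization in the spirit of GEPA
# (hypothetical interface, not the paper's algorithm).
def optimize_prompt(llm, prompt, tasks, score, n_rounds=5):
    """llm: chat-completion callable; tasks: list of task inputs;
    score(task, output) -> float in [0, 1]."""
    best_prompt, best_score = prompt, -1.0
    for _ in range(n_rounds):
        outputs = [llm(f"{prompt}\n\n{t}") for t in tasks]  # a few rollouts
        scores = [score(t, o) for t, o in zip(tasks, outputs)]
        avg = sum(scores) / len(scores)
        if avg > best_score:
            best_prompt, best_score = prompt, avg
        failures = [(t, o) for t, o, s in zip(tasks, outputs, scores) if s < 1]
        if not failures:
            break
        # Reflection step: the optimizer LLM reads failures and edits the prompt
        prompt = llm(
            f"Current prompt:\n{prompt}\n\nFailed cases:\n"
            + "\n".join(f"input: {t}\noutput: {o}" for t, o in failures)
            + "\n\nDiagnose what went wrong and rewrite the prompt. "
              "Return only the new prompt."
        )
    return best_prompt, best_score
```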