Dorsa Sadigh (@dorsasadigh)'s Twitter Profile
Dorsa Sadigh

@dorsasadigh

CS Faculty @Stanford, @StanfordAILab, @StanfordHAI
Research scientist @GoogleDeepMind
PhD and BS from @Berkeley_EECS

ID: 2364317023

Website: https://dorsa.fyi/
Joined: 27-02-2014 15:15:08

571 Tweets

10.1K Followers

398 Following

Dorsa Sadigh (@dorsasadigh)

Robot policies succeed not just by mimicking fine-grained actions but by reasoning about each step. It's surprising how much insight reasoning traces hold! RAD (Reasoning through Action-Free Data) learns policies guided by chain-of-thought (CoT) reasoning, drawing higher-level reasoning from action-free human videos.
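
A minimal sketch of the general idea, not the RAD authors' code: train a policy with chain-of-thought supervision on both robot data (which has actions) and action-free human videos (reasoning only). All module and field names below are hypothetical.

```python
# Hedged sketch: CoT-supervised policy trained on robot data (with actions)
# and action-free human videos (reasoning traces only). Names are placeholders.
import torch
import torch.nn as nn

class ReasoningPolicy(nn.Module):
    def __init__(self, obs_dim=512, reason_vocab=1000, act_dim=7, hidden=256):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)            # observation encoder
        self.reason_head = nn.Linear(hidden, reason_vocab)   # predicts reasoning tokens
        self.action_head = nn.Linear(hidden * 2, act_dim)    # action conditioned on reasoning

    def forward(self, obs, reason_emb):
        h = torch.relu(self.encoder(obs))
        reason_logits = self.reason_head(h)
        action = self.action_head(torch.cat([h, reason_emb], dim=-1))
        return reason_logits, action

def training_loss(model, batch):
    # batch["reason_emb"]: embedding of the ground-truth reasoning trace
    reason_logits, action = model(batch["obs"], batch["reason_emb"])
    loss = nn.functional.cross_entropy(reason_logits, batch["reason_tokens"])
    if batch["has_action"]:  # robot data: supervise actions as well
        loss = loss + nn.functional.mse_loss(action, batch["action"])
    return loss              # human videos contribute the reasoning loss only
```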

Dorsa Sadigh (@dorsasadigh)

In large-scale pretraining for robotics -- where massive datasets are lacking -- data quality matters more than ever. We use mutual information estimators to identify high-quality data -- optimizing for diverse states & easy-to-fit actions to improve policy learning.
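
As a rough illustration of the stated recipe, not the paper's implementation: score each trajectory by a state-diversity proxy plus a mutual-information proxy for how predictable its actions are, then keep the top-scoring data. `score_trajectory` and both proxies are my own choices; the paper's estimators may differ.

```python
# Hedged sketch of scoring trajectories for "diverse states & easy-to-fit actions".
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def score_trajectory(states, actions):
    """states: (T, d_s) array, actions: (T, d_a) array for one trajectory."""
    # State-diversity proxy: average distance to the mean state
    # (a stand-in for a proper entropy estimate).
    diversity = np.linalg.norm(states - states.mean(axis=0), axis=1).mean()

    # Action-predictability proxy: mutual information between states and each
    # action dimension; high MI ~ actions that are easy to fit from states.
    mi = np.mean([mutual_info_regression(states, actions[:, j]).mean()
                  for j in range(actions.shape[1])])
    return diversity + mi  # keep trajectories that score high on both

# Usage sketch: rank trajectories and keep the top-k for pretraining.
# kept = sorted(trajs, key=lambda t: score_trajectory(t.states, t.actions))[-k:]
```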

HRI Pioneers (@hripioneers)

Welcome #HRIPioneers2025! Megha Srivastava from Stanford University will present their work 'Robotics for Personalized Motor Skills Instruction' at the HRI Conference. Read more on Megha's website: cs.stanford.edu/~megha

Dorsa Sadigh (@dorsasadigh)

While we spend a lot of time thinking about teaching robots how to do tasks, it is worthwhile to think about how to teach humans motor control tasks. Using shared autonomy, we teach race-driving skills within the driver's zone of proximal development. #HRI2025 🧵👇
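
A minimal sketch, under my own assumptions, of shared autonomy tuned to a zone of proximal development: blend expert and human commands, and adapt the assistance level so the learner's error stays in a target band. The blending rule and thresholds below are illustrative, not the paper's method.

```python
# Hedged sketch: assistance-blended control with difficulty kept "just right".
def blended_command(human_cmd, expert_cmd, alpha):
    """alpha in [0, 1]: fraction of control given to the assistive expert."""
    return alpha * expert_cmd + (1.0 - alpha) * human_cmd

def update_assistance(alpha, lap_error, low=0.3, high=0.6, step=0.05):
    """Keep the learner's tracking error inside a target band.

    Doing too well -> lower assistance (harder task);
    struggling     -> raise assistance (easier task).
    """
    if lap_error < low:
        alpha = max(0.0, alpha - step)
    elif lap_error > high:
        alpha = min(1.0, alpha + step)
    return alpha
```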

Dorsa Sadigh (@dorsasadigh)

Unified Video Action (UVA) is a single model that leverages joint video-action optimization & decoupled decoding, so that video generation benefits policy learning and vice versa. UVA can act as a policy, a video generator, an inverse dynamics model, and a forward dynamics model. 🧵👇
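
A hedged sketch of the high-level idea (a shared trunk with decoupled video and action decoders, trained jointly and tolerant of action-free clips), not the UVA architecture itself; all module names are placeholders.

```python
# Hedged sketch: shared latent, decoupled video/action heads, joint loss.
import torch
import torch.nn as nn

class JointVideoAction(nn.Module):
    def __init__(self, latent=256, frame_dim=1024, act_dim=7):
        super().__init__()
        self.trunk = nn.GRU(frame_dim, latent, batch_first=True)  # shared latent over frame history
        self.video_head = nn.Linear(latent, frame_dim)   # decodes next-frame features
        self.action_head = nn.Linear(latent, act_dim)    # decodes actions

    def forward(self, frames):                           # frames: (B, T, frame_dim)
        h, _ = self.trunk(frames)
        z = h[:, -1]                                     # shared latent at the last step
        return self.video_head(z), self.action_head(z)

def joint_loss(model, frames, next_frame, action, have_action=True):
    pred_frame, pred_action = model(frames)
    loss = nn.functional.mse_loss(pred_frame, next_frame)            # video objective
    if have_action:                                                  # decoupled: action-free
        loss = loss + nn.functional.mse_loss(pred_action, action)    # clips still train video
    return loss
```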

Dorsa Sadigh (@dorsasadigh)

Gemini Robotics is here! 🚀 One standout for me: it's not just great at dexterous tasks (like folding a t-shirt), but we're finally seeing signs of generalization beyond semantic reasoning. On top of that, the model's instruction following and steerability are seriously impressive!

Dorsa Sadigh (@dorsasadigh)

It is really wonderful to see uncut interactions like this one from Gemini Robotics! The instruction following and steerability of a single VLA are pretty great. It makes me think we can finally seriously work on downstream interactions with robots.

Dorsa Sadigh (@dorsasadigh)

HoMeR brings mobile manipulators out into the wild with an efficient hybrid imitation policy that uses keypoints from a VLM. This allows for generalization to unseen scenarios, particularly in near-interaction settings where it is hard to decouple mobility and manipulation!
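
A rough sketch of one way such a hybrid policy could be decomposed, not the HoMeR codebase: a VLM proposes a 3D keypoint, a whole-body controller drives toward it, and a learned policy takes over for close-range manipulation. `query_vlm_for_keypoint`, `imitation_policy`, and the threshold are hypothetical.

```python
# Hedged sketch: VLM keypoint -> coarse whole-body motion -> fine manipulation.
import numpy as np

NEAR_THRESHOLD = 0.15  # meters: hand over to the manipulation policy (illustrative value)

def hybrid_step(obs, instruction, query_vlm_for_keypoint, imitation_policy):
    keypoint = query_vlm_for_keypoint(obs["image"], instruction)  # 3D target point
    dist = np.linalg.norm(obs["ee_pos"] - keypoint)
    if dist > NEAR_THRESHOLD:
        # Coarse phase: drive base and arm toward the keypoint.
        return {"mode": "whole_body", "target": keypoint}
    # Fine phase: keypoint-conditioned imitation policy handles contact-rich motion.
    return {"mode": "manipulation", "action": imitation_policy(obs, keypoint)}
```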

Vincent de Bakker (@v_debakker)

Can we teach dexterous robot hands manipulation without human demos or hand-crafted rewards? Our key insight: Use Vision-Language Models (VLMs) to scaffold coarse motion plans, then train an RL agent to execute them with 3D keypoints as the interface. 1/7
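
A hedged sketch of the stated recipe (a VLM proposes a coarse keypoint plan; an RL policy learns to track it), not the authors' implementation. `ask_vlm_for_plan`, the gym-style `env`, and the reward shape are assumptions of mine.

```python
# Hedged sketch: keypoint plan from a VLM, dense tracking reward for RL.
import numpy as np

def keypoint_tracking_reward(fingertip_pos, target_keypoint, scale=5.0):
    """Dense reward: fingertips closer to the current target keypoint score higher."""
    return float(np.exp(-scale * np.linalg.norm(fingertip_pos - target_keypoint)))

def rollout(env, policy, ask_vlm_for_plan, instruction):
    obs = env.reset()
    plan = ask_vlm_for_plan(obs["image"], instruction)   # list of 3D keypoints
    stage, total = 0, 0.0
    while stage < len(plan):
        action = policy(obs, plan[stage])                # policy conditioned on current keypoint
        obs, _, done, _ = env.step(action)               # assumes classic gym 4-tuple API
        r = keypoint_tracking_reward(obs["fingertips"].mean(axis=0), plan[stage])
        total += r
        if r > 0.9:                                      # close enough: advance to next keypoint
            stage += 1
        if done:
            break
    return total
```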