Karl Pertsch (@karlpertsch)'s Twitter Profile
Karl Pertsch

@karlpertsch

Robot Foundation Models @ UC Berkeley & Stanford & @physical_int | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.

ID: 3377714115

Link: http://kpertsch.github.io | Joined: 15-07-2015 19:46:33

353 Tweets

3.3K Followers

269 Following

Remi Cadene (@remicadene):

⭐ The first foundation model available on LeRobot ⭐ Pi0 is the most advanced Vision-Language-Action model. It takes natural language commands as input and directly outputs autonomous behavior. It was trained by Physical Intelligence and ported to PyTorch by Pablo Montalvo 👇🧵
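As context for readers: a minimal sketch of what loading the PyTorch port might look like, assuming a Hugging Face-style from_pretrained API. The import path, checkpoint id, and batch keys below are assumptions, not confirmed LeRobot API.

```python
import torch

# Hypothetical sketch: load the Pi0 port and query it for an action.
# Import path, checkpoint id, and batch keys are assumptions.
from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy

policy = PI0Policy.from_pretrained("lerobot/pi0")  # assumed checkpoint id
policy.eval()

batch = {
    # One RGB observation and proprioceptive state (shapes illustrative).
    "observation.images.top": torch.zeros(1, 3, 224, 224),
    "observation.state": torch.zeros(1, 14),
    # The natural-language command that conditions the policy.
    "task": ["pick up the red block"],
}
with torch.no_grad():
    action = policy.select_action(batch)  # assumed inference entry point
print(action.shape)
```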

Jie Wang (@jiewang_zjui):

Excited to play with Pi0, it is so cool! We just configured DROID, downloaded the checkpoint & inference code, and it worked without any tuning. What an impressive moment!
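For reference, the released inference setup follows a client-server pattern: the robot client sends observations to a policy server and gets back action chunks. A rough sketch of such a client loop, with the class and field names assumed rather than confirmed:

```python
import numpy as np

# Hypothetical sketch of a remote-inference loop in the style of the released
# Pi0/DROID setup: a policy server returns action chunks for each observation.
# The client class and observation/field names here are assumptions.
from openpi_client.websocket_client_policy import WebsocketClientPolicy

policy = WebsocketClientPolicy(host="localhost", port=8000)

for _ in range(100):
    obs = {
        "observation/exterior_image_1_left": np.zeros((224, 224, 3), dtype=np.uint8),
        "observation/joint_position": np.zeros(7, dtype=np.float32),
        "observation/gripper_position": np.zeros(1, dtype=np.float32),
        "prompt": "put the marker in the cup",
    }
    result = policy.infer(obs)        # returns a chunk of future actions
    action_chunk = result["actions"]  # execute on the robot, then re-query
```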

Edward Hu (@edward_s_hu):

Pi0 really did work for us on the first try. No camera calibration, controller tuning, etc. The failure cases: missed grasps and risk-averse "hedging" behavior. Excited to see how the robotics community improves on this. At the very least, it will be a good baseline.

Karl Pertsch (@karlpertsch):

Check out Lucy's project on teaching our robots to be more steerable and to interpret instructions like "that's not trash" in the context of the current scene! Big kudos to Lucy Shi, who wrangled the full robot learning stack, from low-level training and VLM/VLA training to human annotation!

Karl Pertsch (@karlpertsch):

Scalable evaluation is a major challenge in robotics research! Check out our AutoEval project, where we try to make reproducible eval more accessible through 24/7 autonomous policy evaluation. Our eval cells are public, so you can submit your Bridge policies for eval today! :)
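For anyone curious what a submission could involve: AutoEval evaluates a policy you serve, so a submission boils down to hosting an endpoint the eval cell can query. A hypothetical minimal server, with the route and payload schema assumed rather than taken from the project:

```python
# Hypothetical sketch of the kind of policy server an eval cell could query:
# it receives an observation (image + instruction) and returns a 7-DoF action.
# The route and payload names are assumptions, not the project's actual schema.
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/act", methods=["POST"])
def act():
    payload = request.get_json()
    image = np.array(payload["image"], dtype=np.uint8)  # HxWx3 RGB
    instruction = payload["instruction"]                # e.g. "close the drawer"
    # Replace this stub with your Bridge policy's forward pass.
    action = np.zeros(7, dtype=np.float32)              # [dx,dy,dz,droll,dpitch,dyaw,grip]
    return jsonify({"action": action.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```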

Irmak Guzey (@irmakkguzey):

Despite great advances in learning dexterity, hardware remains a major bottleneck. Most dexterous hands are either bulky, weak or expensive. I’m thrilled to present the RUKA Hand — a powerful, accessible research tool for dexterous manipulation that overcomes these limitations!

Karl Pertsch (@karlpertsch):

Our VLA policies now generalize to new homes! 🏠🏠🏠 The main takeaway of π-0.5 is that with good tokenization + a flexible VLA architecture you can get away with relatively little mobile manipulation data (~400h) and still get policies that generalize to cleaning unseen kitchens & bedrooms!

Karl Pertsch (@karlpertsch):

Training with discrete FAST action tokenization now powers all of our pre-training in π-0.5! When combined with π-0 style flow matching during post-training, we get both fast training & fast inference :)
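For intuition, the core idea behind FAST-style tokenization is to transform an action chunk into frequency space with a DCT, quantize the coefficients, and treat the result as discrete tokens (the real method additionally compresses them with BPE). A toy sketch, with the quantization scale chosen arbitrarily:

```python
# Minimal sketch of the idea behind FAST-style action tokenization: compress an
# action chunk with a DCT, quantize the coefficients, and treat the result as
# discrete tokens. The scale factor is an arbitrary assumption; the real method
# also applies BPE over the quantized coefficients.
import numpy as np
from scipy.fft import dct, idct

def tokenize(chunk: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """chunk: (horizon, action_dim) continuous actions -> integer tokens."""
    coeffs = dct(chunk, axis=0, norm="ortho")  # frequency-space representation
    return np.round(coeffs * scale).astype(np.int32)

def detokenize(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    coeffs = tokens.astype(np.float64) / scale
    return idct(coeffs, axis=0, norm="ortho")

chunk = np.cumsum(np.random.randn(16, 7) * 0.01, axis=0)  # smooth fake action chunk
recon = detokenize(tokenize(chunk))
print(np.abs(chunk - recon).max())  # small: error is bounded by the quantization step
```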

Karl Pertsch (@karlpertsch):

Check out Zubair & Vitor's work on improved camera calibration for DROID! 36k diverse episodes with high-quality calibration + multi-view + stereo; this should be a great resource for anyone working on 3D vision / spatial understanding for robotics!
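For readers newer to 3D vision: high-quality calibration matters because it enables operations like projecting world-frame points into camera pixels. A generic pinhole-camera sketch with placeholder intrinsics and extrinsics (not values from the dataset):

```python
# Sketch of the basic operation calibrated data enables: projecting a 3D point
# in the world frame into camera pixels. All matrix values are placeholders.
import numpy as np

K = np.array([[600.0,   0.0, 320.0],   # fx, 0, cx  (intrinsics)
              [  0.0, 600.0, 240.0],   # 0, fy, cy
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # world->camera rotation (extrinsics)
t = np.array([0.0, 0.0, 1.0])          # world->camera translation (meters)

def project(point_world: np.ndarray) -> np.ndarray:
    p_cam = R @ point_world + t        # transform into the camera frame
    uvw = K @ p_cam                    # pinhole projection
    return uvw[:2] / uvw[2]            # normalize to pixel coordinates

print(project(np.array([0.1, -0.05, 0.5])))
```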

Paul Zhou (@zhiyuan_zhou_):

How can we make robot policy evaluation easier, more accessible, and more comparable? Our answer: autonomous 24/7 eval in the real world. AutoEval will be presented by Sergey Levine at the Robot Learning Workshop at #ICLR25 on Sun, April 27! Don't miss it! Oral at 2pm, poster at 2:35-3:35pm.

Polina Kirichenko (@polkirichenko):

We are hiring a PhD research intern at FAIR w/ Mark Ibrahim and Kamalika Chaudhuri, to start this summer or fall! Potential topics: trustworthy and reliable LLMs, multi-modal LLMs and agents, post-training, and reasoning, with a focus on open science and sharing our findings in a paper at the end.

Paul Zhou (@zhiyuan_zhou_):

Yes! Let's build a network of distributed eval stations together 🦾 With our open-sourced framework, it now only takes 3-5 hours to set up a new AutoEval station! We have released a detailed step-by-step guide.

Karl Pertsch (@karlpertsch):

Check out Danny's paper on a single-stage VLA recipe that trains fast, has fast inference, and follows language commands well. ⚡️⚡️⚡️ The key: combine FAST tokens + flow-matching expert, and make sure those pesky diffusion gradients don't mess up your beautiful VLM backbone! :)
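A toy sketch of the "protect the backbone" idea: route the VLM features into the flow-matching expert through a stop-gradient, so only the FAST token loss updates the backbone. Module names and sizes are illustrative, not the paper's architecture.

```python
# Toy sketch: the flow-matching action expert reads VLM features through a
# stop-gradient (detach), so its regression gradients never touch the backbone;
# only the FAST token loss trains the VLM. Names and sizes are illustrative.
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    def __init__(self, d=256, vocab=1024, action_dim=7):
        super().__init__()
        self.backbone = nn.Linear(d, d)        # stand-in for the VLM
        self.token_head = nn.Linear(d, vocab)  # FAST action-token head
        self.flow_expert = nn.Linear(d + action_dim + 1, action_dim)

    def forward(self, obs, noisy_action, tau):
        feats = self.backbone(obs)
        token_logits = self.token_head(feats)  # gradients flow to the backbone
        expert_in = torch.cat([feats.detach(), noisy_action, tau], dim=-1)
        velocity = self.flow_expert(expert_in)  # gradients stop at detach()
        return token_logits, velocity

model = ToyVLA()
logits, vel = model(torch.randn(2, 256), torch.randn(2, 7), torch.rand(2, 1))
```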