Lucy Shi (@lucy_x_shi)'s Twitter Profile
Lucy Shi

@lucy_x_shi

CS PhD student @Stanford. Robotics research @physical_int. Interested in robots, rockets, and humans.

ID: 1446952547504177154

Link: https://lucys0.github.io/ · Joined: 09-10-2021 21:35:59

223 Tweets

1.1K Followers

584 Following

Marcel Torné (@marceltornev)'s Twitter Profile Photo

Giving history to our robot policies is crucial for solving a variety of daily tasks. However, diffusion policies get worse when adding history. 🤖 In our recent work we learn how adding an auxiliary loss that we name Past-Token Prediction (PTP) together with cached embeddings…
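
The tweet is cut off, but the core idea it names, an auxiliary Past-Token Prediction (PTP) loss that pushes a history-conditioned policy to actually use its history (with per-frame embeddings presumably cached so older frames need not be re-encoded), can be sketched roughly as follows. This is a minimal illustration with hypothetical module and variable names, using a plain regression head in place of a diffusion head; it is not the paper's actual architecture or training code.

```python
import torch
import torch.nn as nn

class HistoryPolicyWithPTP(nn.Module):
    """Illustrative history-conditioned policy with a Past-Token Prediction
    (PTP) auxiliary head. All names and shapes are hypothetical."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        # In practice the per-frame embeddings could be cached so the vision
        # backbone is not re-run on old frames; a GRU stands in for it here.
        self.encoder = nn.GRU(obs_dim, hidden, batch_first=True)
        self.action_head = nn.Linear(hidden, act_dim)  # current action
        self.past_head = nn.Linear(hidden, act_dim)    # auxiliary: reconstruct past actions

    def forward(self, obs_history: torch.Tensor):
        # obs_history: (batch, T, obs_dim)
        feats, _ = self.encoder(obs_history)            # (batch, T, hidden)
        action_pred = self.action_head(feats[:, -1])    # prediction for the latest step
        past_pred = self.past_head(feats[:, :-1])       # predictions for earlier steps
        return action_pred, past_pred

def loss_fn(model, obs_history, actions, ptp_weight: float = 0.1):
    """Behavior-cloning loss on the current action plus a PTP term that
    penalizes the policy for ignoring its history."""
    action_pred, past_pred = model(obs_history)
    bc_loss = nn.functional.mse_loss(action_pred, actions[:, -1])
    ptp_loss = nn.functional.mse_loss(past_pred, actions[:, :-1])
    return bc_loss + ptp_weight * ptp_loss
```

The intuition, going by the tweet's framing, is that the imitation loss alone gives the network little incentive to attend to older frames, so added history can end up hurting; the auxiliary prediction of past tokens makes ignoring the history costly.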

Danny Driess (@dannydriess)'s Twitter Profile Photo

How to build vision-language-action models that train fast, run fast & generalize? In our new paper, we formalize & analyze the approach of our π-0.5 model & further improve it with a single stage recipe. Blog: pi.website/research/knowl… Paper: pi.website/download/pi05_…

Kevin Black (@kvablack)'s Twitter Profile Photo

In LLM land, a slow model is annoying. In robotics, a slow model can be disastrous! Visible pauses at best, dangerously jerky motions at worst. But large VLAs are slow by nature. What can we do about this? An in-depth 🧵:

Seohong Park (@seohong_park)'s Twitter Profile Photo

Q-learning is not yet scalable seohong.me/blog/q-learnin… I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).

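For context on the objective being called not-yet-scalable: standard off-policy Q-learning fits a Q-network to a bootstrapped one-step target, as in the textbook-style sketch below (illustrative only, not code from the blog post). Because the regression target reuses the network's own estimate of future value, errors can feed back into training, which is one standard reason off-policy value learning is hard to scale to long horizons.

```python
import torch
import torch.nn.functional as F

def q_learning_td_loss(q_net, target_q_net, batch, gamma: float = 0.99):
    """Textbook one-step Q-learning loss for discrete actions (illustrative).
    q_net(s) and target_q_net(s) return per-action value estimates."""
    s, a, r, s_next, done = batch  # a: int64 action indices; done: 0/1 floats
    with torch.no_grad():
        # Bootstrapped target: reward plus the target network's best next-state value.
        next_v = target_q_net(s_next).max(dim=-1).values
        target = r + gamma * (1.0 - done) * next_v
    q_sa = q_net(s).gather(-1, a.unsqueeze(-1)).squeeze(-1)  # Q(s, a) for taken actions
    return F.mse_loss(q_sa, target)
```
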
Haoyu Xiong (@haoyu_xiong_)'s Twitter Profile Photo

Your bimanual manipulators might need a Robot Neck 🤖🦒 Introducing Vision in Action: Learning Active Perception from Human Demonstrations. ViA learns task-specific, active perceptual strategies—such as searching, tracking, and focusing—directly from human demos, enabling robust…

Lucy Shi (@lucy_x_shi)'s Twitter Profile Photo

SRT-H has been published in Science Robotics (and featured on the cover). :) Turns out, when we apply YAY Robot to surgical settings, the robot can perform real surgical procedures like gallbladder removal autonomously.

Lucy Shi (@lucy_x_shi)'s Twitter Profile Photo

If you are interested in solving complex long-horizon tasks, please join us at the 3rd workshop on Learning Effective Abstractions for Planning (LEAP) at the Conference on Robot Learning! 📅 Submission deadline: Sep 5 🐣 Early bird deadline: Aug 12

Lucy Shi (@lucy_x_shi)'s Twitter Profile Photo

Sadly I won’t be at ICML in person, but Chelsea Finn will be presenting Hi Robot tomorrow at 4:30pm in West Exhibition Hall B2-B3 (#W-403). Don’t miss it!

Google DeepMind (@googledeepmind)'s Twitter Profile Photo

We’re helping to unlock the mysteries of the universe with AI. 🌌 Our novel Deep Loop Shaping method published in Science Magazine could help astronomers observe more events like collisions and mergers of black holes in greater detail, and gather more data about rare space

Physical Intelligence (@physical_int)'s Twitter Profile Photo

We've added pi-05 to the openpi repo: pi05-base, pi05-droid, pi05-libero. Also added PyTorch training code!🔥 Instructions and code here: github.com/Physical-Intel… This is an updated version of the model we showed cleaning kitchens and bedrooms in April: pi.website/blog/pi05
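
For anyone wanting to try the released checkpoints, inference with openpi broadly follows a load-config, download-checkpoint, create-policy, call-infer pattern. The sketch below is loosely modeled on that pattern; the exact module paths, config names, checkpoint URIs, and observation keys are assumptions and should be checked against github.com/Physical-Intelligence/openpi.

```python
# Illustrative sketch only: module paths, config names, checkpoint locations,
# and observation keys are assumptions; see the openpi README for the real API.
import numpy as np
from openpi.training import config as openpi_config
from openpi.policies import policy_config
from openpi.shared import download

cfg = openpi_config.get_config("pi05_droid")  # assumed config name for the pi05-droid checkpoint
ckpt_dir = download.maybe_download("gs://openpi-assets/checkpoints/pi05_droid")  # assumed URI

policy = policy_config.create_trained_policy(cfg, ckpt_dir)

# Dummy observation; real key names and shapes depend on the robot adapter.
obs = {
    "observation/image": np.zeros((224, 224, 3), dtype=np.uint8),
    "observation/state": np.zeros(8, dtype=np.float32),
    "prompt": "put the spoon in the bowl",
}
actions = policy.infer(obs)["actions"]  # a chunk of future actions
```

The example scripts shipped in the repo construct correctly formatted observations for each supported robot setup and are the authoritative reference for the key names above.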

Lucy Shi (@lucy_x_shi)'s Twitter Profile Photo

I’m giving a talk at ICCV tomorrow on Robot Foundation Models with Multimodal Reasoning! Will cover our recent work on open-ended physical intelligence (Physical Intelligence) and world models (Stanford University).

- Talk: 10-10:30 am @ MMRAgI (301 A)
- Panel: 4:20-5:20 pm

Lucy Shi (@lucy_x_shi)'s Twitter Profile Photo

Can a world model help robots improve themselves? We developed a controllable world model that accurately evaluates and improves VLA policies (like π0.5) for instruction following — showing a) strong correlation with real-world eval results and b) ~40% average gains on DROID
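
The "strong correlation with real-world eval results" claim amounts to using the world model as an evaluation proxy: score a policy on rollouts imagined by the world model, score it on the real robot, and check how well the two sets of numbers agree. A minimal sketch of that comparison, with every function name hypothetical:

```python
import numpy as np

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two vectors of success rates."""
    return float(np.corrcoef(x, y)[0, 1])

def evaluation_alignment(policies, tasks, world_model_success, real_world_success):
    """world_model_success and real_world_success are hypothetical callables
    returning a success rate in [0, 1] for a given policy on a given task."""
    wm_scores, real_scores = [], []
    for policy in policies:
        for task in tasks:
            wm_scores.append(world_model_success(policy, task))
            real_scores.append(real_world_success(policy, task))
    return pearson_r(np.array(wm_scores), np.array(real_scores))
```

A high correlation across policies and tasks is what justifies iterating on a policy inside the world model before spending real robot time on it.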

Chelsea Finn (@chelseabfinn)'s Twitter Profile Photo

Ctrl-World is a controllable world model that generalizes zero-shot to new environments, cameras, and objects. Paper: ctrl-world.github.io Model & code: github.com/Robert-gyj/Ctr… The results are exciting — a short thread on why. 🧵

Physical Intelligence (@physical_int)'s Twitter Profile Photo

Our model can now learn from its own experience with RL! Our new π*0.6 model can more than double throughput over a base model trained without RL, and can perform real-world tasks: making espresso drinks, folding diverse laundry, and assembling boxes. More in the thread below.

Michael Equi (@michael_equi)'s Twitter Profile Photo

We developed RECAP at Physical Intelligence to apply RL and interventions to π0.6, achieving high success rates and throughput on several challenging tasks! Watching these policies operate successfully for hours gives an appreciation for what the method can do.

Laura Smith (@smithlaura1028)'s Twitter Profile Photo

Excited to share what we've been brewing at PI! We’re working on making robots more helpful by making them faster and more reliable through real-world practice, even on delicate behaviors like carrying this very full latte cup

Allen Z. Ren (@allenzren)'s Twitter Profile Photo

Real-world RL has long been difficult for long-horizon tasks like assembling boxes or folding laundry. Today with π*0.6, we see a scalable path towards training deployable VLAs that reliably bootstrap from an ever-growing data flywheel 🧵

Ian Goodfellow (@goodfellow_ian)'s Twitter Profile Photo

Amazing test of Gemini 3’s multimodal reasoning capabilities: try generating a threejs voxel art scene using only an image as input. Prompt: "I have provided an image. Code a beautiful voxel art scene inspired by this image. Write threejs code as a single-page…"