Siddharth Karamcheti (@siddkaramcheti) Twitter Tweets • TwiCopy

Allen Z. Ren

9 months ago

HNY! Lately I took a crack at implementing the pi0 model from Physical Intelligence:
- PaliGemma VLM (2.3B fine-tuned) + 0.3B "action expert"
- MoE + block attention
- Flow matching w/ action chunking
- Strong eval on Simpler w/ 75ms inference

github.com/allenzren/open… ckpts available! 👇(1/6)

395 likes · 18 replies · 56 retweets

Tyler Zhu

@tyleryzhu

9 months ago

Have you ever wondered why we don’t use multiple visual encoders for VideoLLMs? We thought the same! Excited to announce our latest work MERV, on using Multiple Encoders for Representing Videos in VideoLLMs, outperforming prior works with the same data. 🧵

125 likes · 5 replies · 13 retweets

Jacob Andreas

@jacobandreas

9 months ago

Are you an undergrad interested in NLP research? Intern with us through the MIT summer research program! Includes stipend, travel, housing. Students from historically underserved bgs are strongly encouraged to apply. Deadline is 21 Jan 2025. More info at oge.mit.edu/msrp/.

129 likes · 3 replies · 32 retweets

Jay Alammar

@jayalammar

8 months ago

Alphaxiv is an awesome way to discuss ML papers -- often with the authors themselves. Here's an intro and demo by Raj Palleti at #neurips2024.

154 likes · 3 replies · 23 retweets

Kevin Zakka

@kevin_zakka

8 months ago

The ultimate test of any physics simulator is its ability to deliver real-world results. With MuJoCo Playground, we’ve combined the very best: MuJoCo’s rich and thriving ecosystem, massively parallel GPU-accelerated simulation, and real-world results across a diverse range of…

894 likes · 35 replies · 178 retweets

Jaden Clark

@jadenvclark

8 months ago

How can we leverage human video data to train generalist robot policies? 🤖 Enter RAD: Reasoning through Action-Free Data, a new way to train robot policies using both robot and human video data via action reasoning. rad-generalization.github.io

100 likes · 4 replies · 34 retweets

Danfei Xu

@danfei_xu

7 months ago

Thrilled to share this story covering our collaboration with Project Aria @Meta Reality Labs at Meta! Human data is robot data in disguise. Imitation learning is human modeling. We are at the beginning of something truly revolutionary, both for robotics and human-level AI beyond language.

165 likes · 2 replies · 19 retweets

HRI Pioneers

@hripioneers

7 months ago

Welcome #HRIPioneers2025! Megha Srivastava from Stanford University will present their work 'Robotics for Personalized Motor Skills Instruction' at The HRI Conference. Read more on Megha's website: cs.stanford.edu/~megha

23 likes · 1 reply · 10 retweets

Raunaq Bhirangi

@raunaqmb

7 months ago

Ever struggled with multi-sensor data from cameras, depth sensors, and other custom sensors? Meet AnySense—an iPhone app for effortless data acquisition and streaming. Working with multimodal sensor data will never be a chore again!

136 likes · 7 replies · 26 retweets

Andrea Bajcsy

@andrea_bajcsy

7 months ago

📢 Announcing the first IEEE ICRA workshop on Safely Leveraging VLMs in Robotics! #ICRA2025 🎯 How can we safely leverage vision-language foundation models to expand robot deployment? 📅 Short papers & failure demos due 04/11/23 🌐 tinyurl.com/safe-vlm 🧵(1/5)

63 likes · 1 reply · 15 retweets

Siddharth Karamcheti

@siddkaramcheti

7 months ago

Is there a nice solution for porting a Flax model to PyTorch (and vice-versa)? Or minimally a list of common gotchas in the porting process/important unit tests to write (with expected tolerances for specific ops)?

2 likes · 2 replies · 0 retweets

Dorsa Sadigh

@dorsasadigh

7 months ago

Here is another uncut video of real-time interactions with Google DeepMind's Gemini Robotics!

1.1K likes · 48 replies · 178 retweets

Karl Pertsch

@karlpertsch

5 months ago

Training with discrete FAST action tokenization now powers all of our pre-training in π-0.5! When combined with π-0 style flow matching during post-training, we get both: fast training & fast inference :)

30 likes · 0 replies · 2 retweets

Suraj Nair

@surajnair_1

5 months ago

Since the first year of my PhD, every talk I’ve given has opened with a slide about the distant north star: dropping a robot in a home it’s never been in before and having it do useful things. I think it might be time for me to find a new opening slide 😀. Thrilled to share π-0.5!

118 likes · 5 replies · 4 retweets

Amber Xie

@amberxie_

5 months ago

Introducing ✨Latent Diffusion Planning✨ (LDP)! We explore how to use expert, suboptimal, & action-free data. To do so, we learn a diffusion-based *planner* that forecasts latent states, and an *inverse-dynamics model* that extracts actions. w/ Oleg Rybkin Dorsa Sadigh Chelsea Finn

345 likes · 2 replies · 39 retweets

Erdem Bıyık

@ebiyik_

5 months ago

We developed a computational model of human interventions/corrections and a method to learn from such feedback. We don't need RL in the loop, so it is very efficient. Yigit will be presenting this work at ICRA and both of us will be there.

25 likes · 0 replies · 2 retweets

Lucy Li

@lucy3_li

5 months ago

I'm joining UW–Madison Computer Sciences UW School of Computer, Data & Information Sciences as an assistant professor in fall 2026!! There, I'll continue working on language models, computational social science, & responsible AI. 🌲🧀🚣🏻‍♀️ Apply to be my PhD student!

Before then, I'll postdoc for a year at another UW🏔️ -- UW NLP Allen School.

658 likes · 72 replies · 37 retweets

Percy Liang

@percyliang

4 months ago

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:

939 likes · 39 replies · 185 retweets

Percy Liang

@percyliang

4 months ago

For a rare look into how LLMs are really built, check out David Hall's retrospective on how we trained the Marin 8B model from scratch (and outperformed Llama 3.1 8B base). It’s an honest account with all the revelations and mistakes we made along our journey. Papers are forced to…

503 likes · 2 replies · 78 retweets

David Hall

@dlwh

4 months ago

32B's in the works! So far a lot fewer mistakes. I do learn sometimes. wandb.ai/marin-communit… marin.community

11 likes · 0 replies · 2 retweets