Stanford AI Lab (@stanfordailab)'s Twitter Profile
Stanford AI Lab

@stanfordailab

The Stanford Artificial Intelligence Laboratory (SAIL), a leading #AI lab since 1963. ⛵️🤖 Emmy-winning video: youtube.com/watch?v=Cn6nmW…

ID: 1059680847425527808

Link: https://ai.stanford.edu/ · Joined: 06-11-2018 05:36:16

3.3K Tweets

189,189 Followers

333 Following

Francis Engelmann (@francisengelman)'s Twitter Profile Photo

What makes a good 3D scene representation? Instead of meshes or Gaussians, we propose Superquadrics to decompose 3D scenes into extremely compact representations ➡️ check out our paper for exciting use-cases in robotics🤖 and GenAI🚀 super-dec.github.io w/ Elisabetta Fedele and Marc Pollefeys

Rylan Schaeffer (@rylanschaeffer)'s Twitter Profile Photo

A bit late to the party, but our paper on predictable inference-time / test-time scaling was accepted to #icml2025 🎉🎉🎉

TLDR: Best of N was shown to exhibit power-law (polynomial) scaling (left), but the math suggests one should expect exponential scaling (center). We show how to
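The exponential expectation has a one-line justification. This is a toy sketch of that argument, not the paper's analysis: if each of N independent samples fails with probability p, Best-of-N fails only when all N do, so the failure rate is p^N, exponential in N.

```python
def best_of_n_failure(p_fail: float, n: int) -> float:
    """P(all n i.i.d. samples fail) = p_fail ** n: exponential decay in n."""
    return p_fail ** n

# Doubling n squares the failure probability (exponential scaling),
# rather than the power-law (polynomial) trend reported empirically.
rates = {n: best_of_n_failure(0.5, n) for n in (1, 2, 4, 8)}
```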
Youssef Allouah (@ys_alh)'s Twitter Profile Photo

Excited our paper "Certified Unlearning for Neural Networks" is accepted at ICML 2025!

We introduce a method for provable machine unlearning: truly "forgetting" data without restrictive assumptions like convexity.

Paper: arxiv.org/abs/2506.06985
Code: github.com/stair-lab/cert…
Stanford AI Lab (@stanfordailab)'s Twitter Profile Photo

In Los Angeles for RSS 2025? 🤖🌴 Be sure to check out the great work by students from the Stanford AI Lab! ai.stanford.edu/blog/rss-2025/

Mayee Chen (@mayeechen)'s Twitter Profile Photo

LLMs often generate correct answers but struggle to select them. Weaver tackles this by combining many weak verifiers (reward models, LM judges) into a stronger signal using statistical tools from Weak Supervision—matching o3-mini-level accuracy with much cheaper models! 📊

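The core idea of combining weak verifiers can be illustrated with a classic weak-supervision aggregation rule. This is a minimal sketch under a naive-Bayes independence assumption, not Weaver's actual method: each verifier's vote is weighted by the log-odds of its estimated accuracy, so more reliable verifiers contribute more to the combined score.

```python
import math

def combine_verifiers(votes, accuracies):
    """Weighted log-odds combination of binary verifier verdicts.

    votes: list of 0/1 verdicts ("answer is correct") from weak verifiers.
    accuracies: estimated accuracy of each verifier (must be in (0.5, 1)).
    Returns P(answer is correct) under a naive-Bayes independence assumption.
    """
    log_odds = 0.0  # uniform prior over correct / incorrect
    for vote, acc in zip(votes, accuracies):
        weight = math.log(acc / (1 - acc))  # accurate verifiers weigh more
        log_odds += weight if vote == 1 else -weight
    return 1 / (1 + math.exp(-log_odds))

# Three weak verifiers (e.g. a reward model and two LM judges); two say "correct":
p_correct = combine_verifiers([1, 1, 0], [0.7, 0.6, 0.55])
```

In practice the accuracies themselves are unknown and must be estimated without labels, which is exactly what weak-supervision tools are for; the sketch above assumes they are given.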
Christopher Agia (@agiachris)'s Twitter Profile Photo

What makes data “good” for robot learning? We argue: it’s the data that drives closed-loop policy success! Introducing CUPID 💘, a method that curates demonstrations not by "quality" or appearance, but by how they influence policy behavior, using influence functions. (1/6)

Sanjana Srivastava (@sanjana__z)'s Twitter Profile Photo

🤖 Household robots are becoming physically viable. But interacting with people in the home requires handling unseen, unconstrained, dynamic preferences, not just a complex physical domain. We introduce ROSETTA: a method to generate reward for such preferences cheaply. 🧵⬇️

Anjiang Wei (@anjiangw)'s Twitter Profile Photo

We introduce CodeARC, a new benchmark for evaluating LLMs’ inductive reasoning. Agents must synthesize functions from I/O examples—no natural language, just reasoning. 📄 arxiv.org/pdf/2503.23145 💻 github.com/Anjiang-Wei/Co… 🌐 anjiang-wei.github.io/CodeARC-Websit… #LLM #Reasoning #LLM4Code #ARC

We introduce CodeARC, a new benchmark for evaluating LLMs’ inductive reasoning. Agents must synthesize functions from I/O examples—no natural language, just reasoning.
📄 arxiv.org/pdf/2503.23145
💻 github.com/Anjiang-Wei/Co…
🌐 anjiang-wei.github.io/CodeARC-Websit…
#LLM #Reasoning #LLM4Code #ARC
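The inductive-synthesis setup can be illustrated with a toy harness (names hypothetical, not the CodeARC API): a candidate program is accepted only if it reproduces every given I/O example, with no natural-language specification involved.

```python
def consistent_with_examples(candidate, examples):
    """Check whether a candidate function reproduces all I/O examples."""
    return all(candidate(x) == y for x, y in examples)

# Hidden target behavior: doubling. The agent sees only these pairs.
examples = [(1, 2), (3, 6), (5, 10)]

double = lambda x: 2 * x      # a correct hypothesis
increment = lambda x: x + 1   # fits the first pair only by accident, then fails
```

A real benchmark additionally needs held-out examples (or differential testing) to catch candidates that merely memorize the visible pairs; this sketch checks consistency only.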
Hong-Xing "Koven" Yu (@koven_yu)'s Twitter Profile Photo

#ICCV2025 🤩3D world generation is cool, but it is cooler to play with the worlds using 3D actions 👆💨, and see what happens! — Introducing *WonderPlay*: Now you can create dynamic 3D scenes that respond to your 3D actions from a single image! Web: kyleleey.github.io/WonderPlay/ 🧵1/7

Marcel Torné (@marceltornev)'s Twitter Profile Photo

Very happy to share that our work on learning long-history policies received the Best Paper Award from the Workshop on Learned Robot Representations at Robotics: Science and Systems! 🤖🥳

Check out our paper if you haven't already! long-context-dp.github.io

Thank you to all the organizers and
Stanford Engineering (@stanfordeng)'s Twitter Profile Photo

Stanford Engineering’s fourth decade, 1955-1964, was a period of transformation. New departments were formed, computing entered the classroom, the Stanford “Dish” was completed, and Stanford AI Lab began shaping the future of AI. engineering100.stanford.edu/stories/a-peri…

Stefano Ermon (@stefanoermon)'s Twitter Profile Photo

Huge milestone from the team! A blazing-fast diffusion LLM built for chat, delivering real-time performance at commercial scale. If you liked Mercury Coder for code, you'll love this for conversation.

Hancheng Cao (@caohancheng)'s Twitter Profile Photo

Check out our latest work analyzing 21 million human–LLM conversations from Microsoft Bing Copilot and WildChat to uncover prototypical ways people interact with AI in real-world settings! arxiv.org/pdf/2505.16023

Kanishk Gandhi (@gandhikanishk)'s Twitter Profile Photo

New Paper: Can we collect human chains of thought by asking people to think out loud? In our new paper we automate and study this protocol with 5,000 human reasoning traces from 640 people solving Countdown problems. 1/5

Ekdeep Singh Lubana (@ekdeepl)'s Twitter Profile Photo

🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵 1/

Stanford AI Lab (@stanfordailab)'s Twitter Profile Photo

Robot learning has largely focused on standard platforms—but can it embrace robots of all shapes and sizes? In Xiaomeng Xu's latest blog post, we show how data-driven methods bring unconventional robots to life, enabling capabilities that traditional designs and control can't
Annie Chen (@_anniechen_)'s Twitter Profile Photo

How should an RL agent leverage expert data to improve sample efficiency?

Imitation losses can overly constrain an RL policy.

In RL via Implicit Imitation Guidance, we show how to use expert data to guide more efficient *exploration*, avoiding pitfalls of imitation-augmented RL
Surya Ganguli (@suryaganguli)'s Twitter Profile Photo

A great Quanta Magazine article on our theory of creativity in convolutional diffusion models, led by Mason Kamb. See also our paper with new results in version 2: arxiv.org/abs/2412.20292, to be presented as an oral at ICML Conference #icml25. Thanks, Webb Wright!

Diyi Yang (@diyi_yang)'s Twitter Profile Photo

Our study led by CLS reveals an “ideation–execution gap” 😲 Ideas from LLMs may sound novel, but when experts spend 100+ hrs executing them, they flop: 💥 👉 human‑generated ideas outperform on novelty, excitement, effectiveness & overall quality!

Peng Qi (@qi2peng2)'s Twitter Profile Photo

Seven years ago, I co-led a paper called HotpotQA that has motivated and facilitated much #AI #Agents research since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond. In my new blog post, I revisit the brief history of