Ion Stoica (@istoica05)'s Twitter Profile
Ion Stoica

@istoica05

Professor at UC Berkeley, co-founder of Databricks, Anyscale, Conviva.

ID: 3139551624

Joined: 05-04-2015 01:25:47

16 Tweets

2.2K Followers

19 Following

Melissa Pan (@melissapan)

🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️ 🔥 Introducing MAST: The first multi-agent failure taxonomy - consists of 14 failure modes and 3 categories, generalizes for diverse multi-agent systems and tasks! Paper: arxiv.org/pdf/2503.13657 Code: github.com/multi-agent-sy… 🧵1/n

Ion Stoica (@istoica05)

This journey has been a blast, and I'm very much looking forward to an exciting future, driven by our incredible community.

SkyPilot (@skypilot_org)

What a night! Huge thanks to everyone who came out to our first SkyPilot meetup, a packed house of builders and insightful convos.💥 Thanks to all speakers (sisil mehta (Abridge), Woosuk Kwon (vLLM), Ion Stoica, et al.) for sharing SkyPilot use cases, and Anyscale

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

We thank the authors for their feedback. However, there are a number of factual errors and misleading statements in this writeup: Regarding the statement that some model providers are not treated fairly: - This is not true. Given our capacity, we have always tried to honor all

Lakshya A Agrawal (@lakshyaaagrawal)

Real world AI pipelines are often compound, multi-module, and multi-step programs—unlike most RL/GRPO implementations today which optimize a single agent. 🚨 Super excited to release dspy.GRPO, which lets you GRPO tune any arbitrary multi-module, multi-step DSPy program, with

Ali Ghodsi (@alighodsi)

I am super excited to announce that we have agreed to acquire Neon, a developer-centric serverless Postgres company. The Neon team engineered a new database architecture that offers speed, elastic scaling, and branching and forking. The capabilities that make Neon great for

Percy Liang (@percyliang)

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:

NovaSky (@novaskyai)

1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. 🚀 Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and SFT model

Sumanth Hegde (@sumanthrh)

Some of our interesting observations from working on multi-turn text2SQL: - Data-efficient RL works pretty well: We did very typical GRPO settings; Just make sure to use "hard-enough" samples and no KL. KL can stabilize learning early on but will always bring down rewards

Andy Konwinski (@andykonwinski)

If you had 15min to tell thousands of Berkeley CS/Data/Stats grads what to do with their lives, what would you say? Last Thursday I told them to RUN AT FAILURE. Afterwards, while we were shaking hands & taking selfies, hundreds of them told me that they are excited to go fail. I

Manish Shetty (@slimshetty_)

✨ NEW SWE-Agents BENCHMARK ✨

Introducing GSO: The Global Software Optimization Benchmark
 - 👩🏻‍💻 100+ challenging software optimization tasks
 - 🛣️ a long-horizon task w/ precise specification
 - 🐘 large code changes in Py, C, C++, ...
 - 📉 SOTA models get < 5% success!

1/
Robert Nishihara (@robertnishihara)

The AI compute software stack consists of 3 specialized layers:

🔧🔧🔧 Layer 1: Training & Inference Framework (PyTorch + vLLM)
• Runs models efficiently on GPUs
• Handles model optimization and model parallelism strategies
• Manages accelerator memory and automatic
Hao AI Lab (@haoailab)

[Lmgame Bench] o3-pro: A Milestone in LLM Gaming! 🕹️ The leap from o3 to o3-pro is bigger than you might have thought. We tested o3-pro on Tetris and Sokoban— achieved SOTA on both and outperformed its previous self by a big margin. 🔍 🧱 Tetris Update o3-pro: ✅ 8+ lines

Agentica Project (@agentica_)

🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE

Daniel Kang (@daniel_d_kang)

As AI agents near real-world use, how do we know what they can actually do? Reliable benchmarks are critical but agentic benchmarks are broken! Example: WebArena marks "45+8 minutes" on a duration calculation task as correct (real answer: "63 minutes"). Other benchmarks

Robert Nishihara (@robertnishihara)

Congratulations to my brilliant co-founder Philipp Moritz (@pcmoritz) and the legendary John Schulman, Sergey Levine, Pieter Abbeel, and Michael Jordan on their Test-of-Time Honorable Mention at ICML 2025 today! For creating TRPO. This was done during the previous wave of

martin_casado (@martin_casado)

Remarkable how far we've come. From fear mongering OS AI across VC, academia, AI labs, and politicians to full throated endorsement. Thank you to everyone who has taken a stand on this over the last couple of years. In no small way have you helped sway and save a nation.
