Shishir Patil (@shishirpatil_)'s Twitter Profile
Shishir Patil

@shishirpatil_

CS PhD @ UC Berkeley. Creator of Gorilla, GoEx, RAFT, OpenFunctions and Berkeley Function Calling Leaderboard. Previously researcher @GoogleAI @MSFTResearch

ID: 55854264

Link: https://shishirpatil.github.io/
Joined: 11-07-2009 15:31:33

296 Tweets

3.3K Followers

992 Following

Roberta Raileanu (@robertarail)

Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬

We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks.

The key contributions of our work are:
🕹️ Enables the

Avanika Narayan (@avanika15)

we shipp’d 👭 on-device lms and frontier cloud lms. and…they were a match☺️.

98% accuracy, just 17.5% the cloud API costs

beyond excited to drop minions: where local lms meet cloud lms 😊

joint work w/Sabri Eyuboglu & Dan Biderman at @hazyresearch.

ty Together AI,

Shishir Patil (@shishirpatil_)

🏆 Llama 4 is HERE—and it's LEADING the pack 🏆 We're thrilled to introduce LLaMA-4 Maverick: a 17B active model with 128 experts. It's now the top open-weights model out there—and crushing all benchmarks. Even more incredible was the opportunity to lead the research efforts of

Shishir Patil (@shishirpatil_)

+121 ELO points - the steepest increase in ELO to date - that’s my team!! We took Llama to the moon! 🚀 Excited for everyone to have at it!

xjdr (@_xjdr)

Shishir is one of the primary reasons i expect these models to be great generally but specifically good at function calling. i am very much looking forward to testing these models in complex tool use scenarios.

xjdr (@_xjdr)

A snippet of the review i did comparing 4o, sonnet, V3 and Maverick. This is specifically focusing on maverick vs V3 both because it's more interesting and also because maverick outperformed 4o and V3 outperformed sonnet in most of my tests.

II. Tool Calling Implementation: A Tale

Aidan McLaughlin (@aidan_mclau)

ignore literally all the benchmarks

the biggest o3 feature is tool use

ofc it's smart, but it's also just way more useful

>deep research quality in 30 seconds
>debugs by googling docs and checking stackoverflow
>writes whole python scripts in its CoT for fermi estimates

Bespoke Labs (@bespokelabsai)

OpenAI’s o4 just showed that multi-turn tool use is a huge deal for AI agents. Today, we show how to do the same with your own agents, using RL and open-source models. We used GRPO on only 100 high quality questions from the BFCL benchmark, and post-trained a 7B Qwen model to

Salesforce AI Research (@sfresearch)

Our xLAM (#LargeActionModels) family just got an upgrade!

1️⃣ Multi-turn, natural conversation support
2️⃣ Smarter multi-step reasoning
3️⃣ Models from 1B to 70B for ultimate flexibility

🤗 HuggingFace: bit.ly/4jyj2tu
👑 BFCL Leaderboard: bit.ly/3WIZdY3

Our

NovaSky (@novaskyai)

1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks like SWE-Bench. We also open-source a series of our early trained models to showcase the potential of end-to-end online RL training on long-horizon (20-50

Yutong Bai (@yutongbai1002)

What would a World Model look like if we start from a real embodied agent acting in the real world?

It has to have:
1) A real, physically grounded and complex action space, not just abstract control signals.
2) Diverse, real-life scenarios and activities.

Or in short: It has to

Aakanksha Chowdhery (@achowdhery)

Today we launch Asimov. Asimov is our code research agent that is best-in-class in codebase comprehension. It is built for teams, built for enterprises, and built to remember. We use it every day to accelerate our velocity and streamline distributed ops. Link below to sign up

Karan Vaidya (@karanvaidya6)

Agents aren’t reliable. They don’t learn from experience. At Composio, we provide skills that evolve with your agents.

Lightspeed gave us $25M to make agents usable.

Oleksii Kuchaiev (@kuchaev)

Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and drop-in replacement for V1. Open-weight, code and data on HF huggingface.co/nvidia/Llama-3…