Shishir Patil (@shishirpatil_) Twitter Tweets • TwiCopy

Shishir Patil

@shishirpatil_

CS PhD @ UC Berkeley. Creator of Gorilla, GoEx, RAFT, OpenFunctions and Berkeley Function Calling Leaderboard. Previously researcher @GoogleAI @MSFTResearch

ID: 55854264

Link: https://shishirpatil.github.io/

Joined: 11-07-2009 15:31:33

296 Tweets

3.3K Followers

992 Following

Roberta Raileanu

Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬 We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. The key contributions of our work are: 🕹️ Enables the

Likes: 481 · Replies: 14 · Retweets: 117

Avanika Narayan

@avanika15

6 months ago

we shipp’d 👭 on-device lms and frontier cloud lms. and…they were a match☺️.
98% accuracy, just 17.5% the cloud API costs
beyond excited to drop minions: where local lms meet cloud lms 😊
joint work w/Sabri Eyuboglu & Dan Biderman at @hazyresearch.
ty Together AI,

Likes: 81 · Replies: 6 · Retweets: 44

Shishir Patil

@shishirpatil_

4 months ago

🏆 Llama 4 is HERE—and it's LEADING the pack 🏆 We're thrilled to introduce LLaMA-4 Maverick: a 17B active model with 128 experts. It's now the top open-weights model out there—and crushing all benchmarks. Even more incredible was the opportunity to lead the research efforts of

Likes: 140 · Replies: 9 · Retweets: 6

Shishir Patil

@shishirpatil_

4 months ago

+121 ELO points - the steepest increase in ELO to date - that’s my team!! We took Llama to the moon! 🚀 Excited for everyone to have at it!

Likes: 191 · Replies: 14 · Retweets: 12

xjdr

@_xjdr

4 months ago

Shishir is one of the primary reasons i expect these models to be great generally but specifically good at function calling. i am very much looking forward to testing these models in complex tool use scenarios.

Likes: 112 · Replies: 4 · Retweets: 3

xjdr

@_xjdr

4 months ago

A snippet of the review i did comparing 4o, sonnet, V3 and Maverick. This is specifically focusing on maverick vs V3 both because it's more interesting and also because maverick outperformed 4o and V3 outperformed sonnet in most of my tests.
II. Tool Calling Implementation: A Tale

Likes: 148 · Replies: 3 · Retweets: 7

Aidan McLaughlin

@aidan_mclau

4 months ago

ignore literally all the benchmarks
the biggest o3 feature is tool use
ofc it's smart, but it's also just way more useful
>deep research quality in 30 seconds
>debugs by googling docs and checking stackoverflow
>writes whole python scripts in its CoT for fermi estimates

Likes: 1.1K · Replies: 61 · Retweets: 80

Bespoke Labs

@bespokelabsai

4 months ago

OpenAI’s o4 just showed that multi-turn tool use is a huge deal for AI agents. Today, we show how to do the same with your own agents, using RL and open-source models. We used GRPO on only 100 high quality questions from the BFCL benchmark, and post-trained a 7B Qwen model to

Likes: 380 · Replies: 21 · Retweets: 50

Salesforce AI Research

@sfresearch

4 months ago

Our xLAM (#LargeActionModels) family just got an upgrade!
1️⃣ Multi-turn, natural conversation support
2️⃣ Smarter multi-step reasoning
3️⃣ Models from 1B to 70B for ultimate flexibility
🤗 HuggingFace: bit.ly/4jyj2tu
👑 BFCL Leaderboard: bit.ly/3WIZdY3
Our

Likes: 52 · Replies: 0 · Retweets: 19

NovaSky

@novaskyai

3 months ago

1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks like SWE-Bench. We also open-source a series of our early trained models to showcase the potential of end-to-end online RL training on long-horizon (20-50

Likes: 266 · Replies: 2 · Retweets: 68

Joey Gonzalez

@profjoeyg

3 months ago

Astasia Myers Good observation! I had done some work with Shishir Patil and Raluca Ada Popa on using LLMs to evaluate the credentials (and risks) of potential tool calls.

Likes: 4 · Replies: 0 · Retweets: 1

Yutong Bai

@yutongbai1002

a month ago

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have:
1) A real, physically grounded and complex action space—not just abstract control signals.
2) Diverse, real-life scenarios and activities.
Or in short: It has to

Likes: 283 · Replies: 17 · Retweets: 74

Aakanksha Chowdhery

@achowdhery

25 days ago

Today we launch Asimov. Asimov is our code research agent that is best-in-class in codebase comprehension. It is built for teams, built for enterprises, and built to remember. We use it every day to accelerate our velocity and streamline distributed ops. Link below to sign up

Likes: 366 · Replies: 25 · Retweets: 17

Karan Vaidya

@karanvaidya6

19 days ago

Agents aren’t reliable. They don’t learn from experience.
At Composio, we provide skills that evolve with your agents.
Lightspeed gave us $25M to make agents usable

Likes: 1.1K · Replies: 227 · Retweets: 146

Shishir Patil

@shishirpatil_

17 days ago

Roberta is amazing to work with!!

Likes: 12 · Replies: 1 · Retweets: 0

Oleksii Kuchaiev

@kuchaev

16 days ago

Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and drop in replacement for V1. Open-weight, code and data on HF huggingface.co/nvidia/Llama-3…

Likes: 187 · Replies: 8 · Retweets: 42
