Shelby Heinecke (@shelbyh_ai)'s Twitter Profile
Shelby Heinecke

@shelbyh_ai

Leading an AI Research team @SFResearch.
Agentic AI, On-Device AI, Efficient AI.
ML Theory PhD @thisisUIC, Math BS @MIT. On a mission! 🚀

ID: 1216139265152880645

Link: http://www.shelbyh.ai
Joined: 11-01-2020 23:26:17

268 Tweets

455 Followers

828 Following

The AI Conference (@aiconference)

Join us for the Building Agents Panel at The AI Conference 2024!

Learn key principles and strategies for designing and implementing intelligent agents from our distinguished panel:

Alex Chao, Product Lead at Microsoft
Jaspar Carmichael-Jack, Founder and CEO of Artisan

Salesforce AI Research (@sfresearch)

Our xLAM family ranges from 1B to 8x22B parameters, outperforming models 10x their size. Learn how we optimized them for efficiency and performance -- and are using them to innovate customer service. Watch "Large Action Models in a Multi-Agent World" on Salesforce+.

Zuxin Liu (@liuzuxin)

🎉 Thrilled to announce our paper "APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets" is accepted at #NeurIPS2024! 🚀 Accelerating AI Agent with our data and powerful xLAM models - all open-sourced! 🌟 Highlights👇 1️⃣ APIGen automates

Salesforce AI Research (@sfresearch)

🔮 What’s next for #AgenticAI? 🔮 We've identified three key trends that will shape the next wave of our work. Stay ahead of the AI curve with AI Research Lead Shelby Heinecke sforce.co/4f8J4Bz

Haolin Chen (@haolinchen11)

New preprint:
Chain-of-thought demonstrated strong reasoning capabilities of LLMs🤖. But how to train them to reason🧐?

Introducing LaTent Reasoning Optimization (LaTRO): a principled framework that formulates the reasoning trajectory as a latent variable and optimizes it via RL.

Jianguo Zhang (@jianguozhang3)

We're hiring AI Research Interns for Summer 2025! Spend 3 months with us working on AI Agents, LLMs, Reasoning, Planning & more, with a focus on publishing high-quality academic papers. If you have a strong publication record, apply or DM us! #researchpaper #JobOpening #intern

Salesforce AI Research (@sfresearch)

🌮 Introducing 🌮 TACO - our new family of multimodal action models that combine reasoning with real-world actions to solve complex visual tasks!

📊Results:
20% gains on MMVet
3.9% average improvement across 8 benchmarks
1M+ synthetic CoTA traces in training

🔓 🔓🔓Fully

Shelby Heinecke (@shelbyh_ai)

Today, agents mostly handle text, but the future is multi-modal! Agents will soon be able to process, generate, and reason about images, video, audio, and more. We've taken the first step towards this future by training a multi-modal action model on synthetic text-image

Shelby Heinecke (@shelbyh_ai)

Our groundbreaking Large Action Model work has been accepted to #NAACL2025! We discovered that with high quality synthetic data, we can train LLMs from 8x22B to as tiny as 1B, to excel across function-calling benchmarks, beating significantly larger industry-leading models! ➡️

Salesforce AI Research (@sfresearch)

📣 Just launched: PersonaBench—our breakthrough benchmark for evaluating personalized AI assistants!

✏️ Paper: arxiv.org/html/2502.2061…
💻 GitHub repo: bit.ly/43CXQxC
🧠 Blog: sforce.co/4kzUL7E 

PersonaBench cracks a major AI challenge: creating synthetic user

Salesforce AI Research (@sfresearch)

Our xLAM (#LargeActionModels) family just got an upgrade!

1️⃣ Multi-turn, natural conversation support
2️⃣ Smarter multi-step reasoning
3️⃣ Models from 1B to 70B for ultimate flexibility

🤗 HuggingFace: bit.ly/4jyj2tu 
👑 BFCL Leaderboard: bit.ly/3WIZdY3 

Our

Salesforce AI Research (@sfresearch)

🎬 NOW LIVE: "The AI Research Lab - Explained" debuts with our groundbreaking work on Large Action Models! Watch now: bit.ly/4kfipp4 Watch as Shelby Heinecke reveals how we're training these specialized models to generate precise, executable actions

Salesforce (@salesforce)

Salesforce AI Research’s new series “AI Research Lab - Explained” just dropped! First up? See how we fine-tune specialized models to predict actions, not just language, enabling faster, more precise execution of real-world tasks. ⏯️ Watch and subscribe on YouTube: youtube.com/watch?v=vlvv4Z…

Zahra Bahrololoumi CBE (@zahras_b)

Ever wondered how #AI agents can respond so quickly? The key to this powerful processing: Large Action Models (LAMs) – specialised, small-scale models that are optimised for speed and precision. Watch Shelby Heinecke explain how LAMs work: youtube.com/watch?v=vlvv4Z… Salesforce AI Research

Salesforce AI Research (@sfresearch)

🎉 100K+ views and counting! 🎉

Our AI Research Lab Explained episode on Large Action Models hit a major milestone. 

And our model is still at the top of the #BFCL Berkeley Function Calling Leaderboard.

📹 Haven't watched yet? bit.ly/4kfipp4
🏆 Berkeley Function Calling

Salesforce AI Research (@sfresearch)

Our female AI researchers don't just solve problems — they solve problems that haven't even been invented yet.

While others wonder "Can AI do that?", they're already three steps ahead asking "How can we make more trusted, effective and efficient #EnterpriseAI"

Scientists,

Salesforce AI Research (@sfresearch)

⚡ Introducing MCPEval: the first automated evaluation framework for AI agents built on Model Context Protocol: 🔗 Paper: bit.ly/3TKXpLR 🔗 Code: bit.ly/44ZnUSN ✅ End-to-end task generation & verification ✅ Deep evaluation across 5 real-world domains ✅

Salesforce AI Research (@sfresearch)

💡 Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models 💡

📄 Paper: bit.ly/44IAvuO 
💻 Code: bit.ly/4lLjQgd 

😵‍💫 Have a task but experiencing prompt engineering existential dread?

Few-shot or zero-shot? Chain-of-thought or ReAct?

Cheng Qian (@qiancheng1231)

🤝 Can LLM agents really understand us?

We introduce UserBench: a user-centric gym environment for benchmarking how well agents align with nuanced human intent, not just follow commands.

📄 arxiv.org/pdf/2507.22034
💻 github.com/SalesforceAIRe…