Ahmed Awadallah (@ahmedhawadallah)'s Twitter Profile
Ahmed Awadallah

@ahmedhawadallah

Working on AI agents, SLMs and post-training @Microsoft | Partner Research Manager @Microsoft AI Frontiers

ID: 1103888904015364097

Link: https://aka.ms/ahmed · Joined: 08-03-2019 05:23:18

131 Tweets

946 Followers

359 Following

Dimitris Papailiopoulos (@dimitrispapail)'s Twitter Profile Photo

We’ve been cooking... a new open weights 14B Phi-4 reasoning model, SFT’d on ~1.4M carefully curated reasoning demonstrations from o3-mini and RL’d for a tiny bit. This model is a little beast.

Suriya Gunasekar (@suriyagnskr)'s Twitter Profile Photo

I am thrilled to share our newest Phi models. This time we went all in on post-training to produce Phi-4-reasoning (SFT only) and Phi-4-reasoning-plus (SFT + a touch of RL) — both 14B models that pack a punch in a small size across reasoning and general purpose benchmarks🧵

Xeophon (@thexeophon)'s Twitter Profile Photo

I got asked whether I can run the new Phi4 on my personal bench. 

And while I wanted to deprecate my benchmark (for various reasons, I think it is too simple and does not catch nuances like it used to), who am I to refuse this request. 

Super surprised at the numbers, gg MSFT!
Philipp Schmid (@_philschmid)'s Twitter Profile Photo

How can smaller LLMs achieve strong reasoning? By combining data curation with supervised fine-tuning (SFT) and targeted reinforcement learning (RL). Microsoft released their first open reasoning/thinking models with Phi-4-reasoning distilled from OpenAI o3-mini.

Implementation
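The recipe this tweet describes (curate data, then SFT, then a targeted touch of RL) can be sketched as a toy two-stage pipeline. Everything below is illustrative: the one-parameter "model", the quality threshold, and the hill-climbing "RL" stand-in are assumptions for the sketch, not the Phi-4 training code.

```python
import random

random.seed(0)

def curate(demonstrations, min_quality=0.8):
    """Stage 0: keep only carefully vetted reasoning demonstrations."""
    return [d for d in demonstrations if d["quality"] >= min_quality]

def sft_step(weight, demo, lr=0.1):
    """Stage 1 (SFT): pull the toy 'model' toward the demonstrated target."""
    return weight + lr * (demo["target"] - weight)

def rl_step(weight, reward_fn, lr=0.05, noise=0.2):
    """Stage 2 (RL): sample a perturbed output and reinforce it only if a
    verifiable reward improves -- a crude stand-in for reward-based tuning."""
    candidate = weight + random.uniform(-noise, noise)
    if reward_fn(candidate) > reward_fn(weight):
        return weight + lr * (candidate - weight)
    return weight

demos = [{"target": 1.0, "quality": 0.9},   # good demonstration
         {"target": 5.0, "quality": 0.2}]   # noisy one, dropped by curation
reward = lambda w: -abs(w - 1.0)            # reward peaks at the correct answer

w = 0.0
for d in curate(demos):          # curation drops the low-quality demo
    for _ in range(50):
        w = sft_step(w, d)       # SFT gets close to the target...
for _ in range(200):
    w = rl_step(w, reward)       # ...and a small amount of RL refines it
print(round(w, 2))
```

The ordering is the point: RL here only polishes a model that SFT has already placed near good behavior, mirroring the "SFT'd heavily, RL'd for a tiny bit" framing in the tweets above.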
Microsoft Research (@msftresearch)'s Twitter Profile Photo

In this issue: New research on compound AI systems and causal verification of the Confidential Consortium Framework; release of Phi-4-reasoning; enriching tabular data with semantic structure, and more: msft.it/6012SVNCj

Nathan (@nathanhabib1011)'s Twitter Profile Photo

THINKING MODELS TOURNAMENT ARC 🧠📊

I ran open-source evals on some of the latest SOTA reasoning models
Phi4 is the biggest surprise with insane results for only 14B!
Claude isn’t the best reasoner... but crushes GPQA and simpleQA with sheer knowledge.

Full results here 👇
1/N
Steven Bathiche (@sbathiche)'s Twitter Profile Photo

Phi-4 reasoning models are now available to download and run on the NPU on your Snapdragon-powered Copilot+ PC. azure.microsoft.com/en-us/blog/one…

Ahmed Awadallah (@ahmedhawadallah)'s Twitter Profile Photo

Two colleagues recently used our 14-billion-parameter Phi-4-reasoning model to ace graduate-level Linear Algebra and Calculus BC tests—scoring 100% and 69/70 respectively. Thanks to the amazing work of our Windows + Devices colleagues, this model now runs on-device on

Ahmed Awadallah (@ahmedhawadallah)'s Twitter Profile Photo

A few months back, our team released Magentic-One -- showing how we can build multi-agent systems with AutoGen for complex web task completion. But how should humans interact with such systems? Magentic-UI shows how to build an agentic user experience, prioritizing

Mojan Javaheripi (@mojan_jp)'s Twitter Profile Photo

Great to see the additive dataset methodology we proposed in Phi-4-reasoning adopted in open-r1. Tldr: optimize data mixture per reasoning domain, and combine in final run for generalized performance. This is a game changer for reducing data ablation costs.
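The additive methodology in this tweet — tune the data mixture for each reasoning domain separately, then sum the winning mixtures for the final run — can be sketched roughly as follows. The domain names, grid, and toy eval score are all hypothetical; this is not the Phi-4-reasoning or open-r1 code.

```python
from itertools import product

# Hypothetical data sources; a mixture maps source -> sampling weight.
SOURCES = ["math", "code", "science"]

def toy_eval(domain, mixture):
    """Stand-in for a real benchmark run: rewards weight on the matching
    source, with a small penalty for total data volume."""
    return mixture.get(domain, 0.0) - 0.1 * sum(mixture.values())

def tune_domain(domain, grid=(0.0, 0.5, 1.0)):
    """Ablate mixtures for ONE domain in isolation (cheap: no joint runs)."""
    best, best_score = None, float("-inf")
    for weights in product(grid, repeat=len(SOURCES)):
        mixture = dict(zip(SOURCES, weights))
        score = toy_eval(domain, mixture)
        if score > best_score:
            best, best_score = mixture, score
    return best

# Additive combine: sum the per-domain winners into the final mixture.
final_mixture = {s: 0.0 for s in SOURCES}
for domain in SOURCES:
    for source, w in tune_domain(domain).items():
        final_mixture[source] += w

print(final_mixture)
```

The cost saving claimed in the tweet falls out of the structure: ablations are run per domain independently (D separate cheap sweeps) rather than jointly over all domains at once, and only the final combined run needs to be trained at full scale.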

Ahmed Awadallah (@ahmedhawadallah)'s Twitter Profile Photo

Our team is releasing full evaluation logs (model generation, answer extractions, etc.) for 10 models in the Eureka Reasoning Models Study and also for Phi-4-reasoning and Phi-4-reasoning-plus (including reasoning traces) Hope this helps with research on transparency and

AutoGen (@pyautogen)'s Twitter Profile Photo

🚀 Introducing MCP Agents in Magentic-UI! Spin up custom agents that wrap one (or many) MCP tools, and let the Orchestrator pick the best agent for every step of the plan. Check out the demo below to see them in action 👇 #MCP #MagenticUI #AIagents

AutoGen (@pyautogen)'s Twitter Profile Photo

🚀 AutoGen v0.6.4 is out! Shout-out to GitHub Copilot for helping author these new features! 🧠 GraphFlow now retains execution state after termination, just like other group chats. Resets only when the graph fully completes. ⚙️ New parameter_override in Workbenches for