Weizhu Chen (@weizhuchen)'s Twitter Profile
Weizhu Chen

@weizhuchen

Microsoft

ID: 14328706

Link: https://www.microsoft.com/en-us/research/people/wzchen/ · Joined: 08-04-2008 02:07:46

161 Tweets

2.2K Followers

212 Following

Weizhu Chen (@weizhuchen):

We updated Phi-3 mini in our June release, with enhancements in instruction following, reasoning (MMLU 70.9 / GPQA 30.6), and better long-context handling. Share your feedback on the new models with us.
huggingface.co/microsoft/Phi-…
huggingface.co/microsoft/Phi-…
Weizhu Chen (@weizhuchen):

We released Phi-4-mini (a 3.8B base LLM), a new SLM excelling in language, vision, and audio through a mixture-of-LoRA that unites three modalities in one model. I am very impressed with its new audio capability. I hope you can play with it and share your feedback with us. We also
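The tweet does not give implementation details, but the general mixture-of-LoRA idea can be sketched: a shared, frozen base weight plus one low-rank adapter per modality, selected by which modality the input comes from. The class and parameter names below are illustrative, not Phi-4-mini's actual architecture.

```python
import numpy as np

class MixtureOfLoRALinear:
    """Toy mixture-of-LoRA linear layer: one frozen base weight shared
    across modalities, plus a low-rank adapter (A, B) per modality
    whose update is added on top of the base projection."""

    def __init__(self, d_in, d_out, rank, modalities, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen base weight
        self.adapters = {
            m: (
                rng.standard_normal((rank, d_in)) * 0.02,  # A: down-projection
                np.zeros((d_out, rank)),                   # B: up-projection, zero init
            )
            for m in modalities
        }

    def forward(self, x, modality):
        A, B = self.adapters[modality]
        # base projection + the low-rank update selected by modality
        return self.W @ x + B @ (A @ x)

layer = MixtureOfLoRALinear(d_in=8, d_out=4, rank=2,
                            modalities=["text", "vision", "audio"])
x = np.ones(8)
y_text = layer.forward(x, "text")
y_audio = layer.forward(x, "audio")
print(y_text.shape)  # (4,)
```

With the standard zero initialization of B, every adapter starts as a no-op, so all modalities initially share the base model's behavior; training then specializes each adapter without touching the shared weight.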
Weizhu Chen (@weizhuchen):

Glad to see the team used a 3.8B model (Phi-4-mini-reasoning) to achieve 94.6 on MATH-500 and 57.5 on AIME-24.
arXiv: arxiv.org/pdf/2504.21233
HF: huggingface.co/microsoft/Phi-…
Azure: aka.ms/phi4-mini-reas…
Weizhu Chen (@weizhuchen):

Synthesizing challenging problems on which the current model performs poorly is an important area in RL. Another thing that interests me is self-evolving learning: synthesizing questions/problems from which the model can learn continuously.
You may check our work here: mastervito.github.io/MasterVito.SwS…
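The idea of targeting a model's weaknesses can be sketched as a simple loop: synthesize candidate problems, estimate the current model's success rate on each, and keep the ones in a "hard but learnable" band for the next training round. This is a schematic toy, not the linked SwS method; all names and thresholds below are made up for illustration.

```python
import random

def synthesize_by_weakness(model_solve, make_problem, rounds=3, batch=50,
                           keep_band=(0.1, 0.7), tries=8, seed=0):
    """Schematic self-evolving loop: generate candidate problems and
    retain those the current model solves only sometimes, i.e. the
    ones most informative to train on next."""
    rng = random.Random(seed)
    curriculum = []
    for _ in range(rounds):
        for _ in range(batch):
            problem, answer = make_problem(rng)
            # empirical success rate of the current model on this problem
            rate = sum(model_solve(problem) == answer for _ in range(tries)) / tries
            if keep_band[0] <= rate <= keep_band[1]:
                curriculum.append((problem, answer))
        # (a real system would train / RL-update the model on `curriculum` here)
    return curriculum

# toy demo: a "model" that guesses sums, with error probability
# growing as the operands get larger
def make_problem(rng):
    a, b = rng.randint(1, 20), rng.randint(1, 20)
    return (a, b), a + b

def model_solve(problem):
    a, b = problem
    return a + b + random.choice([0] * max(1, 30 - a - b) + [1, -1])

hard_set = synthesize_by_weakness(model_solve, make_problem)
print(len(hard_set))
```

The band filter discards both trivially easy problems (rate near 1, nothing to learn) and hopeless ones (rate near 0, no reward signal), which is the usual difficulty-targeting heuristic in RL data synthesis.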
Weizhu Chen (@weizhuchen):

You may check our work on Phi-4-mini-flash-Reasoning. What I like most is the Gated Memory Unit (GMU) design, which can be applied in future model designs to achieve both quality and long context, as well as µP++. Liliang Ren ✈️ ICML 2025
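The exact GMU formulation is in the Phi-4-mini-flash report; as a rough illustrative reading (an assumption, not the paper's definition), a gated memory unit lets a cheap layer reuse a memory state produced by an earlier layer through elementwise gating, instead of recomputing a full attention or SSM pass:

```python
import numpy as np

def silu(z):
    # SiLU activation: z * sigmoid(z)
    return z / (1.0 + np.exp(-z))

def gated_memory_unit(x, memory, W_in, W_out):
    """Illustrative gated-memory-unit sketch: project the current
    token's representation x, gate it elementwise with a memory state
    shared from an earlier layer, then project back out."""
    gate = silu(W_in @ x)    # cheap projection of the current input
    mixed = memory * gate    # elementwise gating by the shared memory
    return W_out @ mixed

rng = np.random.default_rng(0)
d, h = 8, 16
x = rng.standard_normal(d)
memory = rng.standard_normal(h)  # state produced by an earlier layer
W_in = rng.standard_normal((h, d))
W_out = rng.standard_normal((d, h))
y = gated_memory_unit(x, memory, W_in, W_out)
print(y.shape)  # (8,)
```

The appeal for long context is that the expensive sequence-mixing work is done once and its state is cheaply reused downstream, trading a full layer's cost for two projections and an elementwise product.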

Weizhu Chen (@weizhuchen):

See our work at the workshop today. If you are looking for opportunities to work on efficient model architectures, or anything that makes training or inference run much faster on thousands of GPUs or more, please come talk to us or DM me. We are hiring.