Angelos Katharopoulos (@angeloskath)'s Twitter Profile
Angelos Katharopoulos

@angeloskath

Machine Learning Research @Apple. Previously PhD student at @idiap_ch and @EPFL. Interested in all things machine learnable

ID: 874169451356278784

Link: https://angeloskath.github.io/ · Joined: 12-06-2017 07:40:13

293 Tweets

2.2K Followers

260 Following

Awni Hannun (@awnihannun)'s Twitter Profile Photo

DeepSeek V3 0324 in 4-bit works better in the latest mlx-lm.

Prompt: "write a program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically"

Took about 90 seconds to
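
A minimal sketch of reproducing this with the mlx-lm Python API; the exact Hugging Face repo name below is an assumption, so check the mlx-community org for the real one:

    # Sketch: run a 4-bit model locally with mlx-lm (assumed repo name).
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/DeepSeek-V3-0324-4bit")

    prompt = (
        "write a program that shows a ball bouncing inside a spinning hexagon. "
        "The ball should be affected by gravity and friction, and it must "
        "bounce off the rotating walls realistically"
    )

    # verbose=True also prints prompt/generation speed stats.
    text = generate(model, tokenizer, prompt=prompt, max_tokens=2048, verbose=True)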

Awni Hannun (@awnihannun)'s Twitter Profile Photo

First results are in. Llama 4 Maverick 17B active / 400B total is blazing fast with MLX on an M3 Ultra. Here is the 4-bit model generating 1100 tokens at 50 tok/sec:

Ivan Fioravanti ᯅ (@ivanfioravanti)'s Twitter Profile Photo

🔥 M3 Ultra with Llama-4-Scout-17B-16E-Instruct-4bit. mlx 0.25 is Ultra fast in prompt processing with MoE!

mlx >=0.25 vs <0.25:
8K tokens: 600 vs 328 toks/s
16K tokens: 608 vs 340 toks/s
32K tokens: 591 vs 323 toks/s

David Grangier (@grangierdavid)'s Twitter Profile Photo

#ICLR #TrainBetterLM I am at ICLR, come to our posters for improved language model training! 

Recycle gradients for faster neural net training with AdEMAmix iclr.cc/virtual/2025/p… (Fri Apr 25, 10 am). 

1/3
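
AdEMAmix keeps a second, slow-moving EMA of past gradients alongside Adam's fast one, which is where the "recycle gradients" framing comes from. A toy sketch of the update, omitting weight decay and the alpha/beta3 warm-up schedulers the paper uses:

    import numpy as np

    def ademamix_step(theta, grad, state, lr=1e-4, beta1=0.9, beta2=0.999,
                      beta3=0.9999, alpha=5.0, eps=1e-8):
        """One AdEMAmix update (toy sketch, not the authors' code)."""
        state["t"] += 1
        t = state["t"]
        state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad   # fast EMA (Adam)
        state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad   # slow EMA of old gradients
        state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2  # second moment (Adam)
        m1_hat = state["m1"] / (1 - beta1**t)                    # bias correction
        v_hat = state["v"] / (1 - beta2**t)
        return theta - lr * (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)

    # Usage: state = {"t": 0, "m1": 0.0, "m2": 0.0, "v": 0.0} (or zero arrays like theta).
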
Awni Hannun (@awnihannun)'s Twitter Profile Photo

The latest mlx-lm supports AWQ (activation-aware weight quantization), thanks to Alex Barron!

Use it to make 4-bit quantized models that are nearly as good as full-precision:
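
Not mlx-lm's actual implementation, just a toy sketch of the AWQ idea: scale up the input channels that see large activations so their weights survive rounding, quantize, then fold the inverse scales into the activations:

    import numpy as np

    def awq_quantize(W, X, alpha=0.5, bits=4):
        """Toy activation-aware quantization of W (out x in), given sample activations X (batch x in)."""
        # Salient input channels have large average activation magnitude.
        act_scale = np.abs(X).mean(axis=0) ** alpha
        s = act_scale / act_scale.mean() + 1e-8          # per-input-channel scales
        W_scaled = W * s                                 # up-weight salient columns
        # Plain per-row symmetric round-to-nearest quantization.
        qmax = 2 ** (bits - 1) - 1
        step = np.abs(W_scaled).max(axis=1, keepdims=True) / qmax
        Wq = np.clip(np.round(W_scaled / step), -qmax - 1, qmax) * step
        return Wq, s

    # At inference, (X / s) @ Wq.T approximates X @ W.T, with less error
    # on the channels that matter most.
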
Awni Hannun (@awnihannun)'s Twitter Profile Photo

Qwen3 and Qwen3 MoEs are already supported in the latest mlx-lm, thanks to Prince Canuma and Gökdeniz Gülmez:

pip install -U mlx-lm

Awesome that Qwen ships a model for every device:
- iPhone: 0.6B, 4B
- Macbook: 8B, 30B, 3B/30B MoE
- M2, M3 Ultra: 22B/235B MoE

Awni Hannun (@awnihannun)'s Twitter Profile Photo

Benchmarked the whole Qwen3 family on an M4 Max with mlx-lm (except the 235B that doesn't fit).

Stats generating 512 tokens, models in 4-bit:
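
A hedged way to take this kind of measurement with the mlx-lm Python API (the repo name follows the mlx-community naming scheme and is an assumption):

    import time
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Qwen3-8B-4bit")  # assumed repo name

    start = time.perf_counter()
    text = generate(model, tokenizer, prompt="Explain KV caches.", max_tokens=512)
    elapsed = time.perf_counter() - start

    # Rough generation speed; generate(..., verbose=True) also prints stats.
    print(f"{len(tokenizer.encode(text)) / elapsed:.1f} tok/sec")
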
Ivan Fioravanti ᯅ (@ivanfioravanti)'s Twitter Profile Photo

🔥 MLX vs Ollama: 1-0
Qwen3-30B-A3B-4bit

4K, 8K, 16K, 32K contexts
M3 Ultra 512GB
export OLLAMA_FLASH_ATTENTION=1 (to make it faster)

Awni Hannun (@awnihannun)'s Twitter Profile Photo

We have the full set of Gemma 3 4-bit DWQ models on the MLX Community Hugging Face. Use them for higher-quality 4-bit models:

Lysandre (@lysandrejik)'s Twitter Profile Photo

The Transformers library is undergoing its largest pivot to date 🙌

It now cements its role as the central model definition, irrespective of the backend and runner.

One ground truth to bring more reliability across the ecosystem.

Why is this important?

Vaibhav (VB) Srivastav (@reach_vb)'s Twitter Profile Photo

Let's goo! Starting today you can access 5000+ LLMs powered by MLX directly from Hugging Face Hub! 🔥

All you need to do is click `Use this model` from any compatible model \o/

That's it, all you need to get blazingly fast intelligence right at your terminal! What would you

yags (@yagilb)'s Twitter Profile Photo

Better together 🤝 LM Studio's new mlx-engine architecture is an effort to unify the brilliant work of Awni Hannun, Angelos Katharopoulos (mlx-lm), Prince Canuma (mlx-vlm) and community contributors. This pattern is designed to be extended and we have a good first issue on the repo! 🍎

N8 Programs (@n8programs)'s Twitter Profile Photo

Reminder that writing MLX kernels is extremely powerful - for instance, on the fairly uncommon log-polar mapping operation (which MLX has no native support for), a simple MLX kernel reaches throughput up to 30x higher than OpenCV on the CPU w/ all 16 threads.
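
For reference, custom kernels like this are written with mx.fast.metal_kernel. A minimal element-wise sketch below; a real log-polar resampler would add 2-D indexing and interpolation on top of the same machinery:

    import mlx.core as mx

    # Metal kernel body: one thread per output element.
    source = """
        uint i = thread_position_in_grid.x;
        out[i] = metal::log(1.0f + metal::fabs(inp[i]));
    """

    kernel = mx.fast.metal_kernel(
        name="toy_log_abs",
        input_names=["inp"],
        output_names=["out"],
        source=source,
    )

    x = mx.random.normal(shape=(1024,))
    y = kernel(
        inputs=[x],
        grid=(x.size, 1, 1),
        threadgroup=(256, 1, 1),
        output_shapes=[x.shape],
        output_dtypes=[x.dtype],
    )[0]
    print(y[:4])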

Úchèńnà Ogbújí ("Uche") (@uogbuji)'s Twitter Profile Photo

Long overdue release: Toolio 0.6.0. A major re-alignment with MLX. Power steering for LLMs on Mac: a #GenAI & #agent toolkit implementing #JSON schema-steered structured output & tool-calling in #Python

ℹ️ github.com/OoriData/Tooli…
🐍 pypi.org/project/Toolio…

#AI #LLM #Mac #MLX

Awni Hannun (@awnihannun)'s Twitter Profile Photo

If you haven’t tried the new DWQ and/or dynamic quants in mlx-lm, I highly recommend. They give much higher quality q4 MLX models. And the full quantization can be done locally:
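
Plain 4-bit quantization can already be done locally through mlx-lm's convert API; the DWQ and dynamic-quant recipes ship as their own mlx-lm entry points, so the snippet below only shows the basic path, with an assumed source repo:

    from mlx_lm import convert

    # Quantize a Hugging Face model to 4-bit MLX weights, entirely locally.
    convert(
        "Qwen/Qwen3-8B",           # assumed source repo; use any supported model
        mlx_path="qwen3-8b-4bit",  # output directory
        quantize=True,
        q_bits=4,
        q_group_size=64,
    )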

Shashank Prasanna (@shshnkp)'s Twitter Profile Photo

Exciting MLX updates at #WWDC25: new webpage and 2 sessions for Python and Swift devs!

New MLX webpage! mlx-framework.org

Getting started with MLX, by Awni Hannun: developer.apple.com/videos/play/ww…

Explore LLMs on Apple silicon, by Angelos Katharopoulos: developer.apple.com/videos/play/ww…

Molly Cantillon (@mollycantillon)'s Twitter Profile Photo

Everything will be local. Yesterday I gave a talk about Real-World Applications of MLX and built a fast on-device semantic search index over the Apple WWDC 2025 docs. Open-sourced the code for anyone curious! github.com/mcantillon21/l…
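
At its core, such an index reduces to embedding documents and ranking by cosine similarity, which MLX array ops handle directly. A minimal sketch; the embed() helper is hypothetical and stands in for whatever embedding model the talk actually used:

    import mlx.core as mx

    def embed(texts):
        """Hypothetical embedding helper: swap in a real MLX embedding model."""
        e = mx.random.normal(shape=(len(texts), 384))  # stand-in vectors
        return e / mx.linalg.norm(e, axis=-1, keepdims=True)

    docs = ["Metal kernel docs", "MLX quantization guide", "Swift API notes"]
    index = embed(docs)                          # (n_docs, dim), unit-normalized

    def search(query, k=2):
        q = embed([query])                       # (1, dim)
        scores = (index @ q.T)[:, 0]             # cosine similarity per doc
        top = mx.argsort(-scores)[:k].tolist()   # best matches first
        return [(docs[i], scores[i].item()) for i in top]

    print(search("how do I quantize a model?"))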