Angelos Katharopoulos (@angeloskath)'s Twitter Profile
Angelos Katharopoulos

@angeloskath

Machine Learning Research @Apple. Previously PhD student at @idiap_ch and @EPFL. Interested in all things machine learnable

ID: 874169451356278784

Link: https://angeloskath.github.io/ · Joined: 12-06-2017 07:40:13

293 Tweets

2.2K Followers

260 Following

Awni Hannun (@awnihannun)'s Twitter Profile Photo

DeepSeek V3 0324 in 4-bit works better in the latest mlx-lm.

Prompt: "write a program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically"

Took about 90 seconds to
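
A minimal sketch of reproducing this with the mlx-lm Python API; the exact Hugging Face repo name below is an assumption, so check the mlx-community org for the real one:

    # Sketch: run a 4-bit model locally with mlx-lm (assumed repo name).
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/DeepSeek-V3-0324-4bit")

    prompt = (
        "write a program that shows a ball bouncing inside a spinning hexagon. "
        "The ball should be affected by gravity and friction, and it must "
        "bounce off the rotating walls realistically"
    )

    # verbose=True also prints prompt/generation speed stats.
    text = generate(model, tokenizer, prompt=prompt, max_tokens=2048, verbose=True)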

Awni Hannun (@awnihannun)'s Twitter Profile Photo

First results are in. Llama 4 Maverick 17B active / 400B total is blazing fast with MLX on an M3 Ultra. Here is the 4-bit model generating 1100 tokens at 50 tok/sec:

Ivan Fioravanti ᯅ (@ivanfioravanti)'s Twitter Profile Photo

🔥 M3 Ultra with Llama-4-Scout-17B-16E-Instruct-4bit. mlx 0.25 is Ultra fast in prompt processing with MoE!

mlx >=0.25 vs <0.25:
8K tokens: 600 vs 328 toks/s
16K tokens: 608 vs 340 toks/s
32K tokens: 591 vs 323 toks/s

David Grangier (@grangierdavid)'s Twitter Profile Photo

#ICLR #TrainBetterLM I am at ICLR, come to our posters for improved language model training! 

Recycle gradients for faster neural net training with AdEMAmix iclr.cc/virtual/2025/p… (Fri Apr 25, 10 am). 

1/3
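
AdEMAmix keeps a second, slow-moving EMA of past gradients alongside Adam's fast one, which is where the "recycle gradients" framing comes from. A toy sketch of the update, omitting weight decay and the alpha/beta3 warm-up schedulers the paper uses:

    import numpy as np

    def ademamix_step(theta, grad, state, lr=1e-4, beta1=0.9, beta2=0.999,
                      beta3=0.9999, alpha=5.0, eps=1e-8):
        """One AdEMAmix update (toy sketch, not the authors' code)."""
        state["t"] += 1
        t = state["t"]
        state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad   # fast EMA (Adam)
        state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad   # slow EMA of old gradients
        state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2  # second moment (Adam)
        m1_hat = state["m1"] / (1 - beta1**t)                    # bias correction
        v_hat = state["v"] / (1 - beta2**t)
        return theta - lr * (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)

    # Usage: state = {"t": 0, "m1": 0.0, "m2": 0.0, "v": 0.0} (or zero arrays like theta).
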
Awni Hannun (@awnihannun)'s Twitter Profile Photo

The latest mlx-lm supports AWQ (activation-aware weight quantization), thanks to Alex Barron!

Use it to make 4-bit quantized models that are nearly as good as full-precision:
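
Not mlx-lm's actual implementation, just a toy sketch of the AWQ idea: scale up the input channels that see large activations so their weights survive rounding, quantize, then fold the inverse scales into the activations:

    import numpy as np

    def awq_quantize(W, X, alpha=0.5, bits=4):
        """Toy activation-aware quantization of W (out x in), given sample activations X (batch x in)."""
        # Salient input channels have large average activation magnitude.
        act_scale = np.abs(X).mean(axis=0) ** alpha
        s = act_scale / act_scale.mean() + 1e-8          # per-input-channel scales
        W_scaled = W * s                                 # up-weight salient columns
        # Plain per-row symmetric round-to-nearest quantization.
        qmax = 2 ** (bits - 1) - 1
        step = np.abs(W_scaled).max(axis=1, keepdims=True) / qmax
        Wq = np.clip(np.round(W_scaled / step), -qmax - 1, qmax) * step
        return Wq, s

    # At inference, (X / s) @ Wq.T approximates X @ W.T, with less error
    # on the channels that matter most.
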
Awni Hannun (@awnihannun)'s Twitter Profile Photo

Qwen3 and Qwen3 MoEs are already supported in the latest mlx-lm, thanks to Prince Canuma and Gökdeniz Gülmez:

pip install -U mlx-lm

Awesome that Qwen ships a model for every device:
- iPhone: 0.6B, 4B
- Macbook: 8B, 30B, 3B/30B MoE
- M2, M3 Ultra: 22B/235B MoE

Awni Hannun (@awnihannun)'s Twitter Profile Photo

Benchmarked the whole Qwen3 family on an M4 Max with mlx-lm (except the 235B that doesn't fit).

Stats generating 512 tokens, models in 4-bit:
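
A hedged way to take this kind of measurement with the mlx-lm Python API (the repo name follows the mlx-community naming scheme and is an assumption):

    import time
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Qwen3-8B-4bit")  # assumed repo name

    start = time.perf_counter()
    text = generate(model, tokenizer, prompt="Explain KV caches.", max_tokens=512)
    elapsed = time.perf_counter() - start

    # Rough generation speed; generate(..., verbose=True) also prints stats.
    print(f"{len(tokenizer.encode(text)) / elapsed:.1f} tok/sec")
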
Ivan Fioravanti ᯅ (@ivanfioravanti)'s Twitter Profile Photo

🔥 MLX vs Ollama: 1-0
Qwen3-30B-A3B-4bit

4K, 8K, 16K, 32K contexts
M3 Ultra 512GB
export OLLAMA_FLASH_ATTENTION=1 (to make it faster)

Awni Hannun (@awnihannun)'s Twitter Profile Photo

We have the full set of Gemma 3 4-bit DWQ models on the MLX Community Hugging Face. Use them for higher-quality 4-bit models:

Lysandre (@lysandrejik)'s Twitter Profile Photo

The Transformers library is undergoing its largest pivot to date 🙌

It now cements its role as the central model definition, irrespective of the backend and runner.

One ground truth to bring more reliability across the ecosystem.

Why is this important?

Vaibhav (VB) Srivastav (@reach_vb)'s Twitter Profile Photo

Let's goo! Starting today you can access 5000+ LLMs powered by MLX directly from Hugging Face Hub! 🔥

All you need to do is click `Use this model` from any compatible model \o/

That's it, all you need to get blazingly fast intelligence right at your terminal! What would you

yags (@yagilb)'s Twitter Profile Photo

Better together 🤝 LM Studio's new mlx-engine architecture is an effort to unify the brilliant work of Awni Hannun, Angelos Katharopoulos (mlx-lm), Prince Canuma (mlx-vlm) and community contributors. This pattern is designed to be extended and we have a good first issue on the repo! 🍎

N8 Programs (@n8programs)'s Twitter Profile Photo

Reminder that writing MLX kernels is extremely powerful - for instance, on the fairly uncommon log-polar mapping operation (which MLX has no native support for), a simple MLX kernel reaches throughput up to 30x higher than OpenCV on the CPU w/ all 16 threads.
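
For reference, custom kernels like this are written with mx.fast.metal_kernel. A minimal element-wise sketch below; a real log-polar resampler would add 2-D indexing and interpolation on top of the same machinery:

    import mlx.core as mx

    # Metal kernel body: one thread per output element.
    source = """
        uint i = thread_position_in_grid.x;
        out[i] = metal::log(1.0f + metal::fabs(inp[i]));
    """

    kernel = mx.fast.metal_kernel(
        name="toy_log_abs",
        input_names=["inp"],
        output_names=["out"],
        source=source,
    )

    x = mx.random.normal(shape=(1024,))
    y = kernel(
        inputs=[x],
        grid=(x.size, 1, 1),
        threadgroup=(256, 1, 1),
        output_shapes=[x.shape],
        output_dtypes=[x.dtype],
    )[0]
    print(y[:4])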

Úchèńnà Ogbújí ("Uche") (@uogbuji)'s Twitter Profile Photo

Long overdue release: Toolio 0.6.0. A major re-alignment with MLX. Power steering for LLMs on Mac: a #GenAI & #agent toolkit implementing #JSON schema-steered structured output & tool-calling in #Python

ℹ️ github.com/OoriData/Tooli…
🐍 pypi.org/project/Toolio…

#AI #LLM #Mac #MLX

Awni Hannun (@awnihannun)'s Twitter Profile Photo

If you haven’t tried the new DWQ and/or dynamic quants in mlx-lm, I highly recommend. They give much higher quality q4 MLX models. And the full quantization can be done locally:
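
Plain 4-bit quantization can already be done locally through mlx-lm's convert API; the DWQ and dynamic-quant recipes ship as their own mlx-lm entry points, so the snippet below only shows the basic path, with an assumed source repo:

    from mlx_lm import convert

    # Quantize a Hugging Face model to 4-bit MLX weights, entirely locally.
    convert(
        "Qwen/Qwen3-8B",           # assumed source repo; use any supported model
        mlx_path="qwen3-8b-4bit",  # output directory
        quantize=True,
        q_bits=4,
        q_group_size=64,
    )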

Shashank Prasanna (@shshnkp)'s Twitter Profile Photo

Exciting MLX updates at #WWDC25: new webpage and 2 sessions for Python and Swift devs!

New MLX webpage! mlx-framework.org

Getting started with MLX, by Awni Hannun: developer.apple.com/videos/play/ww…

Explore LLMs on Apple silicon, by Angelos Katharopoulos: developer.apple.com/videos/play/ww…

Molly Cantillon (@mollycantillon)'s Twitter Profile Photo

Everything will be local. Yesterday I gave a talk about Real-World Applications of MLX and built a fast on-device semantic search index over the Apple WWDC 2025 docs. Open-sourced the code for anyone curious! github.com/mcantillon21/l…
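
At its core, such an index reduces to embedding documents and ranking by cosine similarity, which MLX array ops handle directly. A minimal sketch; the embed() helper is hypothetical and stands in for whatever embedding model the talk actually used:

    import mlx.core as mx

    def embed(texts):
        """Hypothetical embedding helper: swap in a real MLX embedding model."""
        e = mx.random.normal(shape=(len(texts), 384))  # stand-in vectors
        return e / mx.linalg.norm(e, axis=-1, keepdims=True)

    docs = ["Metal kernel docs", "MLX quantization guide", "Swift API notes"]
    index = embed(docs)                          # (n_docs, dim), unit-normalized

    def search(query, k=2):
        q = embed([query])                       # (1, dim)
        scores = (index @ q.T)[:, 0]             # cosine similarity per doc
        top = mx.argsort(-scores)[:k].tolist()   # best matches first
        return [(docs[i], scores[i].item()) for i in top]

    print(search("how do I quantize a model?"))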