Brian Keene (@bpkeene)'s Twitter Profile
Brian Keene

@bpkeene

Technical Staff @ @argmaxinc | former Apple ML Engineer working on on-device inference

ID: 4717525021

Link: https://www.linkedin.com/in/brian-keene-3b7712a2/ | Joined: 06-01-2016 07:45:18

42 Tweets

178 Followers

209 Following

argmax (@argmaxinc):

Here is the compounded speedup when considering the `qmv` improvements on top. Note that speedups dramatically improve for short sequence lengths:

Awni Hannun (@awnihannun):

LLMs are faster and more memory efficient in MLX!

- All quantized models 30%+ faster h/t <a href="/angeloskath/">Angelos Katharopoulos</a> 
- Fused attention for longer context can be 2x+ faster and use way less memory h/t <a href="/bpkeene/">Brian Keene</a> <a href="/atiorh/">Atila</a> <a href="/argmaxinc/">argmax</a>

Some tokens-per-second benchmarks for 7B Mistral:
clem 🤗 (@clementdelangue):

Love how Apple is advocating for on-device AI at WWDC. Local, smaller, specialized models are the future of private, secure, and efficient AI.

Awni Hannun (@awnihannun):

SD3 runs locally with MLX thanks to the incredible work from argmax.

Super easy setup, docs here: github.com/argmaxinc/Diff…

Takes < 30 seconds to generate an image on my M1 Max:

INIYSA (@lafaiel):

This is crazy. According to Qualcomm, the X Elite runs Whisper Base-En at 72 tok/s (13.8 ms), while the A17 runs it at 237 tok/s. Properly optimized hw & sw really matter.

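Taking the tweet's figures at face value, the per-token latency and the relative speedup can be sanity-checked in a few lines (the variable names below are just for illustration):

```python
# Sanity-check the quoted Whisper Base-En throughput numbers.
x_elite_tps = 72    # Snapdragon X Elite tokens/sec, per Qualcomm's figure
a17_tps = 237       # Apple A17 tokens/sec

per_token_ms = 1000 / x_elite_tps   # per-token latency on the X Elite
speedup = a17_tps / x_elite_tps     # A17 relative to X Elite

print(f"X Elite per-token latency: {per_token_ms:.1f} ms")  # ~13.9 ms, close to the quoted 13.8 ms
print(f"A17 speedup: {speedup:.2f}x")                       # ~3.29x
```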
argmax (@argmaxinc):

FLUX.1-schnell on DiffusionKit with MLX
Video demo in thread: an M3 Max MacBook generating this 768x1360 image with bfloat16 weights in 39 seconds. Further optimizations in flux.

Install: pip install diffusionkit==0.3.0
Repo: github.com/argmaxinc/Diff…
Awni Hannun (@awnihannun):

Flux Schnell in the latest DiffusionKit with MLX is 30% faster and uses less RAM!

pip install -U diffusionkit

Generating some high quality images in < a minute on my 32GB M1 Max laptop:

Awni Hannun (@awnihannun):

Generating images with 4-bit Flux Schnell on my M1 Max laptop is pretty awesome. Less than 30 seconds, model loading and all, and uses ~5GB peak RAM. Check out DiffusionKit + MLX: github.com/argmaxinc/Diff…

Awni Hannun (@awnihannun):

Generating images with DiffusionKit + Flux Schnell is much faster in the latest MLX

On an M2 Ultra down to less than 9 seconds from close to 13 before.

Docs here: github.com/argmaxinc/Diff…
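Taking the quoted times at face value (close to 13 s before, under 9 s now), the implied speedup works out as follows; the numbers are approximate since the tweet only gives rough figures:

```python
# Relative speedup implied by the quoted generation times on an M2 Ultra.
before_s = 13   # "close to 13" seconds with the previous MLX
after_s = 9     # "less than 9" seconds with the latest MLX

speedup = before_s / after_s          # ~1.44x
time_saved = 1 - after_s / before_s   # ~31% less wall-clock time

print(f"Speedup: {speedup:.2f}x ({time_saved:.0%} less time)")
```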
argmax (@argmaxinc):

WhisperKit-0.9 is out!

- Faster Large v3 Turbo on Mac and iPhone
- Fast Model Load on TestFlight App (Experimental)
- Memory reduction for large input handling, contributed by Kosta Eleftheriou

TestFlight: testflight.apple.com/join/LPVOyJZW
GitHub (MIT): github.com/argmaxinc/Whis…

New models on

argmax (@argmaxinc):

WhisperKit on Android

In collaboration with Qualcomm, WhisperKit is growing from Apple platforms to Android! Samsung Galaxy S24 running at 300 tok/s. Links in 🧵

argmax (@argmaxinc):

WhisperKit Benchmarks are live on Hugging Face!

Speech-to-text systems are hard to benchmark holistically given trade-offs across latency, memory, energy efficiency, and accuracy. On-device testing makes it doubly challenging.

Here is our first version, built with Gradio 🧵

argmax (@argmaxinc):

We raised $8M and are thrilled to have <a href="/SalesforceVC/">Salesforce Ventures</a>  <a href="/generalcatalyst/">General Catalyst</a> <a href="/julien_c/">Julien Chaumond</a> <a href="/amasad/">Amjad Masad</a> <a href="/pirroh/">Michele Catasta</a> and other industry leader angels join us as investors.

We are hiring across all positions! Our thoughts and job application links here: argmaxinc.com/blog/seed
argmax (@argmaxinc):

Introducing WhisperKit Pro & SpeakerKit Pro

We have built major performance and feature set upgrades to WhisperKit! We are calling it WhisperKit Pro, our fastest and most comprehensive on-device speech AI offering yet. SpeakerKit Pro is our new on-device inference framework for

argmax (@argmaxinc):

WhisperKit Android is now in Beta!

WhisperKit is open for business across Android and Apple platforms. Links to code and benchmarks are below in the thread.

argmax (@argmaxinc):

Introducing SpeakerKit

State-of-the-art on-device speaker diarization:
- 10 minutes of audio processed in 3 seconds
- 10 megabytes in total
- 6-year-old devices supported

Details and links to the demo app are in the thread.
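The quoted throughput implies a very low real-time factor; a quick back-of-the-envelope check using only the numbers above:

```python
# Real-time factor (RTF) implied by the quoted SpeakerKit numbers.
audio_s = 10 * 60     # 10 minutes of audio
processing_s = 3      # processed in 3 seconds

rtf = processing_s / audio_s                # 0.005; lower is faster
realtime_multiple = audio_s / processing_s  # 200x faster than real time

print(f"RTF: {rtf:.3f} ({realtime_multiple:.0f}x faster than real time)")
```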
argmax (@argmaxinc):

Exciting SpeakerKit updates!

- Faster inference and lower error rates across 13 benchmark datasets
- Code and paper for benchmarks and system architecture are in the replies
- Ability to set the speaker count to reduce the error rate even further

argmax (@argmaxinc):

Nvidia Frontier Speech Models on Argmax SDK

Nvidia's top-ranking speech-to-text models are now seamlessly running on device with Argmax SDK, available today! Details in thread.