
Brian Keene
@bpkeene
Technical Staff @argmaxinc | former Apple ML Engineer working on on-device inference
ID: 4717525021
https://www.linkedin.com/in/brian-keene-3b7712a2/ 06-01-2016 07:45:18
42 Tweets
178 Followers
209 Following


LLMs are faster and more memory efficient in MLX!
- All quantized models 30%+ faster h/t Angelos Katharopoulos
- Fused attention for longer context can be 2x+ faster and use way less memory h/t Brian Keene Atila argmax
Some tokens-per-second benchmarks for 7B Mistral:
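The tweet reports tokens-per-second figures for 7B Mistral. As a minimal sketch of how such a throughput number is measured: time a generation call and divide token count by elapsed wall-clock time. The `generate` callable below is a hypothetical stand-in for the real MLX generation loop, which is not shown here.

```python
import time

def tokens_per_second(generate, prompt, max_tokens):
    """Time a generation call and report throughput in tokens/sec.

    `generate` is a stand-in for the actual model call (e.g. an MLX
    decode loop); any callable returning a list of tokens works for
    illustrating the measurement.
    """
    start = time.perf_counter()
    tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Dummy generator for illustration; a real benchmark would decode
# with a quantized 7B model and count the emitted tokens.
def dummy_generate(prompt, max_tokens):
    return ["tok"] * max_tokens

tps = tokens_per_second(dummy_generate, "Hello", 256)
print(f"{tps:.1f} tokens/sec")
```

Comparing this number before and after enabling quantization or fused attention is what yields the "30%+ faster" and "2x+ faster" claims above.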
WhisperKit-0.9 is out!
- Faster Large v3 Turbo on Mac and iPhone
- Fast Model Load on TestFlight App (Experimental)
- Memory reduction for large input handling contributed by Kosta Eleftheriou
TestFlight: testflight.apple.com/join/LPVOyJZW
GitHub (MIT): github.com/argmaxinc/Whis…
New models on



We raised $8M and are thrilled to have Salesforce Ventures, General Catalyst, Julien Chaumond, Amjad Masad, Michele Catasta, and other industry-leader angels join us as investors. We are hiring across all positions! Our thoughts and job application links here: argmaxinc.com/blog/seed
