black_samorez (@black_samorez)'s Twitter Profile
black_samorez

@black_samorez

ML PhD @ISTAustria and @EllisForEurope

ID: 1304537032367239169

Joined: 11-09-2020 21:48:57

24 Tweets

111 Followers

103 Following

Dan Alistarh (@dalistarh):

Announcing AQLM v1.1! Featuring:
1. New model collection with SOTA accuracy huggingface.co/collections/IS…
2. Gemma-2B support, running within 1.5GB;
3. LoRA integration for training Mixtral-8x7B on Colab;
4. Faster generation (3x) via CUDA graphs.
Check it out: github.com/Vahe1994/AQLM
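As a rough sketch (not from the tweet): loading one of these AQLM checkpoints through transformers looks roughly like the following; the repo id is an assumed example from the linked collection, and the aqlm package is required.

```python
# Sketch: loading an AQLM-quantized model via transformers.
# The repo id is an assumed example from the ISTA-DASLab collection linked above.
# Requires: pip install aqlm[gpu] transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/gemma-2b-AQLM-2Bit-1x16-hf"  # assumed example repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # AQLM kernels dequantize the 2-bit codes on the fly
    device_map="auto",    # the small footprint lets the model fit on one GPU
)

inputs = tokenizer("AQLM compresses LLMs by", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```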

black_samorez (@black_samorez):

Tomorrow I will be presenting AQLM at the #ICML2024 13:30-15:00 poster session, stand 608.
If you're interested in how we compressed LLMs down to 2 bits per weight and how you can run Llama-3-70B on an RTX 4090 with #vLLM, pay us a visit!
Conference link: icml.cc/virtual/2024/p…
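For illustration only (not part of the tweet): serving an AQLM 2-bit Llama-3-70B checkpoint with vLLM would look roughly like this; the repo id is a placeholder assumption, not a confirmed checkpoint name.

```python
# Sketch: running a 2-bit AQLM Llama-3-70B with vLLM on a single 24 GB GPU.
from vllm import LLM, SamplingParams

# At ~2 bits/weight, 70e9 weights take roughly 70e9 * 2 / 8 bytes ≈ 17.5 GB,
# which is why the model can fit on one RTX 4090.
llm = LLM(model="ISTA-DASLab/Meta-Llama-3-70B-AQLM-2Bit-1x16")  # assumed repo id

outputs = llm.generate(["The capital of Austria is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```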
Dan Alistarh (@dalistarh):

Introducing Panza V2, our personalized LLM writing assistant, running entirely on-device! Now faster and easier to use: 
* Local serving via GMail extension
* Cloud training via Lightning AI ⚡️ Studio
* More models, including AI at Meta Llama-3.2
* Inference w/ ollama!
Details:
harsha (@sree_harsha_n):

Excited to host black_samorez, PhD student @ IST, who will present 'Pushing the limits of LLM quantization via the linearity theorem' on Jan 10 @ 1800 CET at Cohere For AI. Really cool results; looking forward to the talk. Join the community: tinyurl.com/C4AICommunityA…
harsha (@sree_harsha_n):

We will have black_samorez presenting his work on low-bit pre-training at Cohere For AI next week (stable training at 1-bit weights + activations) -- continuing our theme of low-bit training. Looking forward :)

To join in, fill the form at: tinyurl.com/C4AICommunityA…
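The talk covers the authors' own recipe; purely as a generic illustration of how 1-bit weights are usually trained stably, here is a straight-through-estimator sketch that keeps full-precision master weights for the optimizer update (all names below are made up, not from the talk).

```python
# Generic illustration (NOT the method from the talk): binarized weights
# trained with a straight-through estimator (STE) over FP32 master weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        scale = self.weight.abs().mean()            # preserve average magnitude
        w_bin = torch.sign(self.weight) * scale     # 1-bit weights in {-s, +s}
        # STE: forward uses w_bin, backward passes gradients straight to self.weight.
        w = self.weight + (w_bin - self.weight).detach()
        return F.linear(x, w)

layer = BinaryLinear(64, 32)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
x, target = torch.randn(8, 64), torch.randn(8, 32)
loss = F.mse_loss(layer(x), target)
loss.backward()   # updates flow into the full-precision master weights
opt.step()
```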
black_samorez (@black_samorez):

We'll be presenting this on April 27th in Singapore. For now, you can check out this recording of the Cohere For AI efficiency seminar on this topic: youtube.com/watch?v=e3ClKT…

Dan Alistarh (@dalistarh):

We are introducing Quartet, a fully FP4-native training method for Large Language Models, achieving optimal accuracy-efficiency trade-offs on NVIDIA Blackwell GPUs! Quartet can be used to train billion-scale models in FP4 faster than FP8 or FP16, at matching accuracy.
[1/4]
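The Quartet recipe itself is in the paper; purely to illustrate the FP4 (e2m1) format it targets, here is a sketch of fake-quantizing a tensor to the e2m1 grid with per-group scales (function name and group size are assumptions, not Quartet's).

```python
# Illustration of the FP4 e2m1 number format (NOT the Quartet algorithm):
# fake-quantize a tensor to the e2m1 grid with one scale per group of values.
import torch

# Representable e2m1 magnitudes; the sign bit is handled separately.
E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(x: torch.Tensor, group_size: int = 32) -> torch.Tensor:
    g = x.reshape(-1, group_size)
    scale = g.abs().amax(dim=1, keepdim=True) / E2M1_GRID.max()  # map max |x| to 6.0
    scaled = g / scale.clamp(min=1e-12)
    # Round each magnitude to the nearest grid point, reattach the sign.
    idx = (scaled.abs().unsqueeze(-1) - E2M1_GRID).abs().argmin(dim=-1)
    q = torch.sign(scaled) * E2M1_GRID[idx]
    return (q * scale).reshape(x.shape)

x = torch.randn(4, 64)
print((x - fake_quantize_fp4(x)).abs().mean())  # average quantization error
```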
harsha (@sree_harsha_n):

We're excited to welcome back one of our most frequent speakers, @black_samorez, to the ml-efficiency group and @cohere_labs! Join us on July 2 at 9 AM PST to hear about Quartet: Native FP4 Training of LLMs.
black_samorez (@black_samorez):

ChatGPT agent mode apparently has enough RAM to load and run a full-fledged 8B LLM in the browser. Kudos to Vladimir Malinovskii (@galqiwi) for implementing AQLM 2-bit quantization in WebAssembly at galqiwi.github.io/aqlm-rs
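Back-of-the-envelope (my arithmetic, not from the tweet): at ~2 bits per weight, an 8B-parameter model's weights take only about 2 GB, which is what makes an in-browser run plausible.

```python
# Rough memory footprint of an 8B-parameter model at 2 bits per weight.
params = 8e9
bits_per_weight = 2
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 2**30:.2f} GiB")  # ≈ 1.86 GiB of weights
```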
Egor Zverev @ICLR 2025 (@egor_zverev_ai):

🎉 Excited to announce the Workshop on Foundations of LLM Security at #EurIPS2025!
🇩🇰 Dec 6–7, Copenhagen!
📢 Call for contributed talks is now open! See details at llmsec-eurips.github.io

<a href="/KathrinGrosse/">Kathrin Grosse</a> <a href="/iliaishacked/">Ilia Shumailov🦔</a> <a href="/verena_rieser/">Verena Rieser</a> <a href="/sahar/">sahar selim taher</a>
@thegrue <a href="/mariojfritz/">Mario Fritz</a>  <a href="/EurIPSConf/">EurIPS Conference</a>