Georgi Gerganov (@ggerganov) Twitter Tweets • TwiCopy

ggml

@ggml_org

3 months ago

Georgi Gerganov

@ggerganov

3 months ago

New embedding mixture-of-experts model by Nomic AI

We will release the quantized models of Qwen3 to you in the following days. Today we release the AWQ and GGUFs of Qwen3-14B and Qwen3-32B, which enables using the models with limited GPU memory. Qwen3-32B-AWQ: huggingface.co/Qwen/Qwen3-32B… Qwen3-32B-GGUF: huggingface.co/Qwen/Qwen3-32B…

Simon Willison

@simonw

3 months ago

llama.cpp shipped new support for vision models this morning, including macOS binaries (albeit quarantined so you have to take extra steps to run them) that let you run vision models in a terminal or as a localhost web UI

Georgi Gerganov

@ggerganov

3 months ago

Son has been doing an outstanding job at maintaining the llama-server implementation and now bringing full-blown vision input support to llama.cpp! Massive kudos and thanks for your valuable contributions to the project!

Julien Chaumond

@julien_c

3 months ago

llama.cpp is now fully compatible with VLMs 💥

HUGE kudos to Xuan-Son Nguyen from HF and to the ggml team 💟

Here are a selection of pre-quantized models, ready to be used, from:
- Google DeepMind Gemma
- Mistral AI Pixtral
- Qwen VL
- Hugging Face SmolVLM

Give them a

clem 🤗

@clementdelangue

3 months ago

Llama.cpp now supports vision models thanks to Xuan-Son Nguyen Georgi Gerganov! On-device multi-modal🔥🔥🔥

Xuan-Son Nguyen

@ngxson

3 months ago

Real-time webcam demo with Hugging Face SmolVLM and ggml llama.cpp server. All running locally on a Macbook M3

Georgi Gerganov

@ggerganov

3 months ago

👀

ggml

@ggml_org

3 months ago

Deploy vision models with llama.cpp on Hugging Face

Georgi Gerganov

@ggerganov

2 months ago

PSA for applications that use local AI models - here is how to do it right: More and more applications are adding support for local AI models, which is great. But I notice that they are doing it the wrong way (see the screenshots below). The right way to do it is to add a

Xuan-Son Nguyen

@ngxson

2 months ago

Summarize latest Fireship's video using cutting edge ggml's llama.cpp audio support (model: Ultravox + Llama 3.1 8B)

Olivier Chafik

@ochafik

2 months ago

llama.cpp streaming support for tool calling & thoughts was just merged: please test & report any issues 😅 github.com/ggml-org/llama… #llamacpp

Simon Willison

@simonw

2 months ago

llm-llama-server now supports tools, which means this local Gemma demo should work (if you have 3.2GB free):

brew install llama.cpp
llama-server --jinja -hf unsloth/gemma-3-4b-it-GGUF:Q4_K_XL
uvx --with llm-llama-server llm -m llama-server-tools -T llm_time 'what time is it?'

PlayAI

@playaiofficial

2 months ago

🎙️ After serving millions of users through our text-to-speech platform, one need kept coming up: fine-grained AI speech editing - the ability to modify existing speech. Today, we’re open-sourcing PlayDiffusion, a diffusion-based inpainting model built for that exact purpose.