
Tom Jobbins
@theblokeai
My Hugging Face repos: huggingface.co/TheBloke
Discord server: discord.gg/theblokeai
Patreon: patreon.com/TheBlokeAI
336 Tweets
15.15K Followers
229 Following

Meta's CodeLlama is here! ai.meta.com/blog/code-llam… 7B, 7B-Instruct, 7B-Python, 13B, 13B-Instruct, 13B-Python, 34B, 34B-Instruct, 34B-Python. First time we've seen the 34B model! I've got a couple of fp16s up: huggingface.co/TheBloke/CodeL… huggingface.co/TheBloke/CodeL… More coming soon, obvs

Just released by Tav: Pygmalion 2, the sequel to one of the most popular models ever! And Mythalion, a new Gryphe merge! huggingface.co/TheBloke/Pygma… huggingface.co/TheBloke/Pygma… huggingface.co/TheBloke/Pygma… huggingface.co/TheBloke/Pygma… huggingface.co/TheBloke/Mytha… huggingface.co/TheBloke/Mytha…

Chronos 70B v2 release! Thanks to Pygmalion for generously providing the compute and Tom Jobbins for quantizing the model. As usual, the model is optimized for chat, roleplay, and storywriting, and now includes vastly improved reasoning skills. huggingface.co/elinas/chronos…

This new filter 🔎 on Hugging Face user profiles is very helpful, especially for checking whether Tom Jobbins has quantized and released the latest trending models 😁

New feature alert in the Hugging Face ecosystem! Flash Attention 2 is natively supported in huggingface transformers, and works with training, PEFT, and quantization (GPTQ, QLoRA, LLM.int8). First `pip install flash-attn`, then pass `use_flash_attention_2=True` when loading the model!
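
For reference, a minimal sketch of what that looks like (assumes a transformers version with FA2 support, plus flash-attn and accelerate installed; the model ID is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID is illustrative; any architecture with Flash Attention 2 support works.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # FA2 requires fp16 or bf16 weights
    use_flash_attention_2=True,  # enable the Flash Attention 2 kernels
    device_map="auto",
)
```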

Thanks again to Latitude.sh for the loan of a beast 8xH100 server this week. I uploaded over 550 new repos, maybe my busiest week yet! Quanting is really resource-intensive: it needs not only fast GPUs, but many CPUs, lots of disk, and a 🚀 network. A server that ✅ all of those is v. rare!

Oh hello Tom Jobbins, I want to bookmark your 'Recent models' Collection on Hugging Face 🔥 Well... you can now upvote Collections, and browse upvoted collections on your profile ❤️

Blazing-fast text generation using AWQ and fused modules! 🚀 Up to 3x speedup compared to native fp16, which you can use right now on any of the models supported by Tom Jobbins. Simply pass an `AwqConfig` with `do_fuse=True` to the `from_pretrained` method! huggingface.co/docs/transform…
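
A minimal sketch of that pattern, following the transformers AWQ docs (the checkpoint is one of TheBloke's AWQ uploads, used here as an example; assumes autoawq is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

# Example AWQ checkpoint; any AWQ-quantized repo should work the same way.
model_id = "TheBloke/Mistral-7B-OpenOrca-AWQ"

quantization_config = AwqConfig(
    bits=4,
    fuse_max_seq_len=512,  # longest sequence the fused modules will handle
    do_fuse=True,          # fuse attention/MLP modules for faster generation
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
).to("cuda")
```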

Tom Jobbins joined me to share his work in the open-source AI space - don't miss it! Happening right now. Server link: discord.gg/peBrCpheKE (see the general channel or events channel for the Google Meet link)

FYI to anyone using Mistral AI's Mixtral for long-context tasks -- you can get even better performance by disabling sliding window attention (setting it to your max context length): `config.sliding_window = 32768`
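
A minimal sketch of that tweak (the model ID is the public Mixtral repo; assumes a transformers version with Mixtral support):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# Widen the sliding window to the full 32k context length, which
# effectively disables sliding-window attention.
config = AutoConfig.from_pretrained(model_id)
config.sliding_window = 32768

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```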
