Catherine Arnett (@linguist_cat) Twitter Tweets • TwiCopy

Catherine Arnett

@linguist_cat

NLP Researcher @AiEleuther. PhD @UCSanDiego Linguistics.
Previously @pleiasfr @EdinburghUni. Interested in multilingual NLP, tokenizers, open science. She/her.

ID: 1532493362296606720

Link: https://catherinearnett.github.io/

Joined: 02-06-2022 22:44:43

124 Tweets

533 Followers

455 Following

Catherine Arnett

@linguist_cat

a month ago

If you are into tokenization and have some bandwidth this week, we need more reviewers!

4 likes · 0 replies · 0 retweets

Can you train performant language models without using unlicensed text? We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2.

556 likes · 10 replies · 127 retweets

EleutherAI

@aieleuther

23 days ago

We are launching a new speaker series at EleutherAI, focused on promoting recent research by our team and community members. Our first talk is by Catherine Arnett on tokenizers, their limitations, and how to improve them.

146 likes · 2 replies · 21 retweets