OcciGlot (@occiglot) 's Twitter Profile
OcciGlot

@occiglot

Open Source Language Models for Europe

ID: 1757339431432822784

Joined: 13-02-2024 09:42:27

32 Tweets

232 Followers

12 Following

DFKI (@dfki)

OcciGlot - New Open Source Language Models for Europe released 🇪🇺 Researchers from DFKI and hessian.AI have launched the OcciGlot initiative to develop generative open source language models for European languages. 👉🏼 dfki.de/en/web/news/oc…

Clémentine Fourrier 🍊 (@clefourrier)

New leaderboard: "Occiglot Euro LLM Leaderboard"! It evaluates the performance of LLMs on the following languages: 🇬🇧🇮🇹🇫🇷🇪🇸🇩🇪 huggingface.co/spaces/occiglo… It complements more specialised leaderboards well, congrats to the authors :)

Zilliz (@zilliz_universe)

Join Stephen and speakers from Amazon Web Services and OcciGlot at the Unstructured Data Meetup at the On Cloud Office in Berlin on June 25th. 🗣️ We'll have talks about Agentic RAG, Specialized Language Models and LLMs by and for Europe! 🇪🇺 bit.ly/4b3Jwyo

Hynek Kydlíček (@hkydlicek)

Great work by the OcciGlot community on releasing a new iteration of multilingual OSCAR spanning 40 CC snapshots! It's so important to work on high-quality multilingual data, as 85% of the world are non-English speakers and they should be able to access AI. huggingface.co/datasets/oscar…

Manuel Brack (@mbrack_aiml)

We seek collaborators to extend Community OSCAR to the remaining Common Crawl dumps. If you have the compute/storage (or money to spend on AWS) to contribute, please get in touch with us. We have pre-prepared docker configs and scripts to help you get started easily.

Manuel Brack (@mbrack_aiml)

For anybody attending KonKis next week, let me make a quick ad read for Session 4: "Large AI Models by and for Europe." felfri and lukas helff will be presenting some of our recent safety research, and xyou is giving a talk on OcciGlot. events.gwdg.de/event/615/sess…

OcciGlot (@occiglot)

📣 Community Call: Contribute to LLM pre-training resources in (your) unrepresented language! Please submit any websites in that language to Common Crawl Foundation's web language project. They will help increase non-English data in future releases. github.com/commoncrawl/we…

Manuel Brack (@mbrack_aiml)

🚀 New Preprint: We introduce JQL, a highly efficient, modular pipeline for multilingual pre-training data curation. 📄 arXiv: arxiv.org/abs/2505.22232 🤗 Hugging Face: huggingface.co/spaces/JQL-AI/… 🔧 GitHub: github.com/JQL-AI/JQL-Ann…