Prompsit (@prompsit) 's Twitter Profile
Prompsit

@prompsit

We speak Natural Language Processing, Data Analysis and Artificial Intelligence, among many other languages!

ID: 326020797

Website: http://prompsit.com · Joined: 29-06-2011 07:22:39

2.2K Tweets

598 Followers

413 Following

HPLT (@hplt_eu) 's Twitter Profile Photo

Interested in Open and Community-Driven MT initiatives? CrowdMT is for you! 🎙️Invited speakers from Wikimedia Foundation and Apertium announced. 📜Accepted papers and abstracts announced. Time to register at events.tuni.fi/eamt23/registr… Details: hplt-project.org/events

Prompsit (@prompsit) 's Twitter Profile Photo

Language varieties have been added to the #MaCoCu data sets as metadata, so you can do your own filtering! More details on the metadata are given in each corpus card description in the Clarin.si repo, e.g., Croatian-English: clarin.si/repository/xml… All corpora are listed at macocu.eu

HPLT (@hplt_eu) 's Twitter Profile Photo

Next June, 17th-25th, the #HPLT consortium will hold a #hackathon in Prague around a set of topics related to corpora curation. Interested? Drop us a line and join! hplt-project.org/hackathon2023

Taja Kuzman (@tajakuzman) 's Twitter Profile Photo

#MaCoCu crew is in Groningen these days! Walking towards great results of MaCoCu corpora evaluation and new MaCoCu language models for under-resourced languages 😁

Clarin.si (@clarinslovenia) 's Twitter Profile Photo

We are excited to share with you that we now provide 4 more massive monolingual corpora for under-resourced languages: you can access Icelandic, Ukrainian, Catalan and Greek #MaCoCu web corpora for free from the CLARIN.SI repository 😃

Prompsit (@prompsit) 's Twitter Profile Photo

Select, filter, visualize your data (OpusCleaner). Then schedule and train MT and LLMs consistently (OpusTrainer) with them. As part of the HPLT project, we build tools to make it easy. They are open-source and we encourage you to use them. More:

HPLT (@hplt_eu) 's Twitter Profile Photo

We just published version 1.2 of HPLT datasets. What's new?
- we fixed a bug in monolingual dedup, please redownload! 🛠️
- we filtered out very ugly monolingual documents 🤮
- we anonymised the bilingual datasets 🕵️‍♀️
hplt-project.org/datasets/v1.2

Prompsit (@prompsit) 's Twitter Profile Photo

Today we celebrate 18 years of doing what we love most at this crossroads of languages and technology. Thank you for your trust. Many happy returns, Prompsit! Heartfelt thanks for your support! Happy birthday to us! 🥳 Thanks for your trust, we'll keep doing our best!

HPLT (@hplt_eu) 's Twitter Profile Photo

First datasets, then models! Initial HPLT models (LLMs and MT) are out: hplt-project.org/models, some still running 🏃 We explain what we are doing in the deliverables section: hplt-project.org/deliverables Meanwhile, we keep cooking IA peta-data-bytes 🥘, enriching, dashboarding 📊

Parque Científico UMH (@pcientificoumh) 's Twitter Profile Photo

➡️ Prompsit, a company based at the UMH #ParqueCientífico (Science Park), is collaborating on a European project on high-performance language technologies that aims to create a range of powerful language and translation models. Full story 📌: parquecientificoumh.es/noticias/promp…

Rik van Noord (@rikvannoord) 's Twitter Profile Photo

Happy to share our latest MaCoCu paper, accepted at #LRECCOLING2024 LREC COLING 2024 #NLProc 🎉 We have linguists annotate the data *quality* of 4 well-known monolingual corpora (OSCAR, CC100, mC4 and MaCoCu) across 11 European low-resource languages. Link: arxiv.org/pdf/2403.08693…

Slator (@slatornews) 's Twitter Profile Photo

By harnessing web crawls 🕸️ from Internet Archive and CommonCrawl, researchers 🔎 from The University of Edinburgh, University of Helsinki, Universitetet i Oslo, Turun yliopisto - University of Turku, and Prompsit unveil new #language resources aimed at enhancing language modeling and #MT training. slator.ch/MassiveMultili… Ona de Gibert

Prompsit (@prompsit) 's Twitter Profile Photo

It was a pleasure to take part in this event. Thank you for the invitation, Parque Científico UMH; we really enjoyed sharing the day with our colleagues from Prospera Biotech. We have exceptional women scientists and technologists around every corner! 👩‍🔬👩‍💻💪🦾

HPLT (@hplt_eu) 's Twitter Profile Photo

We are happy to announce the second release of HPLT bilingual datasets:
- 50 English-centric language pairs = 380M parallel sentences (HPLT) 🤩
- 1,275 non-English-centric language pairs = 16.7B parallel sentences (MultiHPLT) 😮
Available at the HPLT dataset catalogue and OPUS.

Prompsit (@prompsit) 's Twitter Profile Photo

Prompsit will actively participate in OpenEuroLLM by analysing and curating the open data needed to train the foundational LLM. We are also contributing to multilingual LLM evaluation and dissemination of it all!

Prompsit (@prompsit) 's Twitter Profile Photo

We had a great time at MTSummit2025 presenting work about HPLT v2 multilingual datasets (v3 coming soon!) and ProMut, an improved DIY platform to teach and learn about MT. Great to be there also to celebrate the Award of Honour to our co-founder, CRO and friend Mikel Forcada! 😍

Prompsit (@prompsit) 's Twitter Profile Photo

It is impossible to forget the day we met Olga Torres, and that smile that made MultiTrainMT much more than a project with successful results: it brought us together, it made us a family. That smile will always be with us. Rest in peace, dear friend.