José Maria Pombal (@zmprcp) 's Twitter Profile
José Maria Pombal

@zmprcp

Research Scientist @unbabel, PhD student @istecnico.

ID: 1633224223454797826

Website: http://zeppombal.github.io · Joined: 07-03-2023 21:53:17

53 Tweets

81 Followers

103 Following

Andre Martins (@andre_t_martins) 's Twitter Profile Photo

Good to see European Commission promoting OS LLMs in Europe. However (1) "OpenEuroLLM" is appropriating a name (#EuroLLM) which already exists, (2) it is certainly *not* the "first family of open-source LLMs covering all EU languages" 🧵

Duarte Alves (@duartemralves) 's Twitter Profile Photo

🚀 Excited to announce EuroBERT: a new multilingual encoder model family for European & global languages! 🌍 🔹 EuroBERT is trained on a massive 5 trillion-token dataset across 15 languages and includes recent architecture advances such as GQA, RoPE & RMSNorm. 🔹

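A minimal usage sketch with the Hugging Face transformers library is below; the checkpoint name EuroBERT/EuroBERT-210m and the mean-pooling step are assumptions for illustration, not details taken from the announcement.

    # Minimal sketch: multilingual sentence embeddings with EuroBERT.
    # The checkpoint name "EuroBERT/EuroBERT-210m" is an assumption; check the
    # official release for the actual model IDs.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_id = "EuroBERT/EuroBERT-210m"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

    sentences = [
        "EuroBERT covers European and global languages.",
        "EuroBERT abrange línguas europeias e globais.",
    ]
    inputs = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # [batch, seq, dim]

    # Mean-pool over non-padding tokens to get one vector per sentence.
    mask = inputs["attention_mask"].unsqueeze(-1)
    embeddings = (hidden * mask).sum(1) / mask.sum(1)
    print(embeddings.shape)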
Andrea Piergentili (@apierg) 's Twitter Profile Photo

Brilliant and necessary work by José Maria Pombal et al. about metric interference in MT system development and evaluation: arxiv.org/abs/2503.08327 Are we developing better systems or are we just gaming the metrics? And how do we address this? Super (m)interesting! 👀

MT Group at FBK (@fbk_mt) 's Twitter Profile Photo

Our pick of the week by Andrea Piergentili: "Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation" by José Pombal, Nuno M. Guerreiro, Ricardo Rei, and Andre Martins (2025). #mt #translation #metric #machinetranslation

Slator (@slatornews) 's Twitter Profile Photo

Unbabel exposes 🔎 how using the same metrics for both training and evaluation can create misleading ⚠️ #machinetranslation performance estimates and proposes how to solve this with MINTADJUST. José Maria Pombal Ricardo Rei Andre Martins #translation #xl8 #MT slator.ch/UnbabelBiasAIT…
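A minimal simulation of the interference effect described above: when candidates are selected with a noisy metric and that same metric is reused for evaluation, the reported score is optimistically biased relative to a held-out metric. The scoring functions here are toy stand-ins, not the paper's metrics or the MINTADJUST method.

    # Toy simulation of metric interference in best-of-N selection.
    import random

    random.seed(0)

    def noisy(quality):
        # A metric = latent true quality + metric-specific noise.
        return quality + random.gauss(0, 0.1)

    def mean(xs):
        return sum(xs) / len(xs)

    n_sources, n_candidates = 500, 16
    interfered, independent, oracle = [], [], []

    for _ in range(n_sources):
        qualities = [random.gauss(0.7, 0.1) for _ in range(n_candidates)]
        scores_a = [noisy(q) for q in qualities]           # metric used for selection
        best = max(range(n_candidates), key=lambda i: scores_a[i])
        interfered.append(scores_a[best])                  # same metric reused for evaluation
        independent.append(noisy(qualities[best]))         # held-out metric
        oracle.append(qualities[best])                     # latent true quality

    print(f"selection metric (interfered): {mean(interfered):.3f}")
    print(f"independent metric:            {mean(independent):.3f}")
    print(f"true quality of selections:    {mean(oracle):.3f}")

The interfered score exceeds both the independent score and the true quality because selecting on a metric also selects on its noise.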

Seungone Kim @ NAACL2025 (@seungonekim) 's Twitter Profile Photo

Here's our new paper on m-Prometheus, a series of multilingual judges! 1/ Effective at safety & translation eval 2/ Also stands out as a good reward model in BoN 3/ Backbone model selection & training on natively multilingual data is important. Check out José Maria Pombal's post!
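A minimal sketch of Best-of-N (BoN) selection with a judge used as a reward model; the judge_score callable is a placeholder assumption, not the actual m-Prometheus prompting or scoring interface.

    # Minimal Best-of-N (BoN) reranking with a judge acting as a reward model.
    from typing import Callable, List

    def best_of_n(source: str, candidates: List[str],
                  judge_score: Callable[[str, str], float]) -> str:
        """Return the candidate the judge scores highest for this source."""
        return max(candidates, key=lambda cand: judge_score(source, cand))

    # Toy stand-in judge (illustrative only): prefers longer candidates.
    toy_judge = lambda src, cand: float(len(cand))

    print(best_of_n("Olá, mundo!", ["Hello!", "Hello, world!"], toy_judge))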

Dongkeun Yoon (@dongkeun_yoon) 's Twitter Profile Photo

Introducing M-Prometheus — the latest iteration of the open LLM judge, Prometheus! Specially trained for multilingual evaluation. Excels across diverse settings, including the challenging task of literary translation assessment.

Patrick Fernandes (@psanfernandes) 's Twitter Profile Photo

MT metrics excel at evaluating sentence translations, but struggle with complex texts. We introduce *TREQA*, a framework to assess how translations preserve key info by using LLMs to generate & answer questions about them. arxiv.org/abs/2504.07583 (co-lead Sweta Agrawal) 1/15

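A schematic sketch of the question-answering idea behind TREQA: generate questions about a passage, answer them from the candidate translation, and check whether key information survives. The generate_questions and answer callables are placeholders standing in for LLM calls; this is not the paper's actual implementation.

    # Schematic QA-based check of information preservation in a translation.
    from typing import Callable, List

    def qa_preservation_score(
        reference: str,
        translation: str,
        generate_questions: Callable[[str], List[str]],
        answer: Callable[[str, str], str],
    ) -> float:
        """Fraction of questions whose answers match between reference and translation."""
        questions = generate_questions(reference)
        matches = sum(
            answer(reference, q).strip().lower() == answer(translation, q).strip().lower()
            for q in questions
        )
        return matches / max(len(questions), 1)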
Dongkeun Yoon (@dongkeun_yoon) 's Twitter Profile Photo

🙁 LLMs are overconfident even when they are dead wrong. 🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”? ❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.

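A minimal sketch of reading a verbalized confidence like the one quoted above and checking how well such confidences are calibrated; the regex and the toy numbers are illustrative assumptions, not results or methods from the paper.

    # Parse a verbalized confidence and compute expected calibration error (ECE).
    import re
    from typing import List, Tuple

    def parse_confidence(text: str) -> float:
        """Extract a percentage like '60%' from a model's answer, mapped to [0, 1]."""
        match = re.search(r"(\d{1,3})\s*%", text)
        if match is None:
            raise ValueError("no verbalized confidence found")
        return min(int(match.group(1)), 100) / 100.0

    def expected_calibration_error(preds: List[Tuple[float, bool]], n_bins: int = 10) -> float:
        """preds: (confidence, was_correct) pairs."""
        ece, total = 0.0, len(preds)
        for b in range(n_bins):
            lo, hi = b / n_bins, (b + 1) / n_bins
            bucket = [(c, ok) for c, ok in preds
                      if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
            if not bucket:
                continue
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            ece += len(bucket) / total * abs(avg_conf - accuracy)
        return ece

    print(parse_confidence("My answer is only 60% likely to be correct."))  # 0.6
    print(expected_calibration_error([(0.9, True), (0.6, False), (0.8, True)]))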