José Maria Pombal (@zmprcp) 's Twitter Profile
José Maria Pombal

@zmprcp

Research Scientist @unbabel, PhD student @istecnico.

ID: 1633224223454797826

Website: http://zeppombal.github.io · Joined: 07-03-2023 21:53:17

53 Tweets

81 Followers

103 Following

Andre Martins (@andre_t_martins) 's Twitter Profile Photo

Good to see European Commission promoting OS LLMs in Europe. However (1) "OpenEuroLLM" is appropriating a name (#EuroLLM) which already exists, (2) it is certainly *not* the "first family of open-source LLMs covering all EU languages" 🧵

Duarte Alves (@duartemralves) 's Twitter Profile Photo

🚀 Excited to announce EuroBERT: a new multilingual encoder model family for European & global languages! 🌍 🔹 EuroBERT is trained on a massive 5 trillion-token dataset across 15 languages and includes recent architecture advances such as GQA, RoPE & RMSNorm. 🔹

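A minimal usage sketch with the Hugging Face transformers library is below; the checkpoint name EuroBERT/EuroBERT-210m and the mean-pooling step are assumptions for illustration, not details taken from the announcement.

    # Minimal sketch: multilingual sentence embeddings with EuroBERT.
    # The checkpoint name "EuroBERT/EuroBERT-210m" is an assumption; check the
    # official release for the actual model IDs.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_id = "EuroBERT/EuroBERT-210m"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

    sentences = [
        "EuroBERT covers European and global languages.",
        "EuroBERT abrange línguas europeias e globais.",
    ]
    inputs = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # [batch, seq, dim]

    # Mean-pool over non-padding tokens to get one vector per sentence.
    mask = inputs["attention_mask"].unsqueeze(-1)
    embeddings = (hidden * mask).sum(1) / mask.sum(1)
    print(embeddings.shape)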
Andrea Piergentili (@apierg) 's Twitter Profile Photo

Brilliant and necessary work by José Maria Pombal et al. about metric interference in MT system development and evaluation: arxiv.org/abs/2503.08327 Are we developing better systems or are we just gaming the metrics? And how do we address this? Super (m)interesting! 👀

MT Group at FBK (@fbk_mt) 's Twitter Profile Photo

Our pick of the week by Andrea Piergentili: "Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation" by José Pombal, Nuno M. Guerreiro, Ricardo Rei, and Andre Martins (2025). #mt #translation #metric #machinetranslation

Slator (@slatornews) 's Twitter Profile Photo

Unbabel exposes 🔎 how using the same metrics for both training and evaluation can create misleading ⚠️ #machinetranslation performance estimates and proposes how to solve this with MINTADJUST. José Maria Pombal Ricardo Rei Andre Martins #translation #xl8 #MT slator.ch/UnbabelBiasAIT…
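A minimal simulation of the interference effect described above: when candidates are selected with a noisy metric and that same metric is reused for evaluation, the reported score is optimistically biased relative to a held-out metric. The scoring functions here are toy stand-ins, not the paper's metrics or the MINTADJUST method.

    # Toy simulation of metric interference in best-of-N selection.
    import random

    random.seed(0)

    def noisy(quality):
        # A metric = latent true quality + metric-specific noise.
        return quality + random.gauss(0, 0.1)

    def mean(xs):
        return sum(xs) / len(xs)

    n_sources, n_candidates = 500, 16
    interfered, independent, oracle = [], [], []

    for _ in range(n_sources):
        qualities = [random.gauss(0.7, 0.1) for _ in range(n_candidates)]
        scores_a = [noisy(q) for q in qualities]           # metric used for selection
        best = max(range(n_candidates), key=lambda i: scores_a[i])
        interfered.append(scores_a[best])                  # same metric reused for evaluation
        independent.append(noisy(qualities[best]))         # held-out metric
        oracle.append(qualities[best])                     # latent true quality

    print(f"selection metric (interfered): {mean(interfered):.3f}")
    print(f"independent metric:            {mean(independent):.3f}")
    print(f"true quality of selections:    {mean(oracle):.3f}")

The interfered score exceeds both the independent score and the true quality because selecting on a metric also selects on its noise.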

Seungone Kim @ NAACL2025 (@seungonekim) 's Twitter Profile Photo

Here's our new paper on m-Prometheus, a series of multilingual judges! 1/ Effective at safety & translation eval 2/ Also stands out as a good reward model in BoN 3/ Backbone model selection & training on natively multilingual data is important. Check out José Maria Pombal's post!
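A minimal sketch of Best-of-N (BoN) selection with a judge used as a reward model; the judge_score callable is a placeholder assumption, not the actual m-Prometheus prompting or scoring interface.

    # Minimal Best-of-N (BoN) reranking with a judge acting as a reward model.
    from typing import Callable, List

    def best_of_n(source: str, candidates: List[str],
                  judge_score: Callable[[str, str], float]) -> str:
        """Return the candidate the judge scores highest for this source."""
        return max(candidates, key=lambda cand: judge_score(source, cand))

    # Toy stand-in judge (illustrative only): prefers longer candidates.
    toy_judge = lambda src, cand: float(len(cand))

    print(best_of_n("Olá, mundo!", ["Hello!", "Hello, world!"], toy_judge))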

Dongkeun Yoon (@dongkeun_yoon) 's Twitter Profile Photo

Introducing M-Prometheus — the latest iteration of the open LLM judge, Prometheus! Specially trained for multilingual evaluation. Excels across diverse settings, including the challenging task of literary translation assessment.

Patrick Fernandes (@psanfernandes) 's Twitter Profile Photo

MT metrics excel at evaluating sentence translations, but struggle with complex texts. We introduce *TREQA*, a framework to assess how translations preserve key info by using LLMs to generate & answer questions about them. arxiv.org/abs/2504.07583 (co-lead Sweta Agrawal) 1/15

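A schematic sketch of the question-answering idea behind TREQA: generate questions about a passage, answer them from the candidate translation, and check whether key information survives. The generate_questions and answer callables are placeholders standing in for LLM calls; this is not the paper's actual implementation.

    # Schematic QA-based check of information preservation in a translation.
    from typing import Callable, List

    def qa_preservation_score(
        reference: str,
        translation: str,
        generate_questions: Callable[[str], List[str]],
        answer: Callable[[str, str], str],
    ) -> float:
        """Fraction of questions whose answers match between reference and translation."""
        questions = generate_questions(reference)
        matches = sum(
            answer(reference, q).strip().lower() == answer(translation, q).strip().lower()
            for q in questions
        )
        return matches / max(len(questions), 1)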
Dongkeun Yoon (@dongkeun_yoon) 's Twitter Profile Photo

🙁 LLMs are overconfident even when they are dead wrong. 🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”? ❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.

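A minimal sketch of reading a verbalized confidence like the one quoted above and checking how well such confidences are calibrated; the regex and the toy numbers are illustrative assumptions, not results or methods from the paper.

    # Parse a verbalized confidence and compute expected calibration error (ECE).
    import re
    from typing import List, Tuple

    def parse_confidence(text: str) -> float:
        """Extract a percentage like '60%' from a model's answer, mapped to [0, 1]."""
        match = re.search(r"(\d{1,3})\s*%", text)
        if match is None:
            raise ValueError("no verbalized confidence found")
        return min(int(match.group(1)), 100) / 100.0

    def expected_calibration_error(preds: List[Tuple[float, bool]], n_bins: int = 10) -> float:
        """preds: (confidence, was_correct) pairs."""
        ece, total = 0.0, len(preds)
        for b in range(n_bins):
            lo, hi = b / n_bins, (b + 1) / n_bins
            bucket = [(c, ok) for c, ok in preds
                      if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
            if not bucket:
                continue
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            ece += len(bucket) / total * abs(avg_conf - accuracy)
        return ece

    print(parse_confidence("My answer is only 60% likely to be correct."))  # 0.6
    print(expected_calibration_error([(0.9, True), (0.6, False), (0.8, True)]))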