Vilém Zouhar (@zouharvi)'s Twitter Profile
Vilém Zouhar

@zouharvi

PhD student @ ETH Zürich | all aspects of #NLProc but mostly HCI, evaluation and MT | go #vegan

ID: 2603766076

https://vilda.net | Joined 12-06-2014 12:44:21

1.1K Tweets

2.2K Followers

2.2K Following

Pinzhen "Patrick" Chen (@pinzhen_chen)

📢Participate in *WMT25 terminology task* to showcase how you customise translations! What's new? More languages, more domains, sent/doc-level, and Pareto optimal of term accuracy and overall quality. Don't miss it cuz it only happens once every two years. statmt.org/wmt25/terminol…

Gabriele Sarti (@gsarti_)

📢 New paper: Can unsupervised metrics extracted from MT models detect their translation errors reliably? Do annotators even *agree* on what constitutes an error? 🧐 We compare uncertainty- and interp-based WQE metrics across 12 directions, with some surprising findings! 🧵 1/

Gabriele Sarti (@gsarti_)

XCOMETs underperform because they do not match translators' subjective error annotation propensity. Using the granular p(error) value from XCOMET significantly boosts their performance when calibration is possible → desirable for a fair evaluation 6/

Gabriele Sarti (@gsarti_)

Key takeaways for WQE evals: 1️⃣ Unsup. WQE shows promise (esp. uncertainty-based ones), interp approaches under-explored for MT 2️⃣ Calibration sets can help to ensure fair evaluations. 3️⃣ Use multiple annotators for robust rankings. More info ➡️ arxiv.org/abs/2505.23183 8/8

Neil Renic (@nc_renic)

I am once again pitching my romantic comedy: - two academics start dating - discover they are each other's terrible reviewer - hijinks ensue Working title: Love is Double-Blind

Vilém Zouhar (@zouharvi)

For a long time I've been using Google Translate as a gateway to explain machine translation concepts to people as an easily recognizable tool that everyone knows. Now I get to contribute over the summer. 🌞 If you're near Mountain View, let's talk evaluation. 📏
