Vilém Zouhar (@zouharvi) Twitter Tweets • TwiCopy

Vilém Zouhar

@zouharvi

PhD student @ ETH Zürich | all aspects of #NLProc but mostly HCI, evaluation and MT | go #vegan

ID: 2603766076

Link: https://vilda.net | Joined: 12-06-2014 12:44:21

1.1K Tweets | 2.2K Followers | 2.2K Following

Vilém Zouhar

@zouharvi

8 months ago

Multilinguality is happening at #NAACL2025 Xinyu Crystina Zhang Nathan Brown @ NAACL 2025 Dayeon (Zoey) Ki Rena Gao Ona de Gibert

Likes: 56 | Replies: 1 | Retweets: 4

Pinzhen "Patrick" Chen

📢 Participate in *WMT25 terminology task* to showcase how you customise translations! What's new? More languages, more domains, sent/doc-level, and Pareto optimal of term accuracy and overall quality. Don't miss it cuz it only happens once every two years. statmt.org/wmt25/terminol…

Likes: 17 | Replies: 0 | Retweets: 4

Gabriele Sarti

@gsarti_

7 months ago

📢 New paper: Can unsupervised metrics extracted from MT models detect their translation errors reliably? Do annotators even *agree* on what constitutes an error? 🧐 We compare uncertainty- and interp-based WQE metrics across 12 directions, with some surprising findings! 🧵 1/

Likes: 37 | Replies: 1 | Retweets: 2

Gabriele Sarti

@gsarti_

7 months ago

XCOMETs underperform because they do not match translators' subjective error annotation propensity. Using the granular p(error) value from XCOMET significantly boosts their performance when calibration is possible → desirable for a fair evaluation 6/

Likes: 1 | Replies: 1 | Retweets: 1

Gabriele Sarti

@gsarti_

7 months ago

Key takeaways for WQE evals: 1️⃣ Unsup. WQE shows promise (esp. uncertainty-based ones), interp approaches under-explored for MT 2️⃣ Calibration sets can help to ensure fair evaluations. 3️⃣ Use multiple annotators for robust rankings. More info ➡️ arxiv.org/abs/2505.23183 8/8

Likes: 4 | Replies: 0 | Retweets: 1

Neil Renic

@nc_renic

6 months ago

I am once again pitching my romantic comedy: - two academics start dating - discover they are each other's terrible reviewer - hijinks ensue Working title: Love is Double-Blind

Likes: 6.6K | Replies: 114 | Retweets: 529

Vilém Zouhar

@zouharvi

6 months ago

For a long time I've been using Google Translate as a gateway to explain machine translation concepts to people as an easily recognizable tool that everyone knows. Now I get to contribute over the summer. 🌞 If you're near Mountain View, let's talk evaluation. 📏

Likes: 75 | Replies: 3 | Retweets: 0

Vilém Zouhar

@zouharvi

6 months ago

Thank you for your response. I will keep my score.

Likes: 36 | Replies: 3 | Retweets: 0
