Dan Deutsch (@_danieldeutsch) Twitter Tweets • TwiCopy

Dan Deutsch

@_danieldeutsch

Research Scientist at Google Translate working on text generation evaluation

ID: 821649037

Website: https://danieldeutsch.github.io/ · Joined: 13-09-2012 14:52:58

89 Tweets

611 Followers

89 Following

Dan Deutsch

@_danieldeutsch

a year ago

New application link! google.com/about/careers/… I am at EMNLP/WMT this week. Please come find me if you want to learn more about this role!

35 likes · 0 replies · 10 retweets

The Google Translate Research Team is looking for interns this summer! Apply here if you will graduate from a PhD program in the 2025-2026 academic year, and send me an email to let me know that you applied: google.com/about/careers/…

185 likes · 3 replies · 51 retweets

Dan Deutsch

@_danieldeutsch

a year ago

Super simple and effective way of significantly increasing the performance of your evaluation metric!

8 likes · 0 replies · 0 retweets

Jurik Juraska

@jurikjuraska

a year ago

🌐 Meet MetricX-24, our SOTA machine translation evaluation metric and a successor to the successful MetricX-23. 🚀 Now open-source in PyTorch/Transformers! 🎉 Ready to take this top performer in the WMT24 Metrics Shared Task for a spin? 🔗 Code: github.com/google-researc…

18 likes · 1 reply · 6 retweets

Jurik Juraska

@jurikjuraska

a year ago

🚀 We have just released bfloat16 variants of all 3 MetricX-24 models, offering nearly identical performance to their float32 counterparts, but with a 50% smaller memory footprint. ✨ We hope this makes the XL and XXL models more accessible! 🔗 GitHub: github.com/google-researc…

2 likes · 0 replies · 2 retweets

Yusuf Kocyigit

@mykocyigit

10 months ago

Thrilled to share our latest findings on data contamination from my internship at Google! We trained almost 90 models at the 1B and 8B scales with various types of contamination, using machine translation as our task, and analyzed the impact of contamination. arxiv.org/abs/2501.18771

83 likes · 3 replies · 19 retweets

iseeaswell꩜bʂky

@iseeaswell

9 months ago

😼 SMOL DATA ALERT! 😼 Announcing SMOL, a professionally translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301 Hugging Face: huggingface.co/datasets/googl…

32 likes · 2 replies · 11 retweets

Markus Freitag

@markuseful

9 months ago

Two new datasets from Google Translate targeting high- and low-resource languages!
WMT24++: adds 46 new en->xx languages to WMT24, bringing the total to 55.
SMOL: 6M tokens for 115 very low-resource languages.
WMT24++: huggingface.co/datasets/googl…
SMOL: huggingface.co/datasets/googl…

85 likes · 2 replies · 25 retweets

iseeaswell꩜bʂky

@iseeaswell

5 months ago

Working on low-resource languages? Want to help with SMOL? Join our new Discord! discord.gg/YFTv7tkh

2 likes · 1 reply · 1 retweet