
Yotam Perlitz 👾
@yotamperlitz
Research Scientist at @ibmresearch, Practicing #NLProc, #RL.
Opinions are my own.
ID: 2959595173
https://perlitz.github.io/ 05-01-2015 11:25:30
95 Tweets
84 Followers
126 Following

1/ Into Image Captioning? Don’t miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent Captioning evaluation survey! arxiv.org/abs/2408.04909 w/ Gabriel Stanovsky, Omri Abend, Lea Frermann





It's been a great collaboration journey with my wonderful co-authors: Andreas Waldis, Yotam Perlitz 👾, Leshem (Legend) Choshen 🤖🤗, and Iryna Gurevych. Please check it out and let us know if you'd like to see any additional functions or analyses added to the benchmark.





Another cool benchmarking paper published yesterday. In "JuStRank: Benchmarking LLM Judges for System Ranking", researchers from IBM Research introduced JuStRank, the first large-scale benchmark for evaluating LLM judges for ranking target systems: arxiv.org/abs/2412.09569

