Yotam Perlitz 👾 (@yotamperlitz)'s Twitter Profile
Yotam Perlitz 👾

@yotamperlitz

Research Scientist at @ibmresearch, Practicing #NLProc, #RL.
Opinions are my own.

ID: 2959595173

Link: https://perlitz.github.io/

Joined: 05-01-2015 11:25:30

95 Tweets

84 Followers

126 Following

Uri Berger (@uriberger88)

1/ Into Image Captioning? Don’t miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent Captioning evaluation survey! arxiv.org/abs/2408.04909 w/ Gabriel Stanovsky, Omri Abend, Lea Frermann >

Yufang Hou (@yufanghou)

It's been a great collaboration journey with my wonderful co-authors: Andreas Waldis, Yotam Perlitz 👾, Leshem (Legend) Choshen 🤖🤗, and Iryna Gurevych. Please check it out and let us know if you'd like to see any additional functions or analyses added to the benchmark.

Leshem Choshen C U @ ICLR 🤖🤗 (@lchoshen)

Scaling laws predict 🦣 large models by training 🦟 small ones, cool right? Fortunately, they are not that complicated or costly, or at least they don't have to be. We have collected 400+ models, fitted 1000+ scaling laws, and created 1 guide for cheap & more reliable scaling law fitting:

Elron Bandel (@elronbandel)

Can lowercase really make or break an AI’s answer? 🤔 And what happens when an LLM ‘sees’ old TV-style white noise? 📺 We put these quirks to the test using Unitxt, showcasing the powerful, flexible testing it enables. Read on: unitxt.ai/en/latest/blog… #AI #MachineLearning #LLM

Philipp Schmid (@_philschmid)

RAG Developer Attention! 🔔 Docling is a new library from IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with LlamaIndex 🦙 and LangChain. TL;DR: 🗂️ Parses numerous

UKP Lab (@ukplab)

Prompting LMs is not enough to quantify their linguistic competence! Meet the Holmes🔎 benchmark at #EMNLP2024 #TACL or 👉🧵
💠Meta-study of current literature
💠Coverage of LMs and phenomena
💠Analysis of LM size, architecture, and instruction tuning
holmes-benchmark.github.io

LayerLens (@layerlens_ai)

Another cool benchmarking paper published yesterday. In "JuStRank: Benchmarking LLM Judges for System Ranking", researchers from IBM Research introduced JuStRank, the first large-scale benchmark for evaluating LLM judges for ranking target systems: arxiv.org/abs/2412.09569

Elron Bandel (@elronbandel)

A big thank you to the Unitxt team, collaborators, and our community for an incredible 2024! Together, we pushed boundaries in AI evaluation and set new standards for the field. Read our End-of-Year Summary here: unitxt.ai/en/latest/blog… #unitxt #llmevaluation