Yotam Perlitz 👾 (@yotamperlitz) Twitter Tweets • TwiCopy

Yotam Perlitz 👾

@yotamperlitz

Research Scientist at @ibmresearch, Practicing #NLProc, #RL.
Opinions are my own.

ID: 2959595173

Website: https://perlitz.github.io/
Joined: 05-01-2015 11:25:30

95 Tweets

84 Followers

126 Following

Uri Berger

@uriberger88

a year ago

1/ Into Image Captioning? Don’t miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent Captioning evaluation survey! arxiv.org/abs/2408.04909 w\ Gabriel Stanovsky Omri Abend Lea Frermann >

Likes: 19, Replies: 3, Retweets: 10

Yotam Perlitz 👾

@yotamperlitz

a year ago

Shoutout to Streamlit, our framework of choice! Shoutout to Hugging Face for hosting our space 🤗

Likes: 2, Replies: 1, Retweets: 0

Yotam Perlitz 👾

@yotamperlitz

a year ago

Me trying to choose the right LLM benchmark without BenchBench: x.com/yotamperlitz/s…

Likes: 4, Replies: 1, Retweets: 1

Yotam Perlitz 👾

@yotamperlitz

a year ago

Get your benchmark game on: huggingface.co/spaces/ibm/ben…

Likes: 1, Replies: 0, Retweets: 0

Yufang Hou

@yufanghou

a year ago

It's been a great collaboration journey with my wonderful co-authors: Andreas Waldis, Yotam Perlitz 👾, Leshem (Legend) Choshen 🤖🤗, and Iryna Gurevych. Please check it out and let us know if you'd like to see any additional functions or analyses added to the benchmark.

Likes: 1, Replies: 0, Retweets: 1

Leshem Choshen C U @ ICLR 🤖🤗

@lchoshen

a year ago

Scaling laws predict 🦣 large models by training 🦟 small ones, cool right? Fortunately, they are not that complicated or costly; at least they don't have to be. We have collected 400+ models, fitted 1000+ scaling laws, and created 1 guide for cheap & more reliable scaling law fitting:

Likes: 248, Replies: 4, Retweets: 36

Elron Bandel

@elronbandel

a year ago

Can lowercase really make or break an AI’s answer? 🤔 And what happens when an LLM ‘sees’ old TV-style white noise? 📺 We put these quirks to the test using Unitxt, showcasing the powerful, flexible testing it enables. Read on: unitxt.ai/en/latest/blog… #AI #MachineLearning #LLM

Likes: 12, Replies: 1, Retweets: 5

Philipp Schmid

@_philschmid

a year ago

RAG Developer Attention! 🔔 Docling is a new library from IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with LlamaIndex 🦙 and LangChain. TL;DR: 🗂️ Parses numerous

Likes: 491, Replies: 7, Retweets: 85

UKP Lab

@ukplab

a year ago

Prompting LMs is not enough to quantify their linguistic competence! Meet the Holmes🔎 benchmark at #EMNLP2024 #TACL or 👉🧵
💠 Meta-study of current literature
💠 Coverage of LMs and phenomena
💠 Analysis of LM size, architecture, and instruction tuning
holmes-benchmark.github.io

Likes: 33, Replies: 1, Retweets: 8

LayerLens

@layerlens_ai

a year ago

Another cool benchmarking paper published yesterday. In "JuStRank: Benchmarking LLM Judges for System Ranking", researchers from IBM Research introduced JuStRank, the first large-scale benchmark for evaluating LLM judges for ranking target systems: arxiv.org/abs/2412.09569

Likes: 9, Replies: 1, Retweets: 4

Elron Bandel

@elronbandel

9 months ago

A big thank you to the Unitxt team, collaborators, and our community for an incredible 2024! Together, we pushed boundaries in AI evaluation and set new standards for the field. Read our End-of-Year Summary here: unitxt.ai/en/latest/blog… #unitxt #llmevaluation

Likes: 7, Replies: 0, Retweets: 2

Yotam Perlitz 👾

@yotamperlitz

8 months ago

What a GEM!

Likes: 1, Replies: 0, Retweets: 0