Valentina Pyatkin (@valentina__py) 's Twitter Profile
Valentina Pyatkin

@valentina__py

Postdoc at the Allen Institute for AI @allen_ai and @uwnlp | on the academic job market

ID: 786941656859811840

Link: https://valentinapy.github.io
Joined: 14-10-2016 14:48:06

636 Tweets

2.2K Followers

1.1K Following

Yuling Gu (@gu_yuling) 's Twitter Profile Photo

Excited to be at #NAACL2025 in Albuquerque this week! I'll be presenting "OLMES: A Standard for Language Model Evaluations" (arxiv.org/abs/2406.08446)! Work done with my wonderful collaborators at Ai2 ❤️

Fazl Barez (@fazlbarez) 's Twitter Profile Photo

Responsible Reviewing #NeurIPS2025 — TL;DR: 1. If you or your co-author skip your assigned reviews → you won't see your own paper's reviews. 2. Submit a poor-quality review → your paper may be desk-rejected. 👏 Nice one, NeurIPS! 🔗 blog.neurips.cc/2025/05/02/res…

Afra Amini (@afra_amini) 's Twitter Profile Photo

Current KL estimation practices in RLHF can generate high variance and even negative values! We propose a provably better estimator that only takes a few lines of code to implement.🧵👇
w/ Tim Vieira and Ryan Cotterell
paper: arxiv.org/pdf/2504.10637
code: github.com/rycolab/kl-rb
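For intuition about why naive KL estimates can go negative, here is a toy sketch (my own illustration, not the estimator from the paper): the standard per-sample term log p(x)/q(x) can be negative even though the true KL is nonnegative, while a common variance-reduced alternative (the "k3" form often attributed to John Schulman) has nonnegative per-sample terms.

```python
import math
import random

random.seed(0)

# Toy policy p and reference q over three tokens; in RLHF these would be
# per-token distributions from the policy and reference model (toy setup,
# not the paper's).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kl_exact = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def sample(dist):
    """Draw an index from a discrete distribution via inverse CDF."""
    r, acc = random.random(), 0.0
    for i, w in enumerate(dist):
        acc += w
        if r < acc:
            return i
    return len(dist) - 1

def k1(n):
    """Naive Monte Carlo estimator: average of log p(x)/q(x) over x ~ p.
    Unbiased, but individual terms can be negative even though KL >= 0."""
    return sum(math.log(p[x] / q[x]) for x in (sample(p) for _ in range(n))) / n

def k3(n):
    """Lower-variance alternative, shown only for contrast (NOT the paper's
    estimator): with r = q(x)/p(x), each term (r - 1) - log r is
    nonnegative, and the estimator remains unbiased."""
    total = 0.0
    for _ in range(n):
        x = sample(p)
        r = q[x] / p[x]
        total += (r - 1.0) - math.log(r)
    return total / n
```

Note the per-sample negativity: with these toy numbers, drawing token 1 contributes log(0.3/0.4) < 0 to the k1 estimate, which is how small-sample averages end up below zero.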
Yanai Elazar (@yanaiela) 's Twitter Profile Photo

Replying to Lucas Beyer, Yoav, and Rohan Anil: arxiv.org/abs/2410.15002. This is in the text-to-image domain, and we have some ideas on how to extend this to the text domain. We also recently published arxiv.org/abs/2504.12459, which connects the number of entity co-occurrences and "linearity" in model representations.

Philippe Laban (@philippelaban) 's Twitter Profile Photo

🆕paper: LLMs Get Lost in Multi-Turn Conversation

In real life, people don’t speak in perfect prompts.
So we simulate multi-turn conversations — less lab-like, more like real use.

We find that LLMs get lost in conversation.
👀What does that mean? 🧵1/N
📄arxiv.org/abs/2505.06120
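A minimal sketch of what "simulating multi-turn conversations" can look like, assuming the setup splits a fully-specified prompt into pieces revealed one user turn at a time (the function and data below are hypothetical illustrations, not the paper's code):

```python
# Hypothetical sketch: turn one fully-specified prompt into an
# underspecified multi-turn conversation by revealing the constraints
# ("pieces" of the spec) one user turn at a time.

FULL_SPEC = [
    "Write a Python function that merges two sorted lists.",
    "It should not use the built-in sorted().",
    "It should run in linear time.",
]

def sharded_turns(pieces):
    """Yield the growing chat history, one piece of the spec per user turn."""
    history = []
    for piece in pieces:
        history.append({"role": "user", "content": piece})
        yield list(history)  # snapshot: what the model sees at this turn

turns = list(sharded_turns(FULL_SPEC))
```

The point of such a harness is that the model must commit to answers before the full specification is known, which is where the "getting lost" behavior shows up.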
Jing-Jing Li (@drjingjing2026) 's Twitter Profile Photo

Excited to share that our SafetyAnalyst paper has been accepted to #icml2025! 

SafetyAnalyst provides a novel way to determine if some AI behavior would be safe. It’s accurate, interpretable, transparent, and steerable. 1/7
Ai2 (@allen_ai) 's Twitter Profile Photo

RewardBench 2 is here! We took a long time to learn from our first reward model evaluation tool to make one that is substantially harder and more correlated with both downstream RLHF and inference-time scaling.

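As an illustration of the kind of evaluation a reward model benchmark performs (a generic sketch under my own assumptions, not Ai2's implementation): score each candidate completion and count an example correct only when the chosen completion outscores every rejected one.

```python
# Generic sketch of reward-model benchmark accuracy (illustration only,
# not Ai2's code): the model passes an example only if it scores the
# chosen completion strictly above every rejected completion.

def rm_accuracy(examples, reward_fn):
    """examples: list of (prompt, chosen, [rejected, ...]) tuples."""
    correct = 0
    for prompt, chosen, rejected in examples:
        chosen_score = reward_fn(prompt, chosen)
        if all(chosen_score > reward_fn(prompt, r) for r in rejected):
            correct += 1
    return correct / len(examples)

# Toy "reward model" that just prefers longer answers, for demonstration.
toy_rm = lambda prompt, completion: len(completion)

examples = [
    ("Q1", "a detailed helpful answer", ["short", "meh"]),
    ("Q2", "ok", ["a much longer but unhelpful ramble", "nope"]),
]
```

Requiring the chosen completion to beat several rejected ones (rather than a single pairwise comparison) is one straightforward way a benchmark becomes "substantially harder."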
Sahil Verma (@sahil1v) 's Twitter Profile Photo

🚨 New Paper! 🚨
Guard models: slow, language-specific, and modality-limited?

Meet OmniGuard, which detects harmful prompts across multiple languages & modalities using one approach, with SOTA performance in all 3 modalities while being 120X faster 🚀

arxiv.org/abs/2505.23856
Saumya Malik (@saumyamalik44) 's Twitter Profile Photo

I’m thrilled to share RewardBench 2 📊— We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!

Eran Hirsch (@hirscheran) 's Twitter Profile Photo

🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)!

LAQuer provides more granular attribution for LLM generations: users can just highlight any output fact (top), and get attribution for that input snippet (bottom). This reduces the amount of text the user has to read by 2
Jiacheng Liu (@liujc1998) 's Twitter Profile Photo

We enabled OLMoTrace for Tülu 3 models! 🤠

Matched spans are shorter than for OLMo models, because we can only search in Tülu's post-training data (the base model is Llama). Yet we thought it'd still bring some value.

Try it yourself on the Ai2 playground: playground.allenai.org
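Conceptually, OLMoTrace-style tracing looks for spans of the model's output that appear verbatim in the training data. A brute-force sketch (my simplification; the real system uses an efficient index over very large corpora):

```python
# Brute-force sketch of verbatim span matching (illustration only): for
# each position in the model output, grow the longest token span that
# also appears contiguously in the training corpus.

def contains(haystack, needle):
    """True if needle occurs as a contiguous sublist of haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))

def matched_spans(out_toks, corpus_toks, min_len=3):
    """Return (start, end) index pairs of output spans found verbatim
    in the corpus, keeping only spans of at least min_len tokens."""
    spans, i = [], 0
    while i < len(out_toks):
        best = 0
        while i + best < len(out_toks) and contains(corpus_toks, out_toks[i:i + best + 1]):
            best += 1
        if best >= min_len:
            spans.append((i, i + best))
            i += best  # greedily skip past the matched span
        else:
            i += 1
    return spans

corpus = "the quick brown fox jumps over the lazy dog".split()
output = "a quick brown fox ran away".split()
```

Here `matched_spans(output, corpus)` finds the three-token span "quick brown fox"; shorter matches are filtered out, mirroring why matched spans shrink when the searchable corpus is only the post-training data.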