Elyas Obbad (@obbadelyas) Twitter Tweets • TwiCopy

Angelina Wang @angelinawang.bsky.social

8 months ago

I've recently put together a "Fairness FAQ": tinyurl.com/fairness-faq. If you work in non-fairness ML and you've heard about fairness, perhaps you've wondered things like what the best definitions of fairness are, and whether we can train algorithms that optimize for it.

116 likes · 3 replies · 22 reposts

ADAM BADΞR ᯅ

@adambader

6 months ago

Whoop, a company that always claimed that subscribers would receive new features without hardware updates, and that if new hardware was released, subscribers would receive it for free, has today released two new devices and is asking existing subscribers to pay for them. This is

1.1K likes · 173 replies · 93 reposts

Virtue AI

@virtueai_co

6 months ago

🚀 Introducing VirtueAgent, the first security layer for the agentic era. As AI agents begin to act autonomously in real-world environments such as personal assistants, finance, and healthcare, ensuring they operate securely and compliantly is critical. VirtueAgent provides

21 likes · 2 replies · 6 reposts

David Hall

@dlwh

5 months ago

Super excited Marin is finally out! Come see what we've been building! Code/platform for training fully reproducible models end-to-end, from data to evals. Plus a new high quality 8B base model, fully documented from start to finish.

63 likes · 10 replies · 16 reposts

Brando Miranda

@brandohablando

5 months ago

One of our newest pre-training projects was built with Marin! Stay tuned for more soon! Thanks to Elyas Obbad & David Hall for being so fun to work with, to Percy Liang for helping test Marin, to Sanmi Koyejo for really good, kind advice, and to Rylan Schaeffer for his very efficient feedback ;)

12 likes · 0 replies · 2 reposts

Elyas Obbad

@obbadelyas

5 months ago

Marin is amazing! It allows for friendly and reproducible AI research and has enabled our team to rapidly test out different ideas. Great community as well (: cc Brando Miranda David Hall Rylan Schaeffer Percy Liang Sanmi Koyejo

6 likes · 0 replies · 2 reposts

Katie Everett

@_katieeverett

5 months ago

1. We often observe power laws between loss and compute: loss = a * flops^b + c
2. Models are rapidly becoming more efficient, i.e. they use less compute to reach the same loss.

But: which innovations actually change the exponent in the power law (b) vs. change only the constant (a)?

254 likes · 8 replies · 44 reposts
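To make the a-versus-b question concrete: if an innovation acts like a constant multiplier k on effective compute, then a * (k * flops)^b + c = (a * k^b) * flops^b + c, so only the constant a changes and the compute saved to reach any target loss is the same factor k at every scale. A change in the exponent b instead gives savings that grow as the target loss falls. A minimal Python sketch with made-up parameters (a = 100, b = -0.1, c = 1.5, k = 4, and the alternative b = -0.12 are illustrative assumptions, not fitted values; b is negative so loss decreases with compute):

```python
def flops_to_reach(target, a, b, c):
    """Invert loss = a * flops**b + c to get the compute needed for a target loss (needs target > c)."""
    if target <= c:
        raise ValueError("target loss must exceed the irreducible loss c")
    return ((target - c) / a) ** (1.0 / b)

# Illustrative fit, not taken from any paper: b < 0 so loss falls as compute grows.
a, b, c = 100.0, -0.1, 1.5

# Innovation 1: a constant-factor efficiency gain k (every FLOP "counts" k times).
# a * (k * F)**b == (a * k**b) * F**b, so only the prefactor changes: a -> a * k**b.
k = 4.0
a_improved = a * k ** b

# Innovation 2: a steeper exponent with the same a and c.
b_improved = -0.12

def savings(target):
    """Return (compute saving from the constant change, compute saving from the exponent change)."""
    base = flops_to_reach(target, a, b, c)
    return (base / flops_to_reach(target, a_improved, b, c),
            base / flops_to_reach(target, a, b_improved, c))

for target in (3.0, 2.5, 2.0):
    const_saving, exp_saving = savings(target)
    print(f"target loss {target}: constant change saves {const_saving:.2f}x, "
          f"exponent change saves {exp_saving:.3g}x")
```

The constant-only improvement saves exactly k = 4x at every target, while the exponent change saves more and more compute as the target loss is pushed lower.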

Rylan Schaeffer

@rylanschaeffer

5 months ago

A bit late to the party, but our paper on predictable inference-time / test-time scaling was accepted to #icml2025 🎉🎉🎉 TLDR: Best of N was shown to exhibit power (polynomial) law scaling (left), but maths suggest one should expect exponential scaling (center). We show how to

106 likes · 4 replies · 14 reposts
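The tension in the TLDR can be illustrated with a standard calculation, which is not necessarily the paper's argument (its explanation is cut off above). If each of N independent attempts on one problem succeeds with probability p, the chance that all fail is (1 - p)^N, which decays exponentially in N. Averaging over many problems whose p values are spread out, with substantial mass near zero, can instead give an aggregate failure rate that falls like a power law. The sketch assumes a Beta(alpha, beta) distribution over per-problem success probabilities purely as an example; for it, E[(1 - p)^N] = B(alpha, beta + N) / B(alpha, beta), which behaves like N^(-alpha) for large N. All parameter values and function names are hypothetical.

```python
import math

def single_problem_failure(p, n):
    """Probability that all n independent attempts fail when each succeeds with probability p."""
    return (1.0 - p) ** n

def beta_mixture_failure(alpha, beta, n):
    """E[(1 - p)^n] for p ~ Beta(alpha, beta): B(alpha, beta + n) / B(alpha, beta), via log-gamma."""
    log_val = (math.lgamma(beta + n) + math.lgamma(alpha + beta)
               - math.lgamma(beta) - math.lgamma(alpha + beta + n))
    return math.exp(log_val)

def loglog_slope(f, n1, n2):
    """Slope of log f(n) against log n between n1 and n2; a constant slope indicates a power law."""
    return (math.log(f(n2)) - math.log(f(n1))) / (math.log(n2) - math.log(n1))

# Hypothetical parameters chosen only for illustration.
p = 0.01
alpha, beta = 0.3, 3.0

def single(n):
    return single_problem_failure(p, n)

def mixture(n):
    return beta_mixture_failure(alpha, beta, n)

for n1, n2 in [(100, 1000), (1000, 10000)]:
    print(f"N {n1}->{n2}: single-problem slope {loglog_slope(single, n1, n2):.2f}, "
          f"mixture slope {loglog_slope(mixture, n1, n2):.3f}")
```

On log-log axes the single-problem curve keeps steepening (the signature of exponential decay), while the mixture's slope stays close to -alpha (a power law).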

Rylan Schaeffer

@rylanschaeffer

5 months ago

🚨New preprint 🚨 Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models We examine min-p sampling (ICLR 2025 oral) & find significant problems in all 4 lines of evidence: human eval, NLP evals, LLM-as-judge evals, community adoption claims 1/8

285 likes · 12 replies · 35 reposts
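For readers unfamiliar with the technique under critique: as usually described, min-p sampling keeps only the tokens whose probability is at least min_p times the most likely token's probability, renormalizes, and samples, so the cutoff tightens when the model is confident and loosens when it is not. A minimal stdlib sketch of that truncation rule (function names and toy distributions are hypothetical; this is not code from either paper, and real implementations also handle logits, temperature, and batching, which this omits):

```python
import random

def min_p_filter(probs, min_p):
    """Zero out tokens whose probability is below min_p * max(probs), then renormalize."""
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # > 0 because the top token always survives
    return [p / total for p in kept]

def sample_min_p(probs, min_p, rng):
    """Sample one token index from the min-p-truncated distribution."""
    filtered = min_p_filter(probs, min_p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]

# Toy next-token distributions (made up for illustration).
flat = [0.30, 0.25, 0.20, 0.15, 0.06, 0.04]
peaked = [0.90, 0.05, 0.03, 0.02]

print(min_p_filter(flat, 0.25))   # threshold 0.075: the two least likely tokens are dropped
print(min_p_filter(peaked, 0.1))  # threshold 0.09: only the top token survives
print(sample_min_p(flat, 0.25, random.Random(0)))
```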

David Hall

@dlwh

4 months ago

So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me. )

968 likes · 21 replies · 94 reposts

Rylan Schaeffer

@rylanschaeffer

4 months ago

Another #ICML2025 paper! Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? TLDR: Predicting language model performance with scale on multiple choice question-answer (MCQA) benchmarks is made difficult b/c ... 1/3

88 likes · 2 replies · 16 reposts

Rylan Schaeffer

@rylanschaeffer

4 months ago

Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄 Joshua Kazdan Apratim Dey Matthias Gerstgrasser Rafael Rafailov @ NeurIPS Sanmi Koyejo 1/7

107 likes · 4 replies · 20 reposts

Rylan Schaeffer

@rylanschaeffer

4 months ago

New position paper! Machine Learning Conferences Should Establish a “Refutations and Critiques” Track Joint w/ Sanmi Koyejo Joshua Kazdan Yegor Denisov-Blanch Francesco Orabona Koustuv Sinha Jessica Zosa Forde Jesse Dodge Susan Zhang Brando Miranda Matthias Gerstgrasser isha Elyas Obbad 1/6

400 likes · 12 replies · 49 reposts

Brando Miranda

@brandohablando

4 months ago

I'm at ICML too! Code and math, especially AI for formal methods and Lean!

4 likes · 0 replies · 1 repost

Brando Miranda

@brandohablando

4 months ago

We use Trace in our new VeriBench benchmark for code verification in Lean 4; stay tuned for more details soon! OpenReview: openreview.net/forum?id=rWkGF… (should be public soon). ICML MATH-AI workshop: come by Friday to chat with me about this and AI for correct and verified code! And AI

10 likes · 0 replies · 1 repost

Brando Miranda

@brandohablando

4 months ago

🚨 Can your LLM really do math, or is it cramming the test set? 📢 Meet Putnam-AXIOM, an advanced, contamination-resilient mathematics benchmark that finally hurts FMs. 1. openreview.net/forum?id=kqj2C… 2. icml.cc/virtual/2025/p… #ICML2025 East Exhibition Hall A-B, #E-2502 🧵1/14

60 likes · 4 replies · 19 reposts

Brando Miranda

@brandohablando

4 months ago

Saurabh Srivastava, Christian Szegedy, Yuhuai (Tony) Wu, Albert Jiang. Thank you to the wonderful team! Aryan Gulati, Sanmi Koyejo, Stanford Trustworthy AI Research (STAIR) Lab, Kai Fronsdal, Elyas Obbad, Emily, Bruno, @3ricme (we will post their handles soon!) 🧵17/14

5 likes · 1 reply · 3 reposts

Elyas Obbad

@obbadelyas

3 months ago

Check out our work on verifiable code generation! This is an extremely promising research direction and I’m excited to see it gain traction

6 likes · 2 replies · 0 reposts

Brando Miranda

@brandohablando

3 months ago

Come to Convention Center West, room 208-209, 2nd floor, to learn about optimal data selection using compression like gzip! TL;DR: you can learn much faster if you use gzip compression distances to select data for a given task! DM me if you are interested or want to use the code!

7 likes · 0 replies · 4 reposts
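The tweet does not spell out which compression distance is used. One common choice is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length: texts that share structure compress well together and so score closer to 0. A minimal sketch, assuming NCD with Python's standard gzip module as the compressor and ranking candidates by mean NCD to a handful of task examples; both choices, the function names, and the toy strings are hypothetical and not taken from the authors' code.

```python
import gzip

def compressed_len(text):
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(x, y):
    """Normalized compression distance between two strings (smaller means more similar)."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def select_by_ncd(candidates, task_examples, k):
    """Return the k candidates with the smallest mean NCD to the task examples."""
    def score(candidate):
        return sum(ncd(candidate, t) for t in task_examples) / len(task_examples)
    return sorted(candidates, key=score)[:k]

# Toy data: one target-task example and two candidate training texts.
task = ["theorem add_comm_nat (a b : Nat) : a + b = b + a := by omega -- commutativity "
        "of addition on natural numbers, proved with the omega decision procedure"]
candidates = [
    "The quarterly sales report shows revenue growth in the northeast region, driven "
    "mostly by new retail partnerships and seasonal promotions.",
    "theorem add_comm_int (a b : Int) : a + b = b + a := by omega -- commutativity "
    "of addition on integers, proved with the omega decision procedure",
]

print(select_by_ncd(candidates, task, k=1))
```

Here the Lean-style candidate is selected because it shares most of its byte patterns with the task example. Compression-based scoring needs no trained embedding model, at the cost of one compression call per (candidate, example) pair.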

Rylan Schaeffer

@rylanschaeffer

3 months ago

The hardest scaling law prediction problems are no match for Ying Xiao armed only with a spreadsheet and his measuring stick

11 likes · 0 replies · 2 reposts