Elyas Obbad (@obbadelyas) 's Twitter Profile
Elyas Obbad

@obbadelyas

AI Research @STAI_Research @StanfordAILab | previously @DoorDash @Google @Meta @Columbia

ID: 1061392463615741952

Joined: 10-11-2018 22:57:37

76 Tweets

272 Followers

241 Following

Angelina Wang @angelinawang.bsky.social (@ang3linawang) 's Twitter Profile Photo

I've recently put together a "Fairness FAQ": tinyurl.com/fairness-faq. If you work in non-fairness ML and you've heard about fairness, perhaps you've wondered things like what the best definitions of fairness are, and whether we can train algorithms that optimize for it.

ADAM BADΞR ᯅ (@adambader) 's Twitter Profile Photo

Whoop, a company that always claimed that subscribers would receive new features without hardware updates, and that if new hardware was released, subscribers would receive it for free, has today released two new devices and is asking existing subscribers to pay for them. This is

Virtue AI (@virtueai_co) 's Twitter Profile Photo

🚀 Introducing VirtueAgent, the first security layer for the agentic era. As AI agents begin to act autonomously in real-world environments, such as personal assistants, finance, and healthcare, ensuring they operate securely and compliantly is critical. VirtueAgent provides

David Hall (@dlwh) 's Twitter Profile Photo

Super excited Marin is finally out! Come see what we've been building! Code/platform for training fully reproducible models end-to-end, from data to evals. Plus a new high quality 8B base model, fully documented from start to finish.

Brando Miranda (@brandohablando) 's Twitter Profile Photo

One of our newest pre-training projects was built with Marin! Stay tuned for more soon! Thanks to Elyas Obbad & David Hall for being so fun to work with -- and to Percy Liang for helping test Marin & Sanmi Koyejo for really good, kind advice. & Rylan Schaeffer for his very efficient feedback ;)

Elyas Obbad (@obbadelyas) 's Twitter Profile Photo

Marin is amazing! It allows for friendly and reproducible AI research and has enabled our team to rapidly test out different ideas. Great community as well (: cc Brando Miranda David Hall Rylan Schaeffer Percy Liang Sanmi Koyejo

Katie Everett (@_katieeverett) 's Twitter Profile Photo

1. We often observe power laws between loss and compute: loss = a * flops ^ b + c
2. Models are rapidly becoming more efficient, i.e. use less compute to reach the same loss

But: which innovations actually change the exponent in the power law (b) vs change only the constant (a)?
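
A rough sketch of why the distinction in the tweet above matters, using the same form loss = a * flops ^ b + c. All coefficient values here are made up for illustration and are not from the thread or any paper; the helper names are hypothetical. Under this form, shrinking only the constant (a) appears to buy a fixed compute multiplier at every target loss, while a steeper exponent (b) buys a multiplier that grows as the target loss gets lower.

```python
def loss(flops: float, a: float, b: float, c: float) -> float:
    """Power-law loss curve: loss = a * flops ** b + c (b is negative)."""
    return a * flops ** b + c


def flops_to_reach(target: float, a: float, b: float, c: float) -> float:
    """Invert the power law: compute needed to hit a target loss."""
    if target <= c:
        raise ValueError("target loss must be above the irreducible term c")
    return ((target - c) / a) ** (1.0 / b)


def compute_savings(target: float, base: tuple, improved: tuple) -> float:
    """How many times less compute `improved` needs than `base` at `target`."""
    return flops_to_reach(target, *base) / flops_to_reach(target, *improved)


# Illustrative (a, b, c) triples; assumed values, not measured ones.
base = (10.0, -0.5, 1.0)
smaller_a = (5.0, -0.5, 1.0)   # only the constant improves
steeper_b = (10.0, -0.6, 1.0)  # only the exponent improves

for target in (1.1, 1.01):
    print(
        f"target={target}: "
        f"smaller a saves {compute_savings(target, base, smaller_a):.2f}x, "
        f"steeper b saves {compute_savings(target, base, steeper_b):.2f}x"
    )
```

In this toy setup the saving from the smaller constant stays the same at both targets, while the saving from the steeper exponent is larger at the lower target loss.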

Rylan Schaeffer (@rylanschaeffer) 's Twitter Profile Photo

A bit late to the party, but our paper on predictable inference-time / test-time scaling was accepted to #icml2025 🎉🎉🎉 TLDR: Best of N was shown to exhibit power (polynomial) law scaling (left), but maths suggest one should expect exponential scaling (center). We show how to

Rylan Schaeffer (@rylanschaeffer) 's Twitter Profile Photo

🚨New preprint 🚨 Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models We examine min-p sampling (ICLR 2025 oral) & find significant problems in all 4 lines of evidence: human eval, NLP evals, LLM-as-judge evals, community adoption claims 1/8

David Hall (@dlwh) 's Twitter Profile Photo

So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me.)

Rylan Schaeffer (@rylanschaeffer) 's Twitter Profile Photo

Another #ICML2025 paper! Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? TLDR: Predicting language model performance with scale on multiple choice question-answer (MCQA) benchmarks is made difficult b/c ... 1/3

Rylan Schaeffer (@rylanschaeffer) 's Twitter Profile Photo

Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄 Joshua Kazdan Apratim Dey Matthias Gerstgrasser Rafael Rafailov @ NeurIPS Sanmi Koyejo 1/7

Brando Miranda (@brandohablando) 's Twitter Profile Photo

We use Trace in our new VeriBench benchmark for code verification in Lean 4 -- stay tuned for more details soon! Openreview: openreview.net/forum?id=rWkGF… should be public soon, ICML MATH-AI workshop, come Friday to chat with me about this and AI for correct and verified code! And AI

Brando Miranda (@brandohablando) 's Twitter Profile Photo

🚨 Can your LLM really do math—or is it cramming the test set? 📢 Meet Putnam-AXIOM, an advanced mathematics contamination-resilient benchmark that finally hurts FMs. 1. openreview.net/forum?id=kqj2C… 2. icml.cc/virtual/2025/p… #ICML2025 East Exhibition Hall A-B, #E-2502 🧵1/14

Elyas Obbad (@obbadelyas) 's Twitter Profile Photo

Check out our work on verifiable code generation! This is an extremely promising research direction and I’m excited to see it gain traction

Brando Miranda (@brandohablando) 's Twitter Profile Photo

Come to Convention Center West room 208-209 2nd floor to learn about optimal data selection using compression like gzip! tldr; you can learn much faster if you use gzip compression distances to select data given a task! DM if you are interested or want to use the code!
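
For readers unfamiliar with the idea in the tweet above, here is a minimal sketch of what a gzip compression distance could look like. It assumes the standard normalized compression distance (NCD) and a simple nearest-to-task ranking; the tweet does not describe the authors' actual method, so `ncd`, `select_closest`, and the example texts are hypothetical illustrations, not their code.

```python
import gzip


def compressed_size(text: str) -> int:
    """Number of bytes after gzip-compressing the UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: lower means x and y share more structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def select_closest(task_text: str, candidates: list[str], k: int) -> list[str]:
    """Rank candidate documents by NCD to the task text and keep the k closest."""
    return sorted(candidates, key=lambda doc: ncd(task_text, doc))[:k]


# Toy data (assumed): a math-flavored task and two candidate documents.
task = "prove that the sum of two even integers is even. " * 10
math_doc = "prove that the sum of two odd integers is even. " * 10
cooking_doc = "stir the onions until golden, then add garlic and thyme. " * 10

chosen = select_closest(task, [cooking_doc, math_doc], k=1)
print(chosen[0][:48])
```

The intuition is that gzip compresses the concatenation of two similar texts almost as well as either text alone, so a small distance suggests the candidate resembles the task data.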