Julius Adebayo (@juliusadml)'s Twitter Profile
Julius Adebayo

@juliusadml

Building @guidelabsai - Engineering interpretable models and agents that are easy to audit. PhD in ML @MITEECS, Ex @Meta, Google Brain, & Prescient Design.

ID: 335091279

Link: http://juliusadebayo.com/ · Joined: 14-07-2011 04:04:59

2.2K Tweets

2.2K Followers

977 Following

Richard Sutton (@richardssutton)'s Twitter Profile Photo

awards.acm.org/about/2024-tur… Machines that learn from experience were explored by Alan Turing almost eighty years ago, which makes it particularly gratifying and humbling to receive an award in his name for reviving this essential but still nascent idea.

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

lol half of the replies to this are like "use dspy" or "use this company/service/library downstream of dspy" lol, but what Tom is describing is IMO not the right problem. It's not that "prompts are hard" (I trust you can write them); they're just the wrong canvas for AI systems.

Kyunghyun Cho (@kchonyc)'s Twitter Profile Photo

on my way back to NYC, i met wise Leon Bottou in the airport. we talked. then i told him "you should tweet that!" and he delivered much more than a tweet: a blog post with thoughts and insights on AI research only he can deliver this clearly and succinctly.

Percy Liang (@percyliang)'s Twitter Profile Photo

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team: Tatsunori Hashimoto, Marcel Rød, Neil Band, Rohith Kuditipudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

Julius Adebayo (@juliusadml)'s Twitter Profile Photo

Tough life if you naturally use em dashes—like this—everyone thinks you are an AI copy paster. I now need to force myself not to use them.

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Diffusion Beats Autoregressive in Data-Constrained Settings

Comparison of diffusion and autoregressive language models from 7M to 2.5B params and up to 80B training tokens.

Key findings:

1. Diffusion models surpass autoregressive models given sufficient compute. Across a wide
Julius Adebayo (@juliusadml)'s Twitter Profile Photo

Real math = only Grothendieck. Real chess = Carlsen not Stockfish. Real Go = Lee Sedol (AlphaGo doesn't count). Real thinking = Einstein not plebs. Real driving = F1 drivers not Waymo. Lmao what a take.

Kaiyue Wen (@wen_kaiyue)'s Twitter Profile Photo

(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baseline or limited scale! E.g. Muon: ~40% speedups <0.5B & only 10% at 1.2B (8× Chinchilla)!

Julius Adebayo (@juliusadml)'s Twitter Profile Photo

Nice evaluation paper. When you ask an LLM to explain the activations of another LLM, it turns out the explainer-LLM just explains itself instead!