Bradley Brown (@brad19brown)'s Twitter Profile
Bradley Brown

@brad19brown

Bit rearranger 👨‍💻 | Incoming CS PhD at Stanford, CS Master's Student at the University of Oxford

ID: 3064176064

Link: http://www.bradbrown.ca

Joined: 26-02-2015 23:38:15

57 Tweets

299 Followers

367 Following

Dan Biderman (@dan_biderman)'s Twitter Profile Photo

How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (ollama) with frontier LLMs in the cloud (@openai, @together) to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost.

Avanika Narayan (@avanika15)'s Twitter Profile Photo

we shipp’d 👭 on-device lms and frontier cloud lms. and…they were a match ☺️. 98% accuracy at just 17.5% of the cloud API costs. beyond excited to drop minions: where local lms meet cloud lms 😊 joint work w/ Sabri Eyuboglu & Dan Biderman at @hazyresearch. ty Together AI,

Sabri Eyuboglu (@eyuboglusabri)'s Twitter Profile Photo

All these on-device models are coming out (e.g. llama 3.2). But how can we actually make them useful for hard reasoning workloads (beyond iMessage summarization)? Our idea: give the on-device models your long context and let them communicate with frontier models in the cloud.
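
A minimal sketch of the pairing idea in these three threads, assuming a hypothetical split where a small local model skims the long context and a cloud model only sees its distilled notes. `local_generate` and `cloud_generate` are placeholder names, not the released Minions API; swap in a real ollama client and a cloud SDK to try it.

```python
def local_generate(prompt: str) -> str:
    """Placeholder for an on-device LLM call (e.g. an ollama / llama.cpp client)."""
    raise NotImplementedError


def cloud_generate(prompt: str) -> str:
    """Placeholder for a frontier cloud LLM call."""
    raise NotImplementedError


def answer_over_long_context(question: str, document: str, chunk_size: int = 4000) -> str:
    # 1) The small on-device model reads the long document chunk by chunk and
    #    extracts only the passages relevant to the question (the token-heavy part).
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    notes = []
    for chunk in chunks:
        note = local_generate(
            f"Question: {question}\n\n"
            f"Copy any sentences from the text below that help answer the "
            f"question, or reply NONE.\n\n{chunk}"
        )
        if note.strip() != "NONE":
            notes.append(note)

    # 2) The cloud model never sees the full document, only the short notes,
    #    which is where the cost savings come from.
    return cloud_generate(
        f"Question: {question}\n\n"
        f"Notes extracted by a local assistant:\n" + "\n".join(notes) +
        "\n\nAnswer the question using these notes."
    )
```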

Simon Guo 🦝 (@simonguozirui)'s Twitter Profile Photo

LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench!

Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time.

More 🧵👇
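
For readers unfamiliar with the setup, a generated kernel only counts as a win if it matches the eager PyTorch reference numerically and runs faster. The harness below is an illustrative sketch of that check, not the KernelBench evaluation code; `candidate_fn` stands in for a model-generated kernel wrapped as a Python callable, and a CUDA GPU is assumed.

```python
import torch


def eager_reference(x: torch.Tensor) -> torch.Tensor:
    # Example task: a "square then softmax" op written in plain eager PyTorch.
    return torch.softmax(x * x, dim=-1)


def time_fn(fn, x, iters: int = 50) -> float:
    # Simple CUDA timing with warmup; returns milliseconds per call.
    for _ in range(5):
        fn(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters


def evaluate(candidate_fn, shape=(4096, 4096)) -> dict:
    x = torch.randn(*shape, device="cuda")
    correct = torch.allclose(candidate_fn(x), eager_reference(x), atol=1e-4, rtol=1e-4)
    speedup = time_fn(eager_reference, x) / time_fn(candidate_fn, x)
    return {"correct": correct, "speedup": speedup, "beats_eager": correct and speedup > 1.0}
```
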
Benjamin F Spector (@bfspector)'s Twitter Profile Photo

(1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding! ⚡️🐱ThunderMLA is up to 35% faster than FlashMLA and just 400 LoC.

Blog: bit.ly/4kubAAK
With Aaryan Singhal, Dan Fu, and @hazyresearch!
Benjamin F Spector (@bfspector)'s Twitter Profile Photo

(1/6) Joyously announcing ThunderKittens with real support on NVIDIA Blackwell! We've released BF16/FP8 GEMM and attention fwd+bwd kernels, up to 2x faster than cuBLAS GEMMs on H100. Blog: bit.ly/41tuT4Q With Dan Fu, Aaryan Singhal, and @hazyresearch!

hazyresearch (@hazyresearch)'s Twitter Profile Photo

The Great American AI Race. I wrote something about how we need a holistic AI effort from academia, industry, and the US government to have the best shot at a freer, better educated, and healthier world in AI. I’m a mega bull on the US and open source AI. Maybe we’re cooking

Jordan Juravsky (@jordanjuravsky)'s Twitter Profile Photo

When studying repeated sampling in Large Language Monkeys, we found that the relationship between log(pass@k) and the number of samples often follows a power law. But *why* do we see this scaling law? At first glance, this is surprising, since for a single problem pass@k and k
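
For context, pass@k here is the standard unbiased estimator from n samples with c correct, and the power-law claim can be eyeballed by regressing log(-log pass@k) on log k. The snippet below is an illustrative recomputation under that common parameterization, not the paper's code; the example success counts are made up.

```python
import numpy as np
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem with n samples, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def coverage_curve(num_correct: list[int], n: int, ks: list[int]) -> np.ndarray:
    """Mean pass@k over problems for each k, i.e. the fraction of problems solved."""
    return np.array([np.mean([pass_at_k(n, c, k) for c in num_correct]) for k in ks])


def fit_power_law(ks: list[int], coverage: np.ndarray):
    """Fit -log(coverage) ~ a * k**(-b) by linear regression in log-log space."""
    y = np.log(-np.log(np.clip(coverage, 1e-12, 1 - 1e-12)))
    slope, intercept = np.polyfit(np.log(ks), y, 1)
    return np.exp(intercept), -slope  # (a, b)


# Example with made-up per-problem success counts out of n=100 samples:
counts = [0, 1, 1, 2, 5, 10, 40, 90]
ks = [1, 2, 4, 8, 16, 32, 64]
cov = coverage_curve(counts, n=100, ks=ks)
a, b = fit_power_law(ks, cov)
```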

Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

In Large Language Monkeys, we showed the scaling laws of inference-time compute with repeated sampling--the power law relationship between the number of repeated attempts and the fraction of problems solved!

The following amazing work theoretically proves the necessary and
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

Excited to release SWiRL: A synthetic data generation and multi-step RL approach for reasoning and tool use!

With SWiRL, the model’s capability generalizes to new tasks and tools. For example, a model trained to use a retrieval tool to solve multi-hop knowledge-intensive
Anna Goldie (@annadgoldie)'s Twitter Profile Photo

Excited to share our new paper on Step-Wise Reinforcement Learning (SWiRL), which uses reinforcement learning and synthetic trajectories to improve multi-step reasoning and tool use! (1/8)
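
A heavily simplified sketch of the step-wise flavor described in these threads: roll out multi-step trajectories of thought, tool call, and observation, score each intermediate step, and build training data at the step level rather than judging only the final answer. `model_step`, `run_tool`, and `judge_step` are hypothetical placeholders, and the real SWiRL objective differs, so read this as an illustration only.

```python
from dataclasses import dataclass


@dataclass
class Step:
    prompt: str    # conversation so far
    action: str    # model output: a tool call or a final answer
    reward: float  # step-level score from a judge / reward model


def model_step(prompt: str) -> str:
    raise NotImplementedError  # the policy model being trained


def run_tool(action: str) -> str:
    raise NotImplementedError  # e.g. a retrieval or calculator call


def judge_step(prompt: str, action: str) -> float:
    raise NotImplementedError  # step-level reward model / LM judge


def rollout(question: str, max_steps: int = 5) -> list[Step]:
    prompt, steps = question, []
    for _ in range(max_steps):
        action = model_step(prompt)
        steps.append(Step(prompt, action, judge_step(prompt, action)))
        if action.startswith("FINAL:"):  # the model chose to answer
            break
        prompt += f"\n{action}\n{run_tool(action)}"  # append the tool's observation
    return steps


def build_stepwise_dataset(questions: list[str], threshold: float = 0.5) -> list[Step]:
    # Credit is assigned per step: each well-scored intermediate step becomes
    # a training example, instead of only rewarding the final answer.
    data = []
    for q in questions:
        data += [s for s in rollout(q) if s.reward >= threshold]
    return data
```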

Benjamin F Spector (@bfspector)'s Twitter Profile Photo

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces.

So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel.

Megakernels are faster & more humane. Here’s how to treat your Llamas ethically:

(Joint
Jordan Juravsky (@jordanjuravsky)'s Twitter Profile Photo

We wrote a megakernel! Excited to share how we fused Llama-1B into a single kernel to reach SOTA latency. Check out our blog post and code below!
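
The kernels above are hand-written CUDA; the toy PyTorch comparison below only illustrates why fusing helps at batch size 1: an eager forward pass launches many small kernels whose launch overhead can rival the math itself, while a compiled/fused version issues far fewer launches. It assumes PyTorch 2.x and a CUDA GPU, and is unrelated to the released megakernel code; the numbers are only indicative.

```python
import torch

d = 2048
x = torch.randn(1, d, device="cuda", dtype=torch.float16)
w1 = torch.randn(d, d, device="cuda", dtype=torch.float16)
w2 = torch.randn(d, d, device="cuda", dtype=torch.float16)


def block(x):
    # a small transformer-ish fragment: norm, two matmuls, activation, residual
    h = torch.nn.functional.layer_norm(x, (d,))
    h = torch.nn.functional.silu(h @ w1) @ w2
    return x + h


fused_block = torch.compile(block)  # fuses the chain into fewer kernel launches


def bench(fn, iters=200):
    for _ in range(20):
        fn(x)  # warmup (also triggers compilation for the fused version)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call


print("eager :", bench(block), "ms")
print("fused :", bench(fused_block), "ms")
```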

Sabri Eyuboglu (@eyuboglusabri)'s Twitter Profile Photo

When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size.

What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x
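
A toy sketch of that idea: instead of paying for the full document's KV cache at inference time, learn a much smaller trainable prefix offline from synthetic Q&A about the document, and prepend only the prefix when serving. The snippet uses a plain soft-prompt LM loss on a small HuggingFace model as a stand-in; the actual recipe and training objective in the work above differ, so treat this purely as an illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                 # small stand-in model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.requires_grad_(False)         # only the prefix is trained

num_prefix_tokens = 64              # the "compressed cache" size knob
embed = model.get_input_embeddings()
prefix = torch.nn.Parameter(0.02 * torch.randn(1, num_prefix_tokens, embed.embedding_dim))
opt = torch.optim.Adam([prefix], lr=1e-3)


def train_step(qa_text: str) -> float:
    # qa_text is a synthetic question/answer pair generated *from the document*;
    # the document itself never appears at train or serve time, only the prefix.
    ids = tok(qa_text, return_tensors="pt").input_ids
    inputs = torch.cat([prefix, embed(ids)], dim=1)
    # ignore the loss on prefix positions, predict the Q&A tokens as usual
    labels = torch.cat(
        [torch.full((1, num_prefix_tokens), -100, dtype=torch.long), ids], dim=1
    )
    out = model(inputs_embeds=inputs, labels=labels)
    opt.zero_grad()
    out.loss.backward()
    opt.step()
    return out.loss.item()
```
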
Ryan Ehrlich (@ryansehrlich)'s Twitter Profile Photo

Giving LLMs very large amounts of context can be really useful, but it can also be slow and expensive. Could scaling inference-time compute help? In our latest work, we show that allowing models to spend test-time compute to “self-study” a large corpus can >20x decode

Jon Saad-Falcon (@jonsaadfalcon)'s Twitter Profile Photo

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 
🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning
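
A bare-bones sketch of the selection problem being described: sample k candidate answers, score each with several imperfect verifiers, and return the candidate with the best combined score. The hard part Weaver actually addresses is how to weight the verifiers without labels; here the weights are simply given, and the verifier names in the usage comment are hypothetical.

```python
import numpy as np


def select_answer(candidates: list[str], verifiers: list, weights: list[float]) -> str:
    """
    candidates: k sampled answers to one problem
    verifiers:  callables mapping an answer string -> score in [0, 1]
    weights:    per-verifier weights (given here; learning them is the real work)
    """
    scores = np.array([[v(c) for v in verifiers] for c in candidates])  # shape (k, m)
    combined = scores @ np.array(weights)                               # shape (k,)
    return candidates[int(np.argmax(combined))]


# e.g. select_answer(samples, [reward_model_score, lm_judge_score], weights=[0.6, 0.4]),
# where reward_model_score and lm_judge_score are whatever verifiers you have on hand.
```
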
Jerry Liu (@jerrywliu)'s Twitter Profile Photo

1/10 ML can solve PDEs – but precision🔬is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵How it works:
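
For readers new to the area, "physics-informed learning" means training a network whose outputs satisfy a PDE's residual at sampled points. The snippet below is a generic physics-informed loss for a toy 1D Poisson problem, not the BWLer architecture; it only illustrates the objective such methods try to drive toward machine precision.

```python
import torch

# toy problem: u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0, exact solution u = sin(pi x)
net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
f = lambda x: -(torch.pi ** 2) * torch.sin(torch.pi * x)


def pinn_loss(n_points: int = 128) -> torch.Tensor:
    x = torch.rand(n_points, 1, requires_grad=True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = ((d2u - f(x)) ** 2).mean()                    # PDE residual term
    boundary = net(torch.zeros(1, 1)) ** 2 + net(torch.ones(1, 1)) ** 2
    return residual + boundary.sum()                          # plus boundary penalty


opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(1000):
    opt.zero_grad()
    loss = pinn_loss()
    loss.backward()
    opt.step()
```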

Jacky Kwok (@jackyk02)'s Twitter Profile Photo

✨ Test-Time Scaling for Robotics ✨

Excited to release 🤖 RoboMonkey, which characterizes test-time scaling laws for Vision-Language-Action (VLA) models and introduces a framework that significantly improves the generalization and robustness of VLAs!

🧵(1 / N)

🌐 Website:
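
Schematically, the repeated-sampling recipe for policies looks like the best-of-k selection below: draw several candidate actions from the policy and execute the one a learned verifier scores highest. The `policy` and `verifier` callables are placeholders, not the RoboMonkey models.

```python
def best_of_k(policy, verifier, observation, k: int = 8):
    # repeated sampling: draw k candidate actions for the same observation
    candidates = [policy(observation) for _ in range(k)]
    # verification: execute the action the verifier likes best
    return max(candidates, key=lambda action: verifier(observation, action))
```
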
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

Looking forward to attending ICML!

Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!