Center for Research on Foundation Models (@stanfordcrfm)

Making foundation models more reliable and accessible.

ID: 1523887392494555136
Joined: 10-05-2022 04:48:09
72 Tweets · 2.2K Followers · 3 Following

Avanika Narayan (@avanika15):

Can AI agents automate enterprise-level workflows?

Excited to share ECLAIR, a first step towards end-to-end digital workflow automation in knowledge-intensive settings like hospitals!

✍️Join the waitlist to work with ECLAIR: bit.ly/eclair-signup
📄Paper:
Michael Wornow (@michaelwornow):

Can AI agents automate enterprise-level workflows?

Excited to share ECLAIR, a 1st step towards end-to-end digital workflow automation in knowledge-intensive settings like hospitals!

📄Paper: arxiv.org/abs/2405.03710
👨‍💻Code: bit.ly/eclair-github

See it in action below 👇
Benjamin F Spector (@bfspector):

(1/7) Happy Mother’s Day! We think what the mothers of America really want is a Flash Attention implementation that’s just 100 lines of code and 30% faster, and we’re happy to provide.

We're excited to introduce ThunderKittens (TK), a simple DSL embedded within CUDA that makes
Jeannette Bohg (@leto__jean):

We dramatically sped up diffusion policies through consistency distillation. With the resulting single-step policy, we can run fast inference on laptop GPUs and robot on-board compute. 👇
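
A minimal sketch of the distillation objective behind this kind of speedup, assuming a frozen multi-step teacher `teacher_denoise` and a one-step `student` network (names, shapes, and the full-trajectory matching are illustrative simplifications; the actual consistency objective matches adjacent points along the denoising trajectory):

```python
import torch
import torch.nn.functional as F

ACTION_DIM = 7  # placeholder, e.g. a 7-DoF arm

def distillation_loss(student, teacher_denoise, obs):
    # Start from pure noise, as at inference time.
    noise = torch.randn(obs.shape[0], ACTION_DIM, device=obs.device)
    # Teacher: slow multi-step reverse diffusion to a clean action.
    with torch.no_grad():
        target = teacher_denoise(noise, obs, steps=16)
    # Student: one forward pass from the same starting noise.
    pred = student(noise, obs)
    return F.mse_loss(pred, target)
```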

Megha Srivastava (@megha_byte):

#2 RL agents can reflect too!
In arxiv.org/abs/2405.04118, Cédric, @dorsasadigh, Jacob Andreas, and I find that when 🤖s periodically generate language rules describing their best behaviors, they interact better with humans and are more interpretable + generalizable (self-talk)!
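
A hedged sketch of what that "self-talk" loop could look like (all names and prompts here are illustrative, not the paper's code):

```python
def self_talk_update(llm, best_episodes, rules):
    # Summarize the agent's most successful episodes into a language rule
    # that gets appended to the policy's prompt for future interactions.
    transcript = "\n".join(str(ep) for ep in best_episodes)
    new_rule = llm(
        "Here are my most successful episodes:\n" + transcript +
        "\nWrite one short rule describing what made these behaviors work."
    )
    return rules + [new_rule]
```
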
James Zou (@james_y_zou):

🔥#TextGrad is now multi-modal! TextGrad boosts GPT-4o's visual reasoning ability:
📊 MathVista score 63.8 ➡️ 66.1 w/ TextGrad
🧬 Reduces ScienceQA error rate by 20%. Best reported 0-shot score
Tutorial: colab.research.google.com/github/zou-gro…
Great work Pan Lu Mert Yuksekgonul + team! Works
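
For context, the core TextGrad loop looks roughly like this, following the pattern in the project's README (treat the exact API as approximate and check the linked tutorial for current signatures):

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o")   # LLM that produces the "textual gradients"
model = tg.BlackboxLLM("gpt-4o")   # system whose output we optimize

question = tg.Variable(
    "How many acute angles can a triangle have?",
    role_description="question to the LLM",
    requires_grad=False,
)
answer = model(question)
answer.set_role_description("concise answer to the question")

# The "loss" is itself natural-language feedback from an evaluator LLM.
loss_fn = tg.TextLoss("Evaluate this answer for correctness and clarity.")
optimizer = tg.TGD(parameters=[answer])

loss = loss_fn(answer)
loss.backward()    # propagate textual feedback through the graph
optimizer.step()   # rewrite `answer` using that feedback
```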

Simran Arora (@simran_s_arora):

Excited to share Just read twice: going beyond causal language modeling to close quality gaps between efficient recurrent models and attention-based models!!

There’s so much recent progress on recurrent architectures, which are dramatically more memory efficient and
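
The core trick is disarmingly simple: a recurrent model has a fixed-size state, so it must decide what to remember before it knows what will be asked; repeating the context after the question lets it choose what to keep on the second read. A minimal sketch (the template is illustrative, not the paper's exact format):

```python
def just_read_twice(context: str, question: str) -> str:
    # The second pass over the context happens after the model has seen
    # the question, so its fixed-size state can retain exactly what the
    # question needs.
    return (
        f"Context: {context}\n"
        f"Question: {question}\n"
        f"Context (again): {context}\n"
        f"Answer:"
    )
```
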
Joey Hejna (@joeyhejna):

As imitation learning policies continue to scale, deciding how to weigh different robot datasets will become even more difficult. To address this problem, we introduce ReMix, a method for automatically curating large RT-X scale imitation learning datasets. 🧵(1/5)
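
A hedged sketch of the kind of domain-reweighting update this involves, in the DoReMi/group-DRO spirit (illustrative, not the authors' code): domains where a small proxy model lags a reference model get upweighted.

```python
import numpy as np

def update_mixture(weights, proxy_losses, ref_losses, lr=0.5):
    # Excess loss: how much worse the proxy is than the reference, per domain.
    excess = np.maximum(proxy_losses - ref_losses, 0.0)
    # Multiplicative-weights step, then renormalize to a distribution.
    logits = np.log(weights) + lr * excess
    w = np.exp(logits - logits.max())
    return w / w.sum()
```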

CLS (@chengleisi):

Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas?

After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.
Luca Soldaini ✈️ ICLR 25 (@soldni):

Selecting pretraining data points based on correlation with downstream tasks is an effective data mixing technique

I love papers that are a simple, elegant idea executed rly well!

lovely read from Tristan Thrush Christopher Potts Tatsunori Hashimoto 😊

arxiv.org/abs/2409.05816
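
The idea admits a very compact sketch (illustrative; the paper's actual estimator is more careful): across many existing public models, correlate per-domain loss with downstream benchmark score, and keep the domains where low loss best predicts high performance.

```python
import numpy as np

def select_domains(domain_losses, bench_scores, k):
    # domain_losses: (n_models, n_domains) per-domain eval losses
    # bench_scores:  (n_models,) downstream benchmark scores
    corr = np.array([
        np.corrcoef(-domain_losses[:, j], bench_scores)[0, 1]
        for j in range(domain_losses.shape[1])
    ])
    return np.argsort(corr)[-k:]  # k domains most predictive of performance
```
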
Zitong Yang (@zitongyang0):

Grab your favorite preprint of the week: how can you put its knowledge in your LM’s parameters? Continued pretraining (CPT) works well with >10B tokens, but the preprint is <10K.

Synthetic CPT downscales CPT to such small, targeted domains.

📜: arxiv.org/abs/2409.07431

🧵👇
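
A hedged sketch of what synthetic CPT can mean in practice, loosely following the paper's entity-centric recipe (prompts and helper names are illustrative): extract entities from the small source document, have an LLM write diverse analyses of entity pairs grounded in the document, then run ordinary continued pretraining on the inflated corpus.

```python
from itertools import combinations

def synthesize_corpus(llm, source_text):
    entities = llm(
        f"List the key entities in this document, one per line:\n{source_text}"
    ).splitlines()
    corpus = []
    for a, b in combinations(entities, 2):
        corpus.append(llm(
            f"Based only on this document:\n{source_text}\n"
            f"Write a detailed analysis of how {a} relates to {b}."
        ))
    return corpus  # feed this to a standard continued-pretraining run
```
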
Jon Saad-Falcon (@jonsaadfalcon):

What is the best way to spend your inference compute budget to create LLM systems greater than the sum of their parts? In our latest paper, we present Archon, an architecture search framework for inference-time techniques! Archon is enabled by inference-time architecture search
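
One point in the space Archon searches over, as a hedged sketch (function names are placeholders, not Archon's API): fan out to several models for candidate answers, then fuse them with a judge model.

```python
def generate_and_fuse(models, judge, prompt, n=4):
    # Sample n candidates from each model (an "ensemble" layer)...
    candidates = [m(prompt) for m in models for _ in range(n)]
    # ...then a "fusion" layer synthesizes them into one answer.
    listing = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return judge(
        f"Question: {prompt}\nCandidate answers:\n{listing}\n"
        "Synthesize the best single answer from the candidates."
    )
```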

Tony Lee (@tonyh_lee):

📢 Announcing Holistic Evaluation of Vision-Language Models (VHELM), the HELM extension for VLMs, where we holistically evaluated 22 VLMs across 9 different aspects:
📝 Paper: arxiv.org/abs/2410.07112
🥇 Leaderboard/prompts/raw predictions: crfm.stanford.edu/helm/vhelm/lat…
See 🧵 below

Michael Zhang (@mzhangio):

Ever wanted to scale subquadratic models up to 7B+ LLMs? But didn't want to pretrain billions of parameters on trillions of tokens? Then just for you, we're happy to share LoLCATs 😺 We show how to convert existing Transformers like Llama 3 8B & Mistral 7B into state-of-the-art
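
A hedged PyTorch sketch of the core recipe as described in the paper: swap softmax attention for linear attention with a learnable feature map, train the feature map to match the original attention's outputs ("attention transfer"), then recover remaining quality with LoRA. Shapes and names are illustrative.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, feature_map):
    # (batch, seq, dim) tensors; heads merged for brevity.
    q, k = feature_map(q), feature_map(k)
    kv = torch.einsum("bnd,bne->bde", k, v)                    # O(n) state
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + 1e-6)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

def attention_transfer_loss(q, k, v, feature_map):
    with torch.no_grad():  # frozen teacher: the original softmax attention
        target = F.scaled_dot_product_attention(q, k, v)
    return F.mse_loss(linear_attention(q, k, v, feature_map), target)
```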

Simran Arora (@simran_s_arora):

Want Llama 405B, but wish it scaled linearly in sequence length??? Enter LoLCATS: an efficient method for "turning Transformers to linear attention models", all on an academic budget!! 

We use LoLCATS to linearize the *full Llama 3.1 model family* for the first time – 20+ points
Benjamin F Spector (@bfspector):

(1/7) In celebration of National Cat Day, we’re excited to release our first major batch of updates to ThunderKittens! ThunderKittens is now easier, better, faster, and cuter than ever before! In addition to massive speed boosts, we’re releasing a broad swath of kernels, new
Jon Saad-Falcon (@jonsaadfalcon):

Interested in building O1-style LM systems that beat individual LMs?

Check out our latest tutorial on Archon, a modular framework for optimizing combinations of multiple LMs and inference-time techniques!

With Archon, we can beat LM systems that use individual
Simran Arora (@simran_s_arora):

Wish writing AI kernels was like writing PyTorch??? Enter ThunderKittens 0.002: for simpler, faster, more adorable AI kernels! We use TK to provide 10-40% faster attention backwards, CuBLAS-speed GEMMs, 8x faster state space models, 14x faster linear attentions – averaging <200
Dan Biderman (@dan_biderman):

How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (ollama) with frontier LLMs in the cloud (@openai, @together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost
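
A hedged sketch of the local/cloud split (illustrative, not the actual Minions protocol): the small on-device model does the token-heavy reading chunk by chunk, and the frontier model only ever sees the short extracted notes, which is where the cost savings come from.

```python
import ollama
from openai import OpenAI

cloud = OpenAI()  # assumes OPENAI_API_KEY is set

def answer(question: str, document: str, chunk: int = 4000) -> str:
    notes = []
    for i in range(0, len(document), chunk):
        # Local model (via ollama) reads the long document...
        local = ollama.chat(
            model="llama3.2:3b",
            messages=[{"role": "user", "content":
                f"Extract facts relevant to '{question}':\n"
                + document[i:i + chunk]}],
        )
        notes.append(local["message"]["content"])
    # ...while the cloud model only sees the short notes.
    resp = cloud.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
            f"Question: {question}\nNotes:\n" + "\n".join(notes)}],
    )
    return resp.choices[0].message.content
```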

Avanika Narayan (@avanika15):

we shipp’d 👭 on-device lms and frontier cloud lms. and…they were a match ☺️. 98% accuracy, just 17.5% the cloud API costs.

beyond excited to drop minions: where local lms meet cloud lms 😊

joint work w/ Sabri Eyuboglu & Dan Biderman at @hazyresearch. ty Together AI,