Center for Research on Foundation Models (@stanfordcrfm)

Making foundation models more reliable and accessible.

ID: 1523887392494555136
Joined: 10-05-2022 04:48:09
72 Tweets · 2.2K Followers · 3 Following

Avanika Narayan (@avanika15):

Can AI agents automate enterprise-level workflows?

Excited to share ECLAIR, a first step towards end-to-end digital workflow automation in knowledge-intensive settings like hospitals!

✍️Join the waitlist to work with ECLAIR: bit.ly/eclair-signup
📄Paper:
Michael Wornow (@michaelwornow):

Can AI agents automate enterprise-level workflows?

Excited to share ECLAIR, a 1st step towards end-to-end digital workflow automation in knowledge-intensive settings like hospitals!

📄Paper: arxiv.org/abs/2405.03710
👨‍💻Code: bit.ly/eclair-github

See it in action below 👇
Benjamin F Spector (@bfspector):

(1/7) Happy Mother’s Day! We think what the mothers of America really want is a Flash Attention implementation that’s just 100 lines of code and 30% faster, and we’re happy to provide.

We're excited to introduce ThunderKittens (TK), a simple DSL embedded within CUDA that makes
Jeannette Bohg (@leto__jean):

We dramatically sped up diffusion policies through consistency distillation. With the resulting single-step policy, we can run fast inference on laptop GPUs and robot on-board compute. 👇
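
A minimal sketch of the distillation objective behind this kind of speedup, assuming a frozen multi-step teacher `teacher_denoise` and a one-step `student` network (names, shapes, and the full-trajectory matching are illustrative simplifications; the actual consistency objective matches adjacent points along the denoising trajectory):

```python
import torch
import torch.nn.functional as F

ACTION_DIM = 7  # placeholder, e.g. a 7-DoF arm

def distillation_loss(student, teacher_denoise, obs):
    # Start from pure noise, as at inference time.
    noise = torch.randn(obs.shape[0], ACTION_DIM, device=obs.device)
    # Teacher: slow multi-step reverse diffusion to a clean action.
    with torch.no_grad():
        target = teacher_denoise(noise, obs, steps=16)
    # Student: one forward pass from the same starting noise.
    pred = student(noise, obs)
    return F.mse_loss(pred, target)
```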

Megha Srivastava (@megha_byte):

#2 RL agents can reflect too!
In arxiv.org/abs/2405.04118, Cédric, @dorsasadigh, Jacob Andreas, and I find that when 🤖s periodically generate language rules describing their best behaviors, they interact better with humans and are more interpretable + generalizable (self-talk)!
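
A hedged sketch of what that "self-talk" loop could look like (all names and prompts here are illustrative, not the paper's code):

```python
def self_talk_update(llm, best_episodes, rules):
    # Summarize the agent's most successful episodes into a language rule
    # that gets appended to the policy's prompt for future interactions.
    transcript = "\n".join(str(ep) for ep in best_episodes)
    new_rule = llm(
        "Here are my most successful episodes:\n" + transcript +
        "\nWrite one short rule describing what made these behaviors work."
    )
    return rules + [new_rule]
```
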
James Zou (@james_y_zou):

🔥#TextGrad is now multi-modal! TextGrad boosts GPT-4o's visual reasoning ability:
📊 MathVista score 63.8 ➡️ 66.1 w/ TextGrad
🧬 Reduces ScienceQA error rate by 20%. Best reported 0-shot score
Tutorial: colab.research.google.com/github/zou-gro…
Great work Pan Lu Mert Yuksekgonul + team! Works
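
For context, the core TextGrad loop looks roughly like this, following the pattern in the project's README (treat the exact API as approximate and check the linked tutorial for current signatures):

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o")   # LLM that produces the "textual gradients"
model = tg.BlackboxLLM("gpt-4o")   # system whose output we optimize

question = tg.Variable(
    "How many acute angles can a triangle have?",
    role_description="question to the LLM",
    requires_grad=False,
)
answer = model(question)
answer.set_role_description("concise answer to the question")

# The "loss" is itself natural-language feedback from an evaluator LLM.
loss_fn = tg.TextLoss("Evaluate this answer for correctness and clarity.")
optimizer = tg.TGD(parameters=[answer])

loss = loss_fn(answer)
loss.backward()    # propagate textual feedback through the graph
optimizer.step()   # rewrite `answer` using that feedback
```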

Simran Arora (@simran_s_arora):

Excited to share Just read twice: going beyond causal language modeling to close quality gaps between efficient recurrent models and attention-based models!!

There’s so much recent progress on recurrent architectures, which are dramatically more memory efficient and
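
The core trick is disarmingly simple: a recurrent model has a fixed-size state, so it must decide what to remember before it knows what will be asked; repeating the context after the question lets it choose what to keep on the second read. A minimal sketch (the template is illustrative, not the paper's exact format):

```python
def just_read_twice(context: str, question: str) -> str:
    # The second pass over the context happens after the model has seen
    # the question, so its fixed-size state can retain exactly what the
    # question needs.
    return (
        f"Context: {context}\n"
        f"Question: {question}\n"
        f"Context (again): {context}\n"
        f"Answer:"
    )
```
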
Joey Hejna (@joeyhejna):

As imitation learning policies continue to scale, deciding how to weigh different robot datasets will become even more difficult. To address this problem, we introduce ReMix, a method for automatically curating large RT-X scale imitation learning datasets. 🧵(1/5)
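
A hedged sketch of the kind of domain-reweighting update this involves, in the DoReMi/group-DRO spirit (illustrative, not the authors' code): domains where a small proxy model lags a reference model get upweighted.

```python
import numpy as np

def update_mixture(weights, proxy_losses, ref_losses, lr=0.5):
    # Excess loss: how much worse the proxy is than the reference, per domain.
    excess = np.maximum(proxy_losses - ref_losses, 0.0)
    # Multiplicative-weights step, then renormalize to a distribution.
    logits = np.log(weights) + lr * excess
    w = np.exp(logits - logits.max())
    return w / w.sum()
```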

CLS (@chengleisi):

Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas?

After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.
Luca Soldaini ✈️ ICLR 25 (@soldni):

Selecting pretraining data points based on correlation with downstream tasks is an effective data mixing technique

I love papers that are a simple, elegant idea executed rly well!

lovely read from Tristan Thrush Christopher Potts Tatsunori Hashimoto 😊

arxiv.org/abs/2409.05816
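
The idea admits a very compact sketch (illustrative; the paper's actual estimator is more careful): across many existing public models, correlate per-domain loss with downstream benchmark score, and keep the domains where low loss best predicts high performance.

```python
import numpy as np

def select_domains(domain_losses, bench_scores, k):
    # domain_losses: (n_models, n_domains) per-domain eval losses
    # bench_scores:  (n_models,) downstream benchmark scores
    corr = np.array([
        np.corrcoef(-domain_losses[:, j], bench_scores)[0, 1]
        for j in range(domain_losses.shape[1])
    ])
    return np.argsort(corr)[-k:]  # k domains most predictive of performance
```
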
Zitong Yang (@zitongyang0):

Grab your favorite preprint of the week: how can you put its knowledge in your LM’s parameters? Continued pretraining (CPT) works well with >10B tokens, but the preprint is <10K.

Synthetic CPT downscales CPT to such small, targeted domains.

📜: arxiv.org/abs/2409.07431

🧵👇
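
A hedged sketch of what synthetic CPT can mean in practice, loosely following the paper's entity-centric recipe (prompts and helper names are illustrative): extract entities from the small source document, have an LLM write diverse analyses of entity pairs grounded in the document, then run ordinary continued pretraining on the inflated corpus.

```python
from itertools import combinations

def synthesize_corpus(llm, source_text):
    entities = llm(
        f"List the key entities in this document, one per line:\n{source_text}"
    ).splitlines()
    corpus = []
    for a, b in combinations(entities, 2):
        corpus.append(llm(
            f"Based only on this document:\n{source_text}\n"
            f"Write a detailed analysis of how {a} relates to {b}."
        ))
    return corpus  # feed this to a standard continued-pretraining run
```
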
Jon Saad-Falcon (@jonsaadfalcon):

What is the best way to spend your inference compute budget to create LLM systems greater than the sum of their parts? In our latest paper, we present Archon, an architecture search framework for inference-time techniques! Archon is enabled by inference-time architecture search
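
One point in the space Archon searches over, as a hedged sketch (function names are placeholders, not Archon's API): fan out to several models for candidate answers, then fuse them with a judge model.

```python
def generate_and_fuse(models, judge, prompt, n=4):
    # Sample n candidates from each model (an "ensemble" layer)...
    candidates = [m(prompt) for m in models for _ in range(n)]
    # ...then a "fusion" layer synthesizes them into one answer.
    listing = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return judge(
        f"Question: {prompt}\nCandidate answers:\n{listing}\n"
        "Synthesize the best single answer from the candidates."
    )
```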

Tony Lee (@tonyh_lee):

📢 Announcing Holistic Evaluation of Vision-Language Models (VHELM), the HELM extension for VLMs, where we holistically evaluated 22 VLMs across 9 different aspects:
📝 Paper: arxiv.org/abs/2410.07112
🥇 Leaderboard/prompts/raw predictions: crfm.stanford.edu/helm/vhelm/lat…
See 🧵 below

Michael Zhang (@mzhangio):

Ever wanted to scale subquadratic models up to 7B+ LLMs? But didn't want to pretrain billions of parameters on trillions of tokens? Then just for you, we're happy to share LoLCATs 😺 We show how to convert existing Transformers like Llama 3 8B & Mistral 7B into state-of-the-art
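
A hedged PyTorch sketch of the core recipe as described in the paper: swap softmax attention for linear attention with a learnable feature map, train the feature map to match the original attention's outputs ("attention transfer"), then recover remaining quality with LoRA. Shapes and names are illustrative.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, feature_map):
    # (batch, seq, dim) tensors; heads merged for brevity.
    q, k = feature_map(q), feature_map(k)
    kv = torch.einsum("bnd,bne->bde", k, v)                    # O(n) state
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + 1e-6)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

def attention_transfer_loss(q, k, v, feature_map):
    with torch.no_grad():  # frozen teacher: the original softmax attention
        target = F.scaled_dot_product_attention(q, k, v)
    return F.mse_loss(linear_attention(q, k, v, feature_map), target)
```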

Simran Arora (@simran_s_arora):

Want Llama 405B, but wish it scaled linearly in sequence length??? Enter LoLCATS: an efficient method for "turning Transformers to linear attention models", all on an academic budget!! 

We use LoLCATS to linearize the *full Llama 3.1 model family* for the first time – 20+ points
Benjamin F Spector (@bfspector):

(1/7) In celebration of National Cat Day, we’re excited to release our first major batch of updates to ThunderKittens! ThunderKittens is now easier, better, faster, and cuter than ever before! In addition to massive speed boosts, we’re releasing a broad swath of kernels, new
Jon Saad-Falcon (@jonsaadfalcon):

Interested in building O1-style LM systems that beat individual LMs?

Check out our latest tutorial on Archon, a modular framework for optimizing combinations of multiple LMs and inference-time techniques!

With Archon, we can beat LM systems that use individual
Simran Arora (@simran_s_arora):

Wish writing AI kernels was like writing PyTorch??? Enter ThunderKittens 0.002: for simpler, faster, more adorable AI kernels! We use TK to provide 10-40% faster attention backwards, CuBLAS-speed GEMMs, 8x faster state space models, 14x faster linear attentions – averaging <200
Dan Biderman (@dan_biderman):

How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (ollama) with frontier LLMs in the cloud (@openai, @together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost
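
A hedged sketch of the local/cloud split (illustrative, not the actual Minions protocol): the small on-device model does the token-heavy reading chunk by chunk, and the frontier model only ever sees the short extracted notes, which is where the cost savings come from.

```python
import ollama
from openai import OpenAI

cloud = OpenAI()  # assumes OPENAI_API_KEY is set

def answer(question: str, document: str, chunk: int = 4000) -> str:
    notes = []
    for i in range(0, len(document), chunk):
        # Local model (via ollama) reads the long document...
        local = ollama.chat(
            model="llama3.2:3b",
            messages=[{"role": "user", "content":
                f"Extract facts relevant to '{question}':\n"
                + document[i:i + chunk]}],
        )
        notes.append(local["message"]["content"])
    # ...while the cloud model only sees the short notes.
    resp = cloud.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
            f"Question: {question}\nNotes:\n" + "\n".join(notes)}],
    )
    return resp.choices[0].message.content
```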

Avanika Narayan (@avanika15):

we shipp’d 👭 on-device lms and frontier cloud lms. and…they were a match ☺️. 98% accuracy, just 17.5% the cloud API costs.

beyond excited to drop minions: where local lms meet cloud lms 😊

joint work w/ Sabri Eyuboglu & Dan Biderman at @hazyresearch. ty Together AI,