Adam Fisch (@adamjfisch)'s Twitter Profile
Adam Fisch

@adamjfisch

Research Scientist @ Google DeepMind | Formerly: PhD @ MIT EECS.

ID: 892997634813710336

Link: http://people.csail.mit.edu/fisch/ | Joined: 03-08-2017 06:36:42

292 Tweets

1.1K Followers

245 Following

Adam Fisch (@adamjfisch)

Check out our new paper on Recursive Transformers. Great having Sangmin here at Google DeepMind to lead it! Particularly excited about the potential of continuous depth-wise batching for much better early-exiting batch throughput.
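
For intuition on that last point, here is a toy sketch of continuous depth-wise batching (my own illustration, not the paper's code): because a recursive transformer reuses one shared block at every depth, requests sitting at different recursion depths can be stacked into a single batch, and a request that early-exits frees its slot for a new one.

```python
# Toy sketch of continuous depth-wise batching (illustration only; the shapes,
# the shared block, and the exit test are all made-up placeholders).
import numpy as np

rng = np.random.default_rng(0)
d_model, max_depth, max_batch = 8, 4, 3
W = rng.normal(scale=0.1, size=(d_model, d_model))  # the single shared block (stand-in)

def shared_block(h):
    return np.tanh(h @ W)  # one recursion step; same weights at every depth

def wants_exit(h):
    return np.linalg.norm(h, axis=-1) < 0.5  # toy early-exit confidence test

active = [{"h": rng.normal(size=d_model), "depth": 0} for _ in range(max_batch)]
waiting = [rng.normal(size=d_model) for _ in range(4)]  # queued new requests
finished = 0

while active or waiting:
    # Refill freed slots with new requests entering at depth 0.
    while waiting and len(active) < max_batch:
        active.append({"h": waiting.pop(), "depth": 0})
    # One batched step over mixed depths: a single call to the shared block.
    batch = shared_block(np.stack([r["h"] for r in active]))
    exits = wants_exit(batch)
    next_active = []
    for r, h, ex in zip(active, batch, exits):
        r["h"], r["depth"] = h, r["depth"] + 1
        if ex or r["depth"] >= max_depth:
            finished += 1  # early exit (or depth cap) frees this slot
        else:
            next_active.append(r)
    active = next_active

print("served", finished, "requests")
```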

Anastasios Nikolas Angelopoulos (@ml_angelopoulos)


🚨 New Textbook on Conformal Prediction 🚨

arxiv.org/abs/2411.11824

“The goal of this book is to teach the reader about the fundamental technical arguments that arise when researching conformal prediction and related questions in distribution-free inference. Many of these…
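
For readers new to the topic, below is a minimal split conformal regression sketch (a standard construction written from memory, not taken from the book): calibrate on held-out absolute residuals, then return intervals with finite-sample marginal coverage of at least 1 - alpha under exchangeability. Any regressor exposing a scikit-learn-style .predict method should work as the model argument.

```python
# Minimal split conformal prediction sketch (standard recipe; names are mine).
import numpy as np

def conformal_intervals(model, X_cal, y_cal, X_test, alpha=0.1):
    # Nonconformity scores on the calibration split: absolute residuals.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Conformal quantile with the finite-sample (n + 1) correction.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, q_level, method="higher")
    preds = model.predict(X_test)
    return preds - q_hat, preds + q_hat  # lower and upper interval bounds
```
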
Stephen Bates (@stats_stephen)

Important topic, but this is more of a quick-start guide. For cutting-edge research on LLM evals, see these papers using Prediction-Powered Inference to incorporate synthetic data and model predictions for narrower CIs. 👇 Gemini already knows about them!
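
The basic prediction-powered inference (PPI) mean estimate is short enough to sketch (written from memory, so treat the details as an assumption rather than the papers' exact recipe): average the model's predictions on a large unlabeled set, then correct with the labeled residuals, which typically yields a narrower confidence interval than using the labeled data alone.

```python
# Rough sketch of a PPI mean estimate with a normal-approximation CI.
import numpy as np
from scipy.stats import norm

def ppi_mean_ci(y_labeled, yhat_labeled, yhat_unlabeled, alpha=0.1):
    n, N = len(y_labeled), len(yhat_unlabeled)
    rectifier = y_labeled - yhat_labeled               # model bias on labeled data
    theta = yhat_unlabeled.mean() + rectifier.mean()   # bias-corrected point estimate
    se = np.sqrt(yhat_unlabeled.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = norm.ppf(1 - alpha / 2)
    return theta - z * se, theta + z * se
```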

Jonathan Berant (@jonathanberant)


Hi ho!

New work: arxiv.org/pdf/2503.14481
With amazing collabs Jacob Eisenstein, Reza Aghajani, Adam Fisch, dheeru dua, Fantine Huot ✈️ ICLR 25, Mirella Lapata, Vicky Zayats

Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
Deedy (@deedydas)


Google DeepMind just dropped this new LLM architecture called Mixture-of-Recursions.

It gets 2x inference speed, reduced training FLOPs and ~50% reduced KV cache memory. Really interesting read.

Has potential to be a Transformers killer.
Sangmin Bae (@raymin0223)

Thanks for sharing our work, Deedy! MoR is a new architecture that upgrades Recursive Transformers and Early-Exiting algorithms: simple pretraining with a router, plus faster inference and a smaller KV cache! A post with details and code will be released very soon. Stay tuned! ☺️

Reza Bayat (@reza_byt)


📄 New Paper Alert! ✨

🚀Mixture of Recursions (MoR): Smaller models • Higher accuracy • Greater throughput

Across 135M–1.7B params, MoR carves a new Pareto frontier: equal training FLOPs yet lower perplexity, higher few-shot accuracy, and more than 2x throughput.
Yujin Kim (@yujin301300)


Introducing our new work: 🚀Mixture-of-Recursions!

🪄We propose a novel framework that dynamically allocates recursion depth per token.

🪄MoR is an efficient architecture with fewer params, reduced KV cache memory, and 2× greater throughput, while maintaining comparable performance!
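
To make the per-token depth allocation concrete, here is a toy sketch (my own illustration with made-up shapes and a stand-in router, not the released MoR code): a router score decides how many times the single shared block is applied to each token, so easy tokens stop recursing early and skip compute.

```python
# Toy per-token recursion-depth routing (illustration only).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, max_depth = 6, 16, 3
W_block = rng.normal(scale=0.1, size=(d_model, d_model))  # single shared block
w_router = rng.normal(scale=0.1, size=d_model)            # lightweight router (stand-in)

h = rng.normal(size=(seq_len, d_model))  # token hidden states

# Router scores -> per-token recursion depth in {1, ..., max_depth}.
scores = 1.0 / (1.0 + np.exp(-(h @ w_router)))            # sigmoid in (0, 1)
depths = np.clip(1 + np.floor(scores * max_depth).astype(int), 1, max_depth)

for step in range(max_depth):
    active = depths > step                     # tokens that still recurse at this step
    h[active] = np.tanh(h[active] @ W_block)   # shared block applied to active tokens only

print("tokens assigned to each depth:", np.bincount(depths, minlength=max_depth + 1)[1:])
```
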
Sangmin Bae (@raymin0223)


✨Huge thanks for the interest in Mixture-of-Recursions! The code is officially out!

It's been a long journey exploring early exiting with recursive architectures.
I'll soon post my 👨‍🎓PhD thesis on Adaptive Computation too!

Code: github.com/raymin0223/mix…
Paper: arxiv.org/abs/2507.10524
Sangmin Bae (@raymin0223)

🏋️‍♂️This unified MoR framework delivers strong performance and faster inference. Check it out and ask any questions! Huge thanks to my awesome co-authors: Yujin Kim, Reza Bayat, Sungnyun Kim, Jen Ha @ ICML 2025, Tal Schuster, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, and Se-Young Yun! 🥰