Soham De (@sohamde_)'s Twitter Profile
Soham De

@sohamde_

Research Scientist at DeepMind. Previously PhD at the University of Maryland.

ID: 306050621

Link: https://sohamde.github.io/ · Joined: 27-05-2011 05:59:44

187 Tweets

2.2K Followers

1.1K Following

Vaibhav (VB) Srivastav (@reach_vb)'s Twitter Profile Photo

Welcome RecurrentGemma 9B 🔥

> Same performance as Gemma with more than 25% lower latency and 6-7x higher tokens/sec ⚡
> Base (9B) and Instruct (9B-IT) models released.
> MMLU 60.5, CommonSenseQA 73.2, AGIEval 39.3 - pretty strong base model to fine-tune further.
> Based on
Soham De (@sohamde_)'s Twitter Profile Photo

🔥 Introducing our 9B language model, trained on 2 trillion tokens! 🚀

Based on Griffin (arxiv.org/abs/2402.19427), it delivers:
💪 Powerful performance
⚡️ Lightning-fast inference

Pretrained and instruction-tuned models are now available on HF & Kaggle! Start building today! 🏗️
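For anyone who wants to try it, here is a minimal sketch of loading the instruction-tuned checkpoint via Hugging Face transformers. The model id "google/recurrentgemma-9b-it" is an assumption based on the announcement's naming; check the hub for the exact name.

```python
# Minimal sketch: text generation with RecurrentGemma through transformers.
# Assumes a transformers version with RecurrentGemma support; the checkpoint
# id below is an assumption based on the announcement, not verified here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-9b-it"  # assumed id; verify on the HF hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Why are linear recurrences fast at inference?",
                   return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```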

Preetum Nakkiran (@preetumnakkiran)'s Twitter Profile Photo

Our tutorial on diffusion & flows is out! We made every effort to simplify the math, while still being correct. Hope you enjoy! (Link below -- it's long but is split into 5 mostly-self-contained chapters).

lots of fun working with Arwen Bradley Hattie Zhou Madhu Advani on this
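For intuition about the kind of object the tutorial builds up to, here is a toy sketch of the standard denoising (DDPM-style) training objective; this is a generic illustration with an assumed noise schedule, not code from the tutorial.

```python
# Toy DDPM-style training step: corrupt x0 with Gaussian noise at a random
# timestep and regress the noise. Generic illustration only; `model` is any
# network mapping (x_t, t) -> predicted noise, and the schedule is assumed.
import torch

def diffusion_loss(model, x0, num_steps=1000):
    t = torch.randint(0, num_steps, (x0.shape[0],))             # per-sample timestep
    alpha_bar = torch.cos(0.5 * torch.pi * t / num_steps) ** 2  # cosine-ish schedule
    alpha_bar = alpha_bar.view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)                                  # noise to predict
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * eps  # noised input
    return ((model(x_t, t) - eps) ** 2).mean()                  # simple MSE loss
```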
Surya Bhupatiraju (@suryabhupa)'s Twitter Profile Photo

I am absolutely thrilled to announce the release of Gemma 2! Today, we're releasing both pre-trained-only and fully post-trained 9B and 27B models. The full technical report is here: goo.gle/gemma2report and it's live *right now* on aistudio.google.com.

Soham De (@sohamde_)'s Twitter Profile Photo

It was fun to moderate this discussion with a great group of panelists. Lots of interesting points made on how to approach the next gen of seq modelling architectures. Thanks for the invite Caglar Gulcehre Antonio Orvieto Razvan and others!

Armand Joulin (@armandjoulin)'s Twitter Profile Photo

Are small models still undertrained? 
We are releasing a 2B model that beats GPT-3.5. The crazy part is that it was distilled on only 2T tokens from a small model.
Distillation is the future of LLMs with the growing availability of large and efficient open models!
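For reference, a minimal sketch of the usual token-level distillation objective, where the student matches the teacher's next-token distribution; this is the generic technique, not the specific Gemma 2 recipe (which is described in the technical report).

```python
# Generic knowledge-distillation loss: KL divergence between teacher and
# student next-token distributions. Illustrative only; not the Gemma 2 recipe.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Both logits: (batch, seq_len, vocab). Flatten so "batchmean" averages
    # the KL over every token position.
    s = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    t = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```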
Gus (🤖🧠+🐍+🥑🗣️) (@gusthema)'s Twitter Profile Photo

A new blog post explaining the Gemma architecture! This time it's RecurrentGemma: developers.googleblog.com/en/gemma-expla… This is the Gemma model that is based not on the Transformer architecture but on recurrent neural networks! Is this the return of RNNs? #gemmaverse
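The core idea is to replace attention with a gated linear recurrence, so per-token inference cost stays constant in sequence length. A toy sketch of that flavor of layer, heavily simplified from Griffin's RG-LRU (the real model uses learned input-dependent gating and interleaves local attention):

```python
# Toy element-wise gated linear recurrence, the kind of layer RecurrentGemma
# uses instead of attention. Heavily simplified from Griffin's RG-LRU.
import torch

def linear_recurrence(x, a):
    # x: (seq_len, dim) inputs; a: (seq_len, dim) decay gates in (0, 1)
    h = torch.zeros(x.shape[1])
    states = []
    for t in range(x.shape[0]):
        # Decay the old state and mix in the input; the sqrt term keeps the
        # state magnitude roughly constant (as in the RG-LRU normalization).
        h = a[t] * h + (1 - a[t] ** 2).sqrt() * x[t]
        states.append(h)
    return torch.stack(states)  # constant memory per step at inference time

x = torch.randn(16, 8)
a = torch.sigmoid(torch.randn(16, 8))  # stands in for learned, input-dependent gates
print(linear_recurrence(x, a).shape)   # torch.Size([16, 8])
```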

Google DeepMind (@googledeepmind)'s Twitter Profile Photo

We're presenting AlphaProteo: an AI system for designing novel proteins that bind more successfully to target molecules. 🧬 It could help scientists better understand how biological systems function, save time in research, advance drug design and more. 🧵 dpmd.ai/3XuMqbX

Preetum Nakkiran (@preetumnakkiran)'s Twitter Profile Photo

We have an opening for a PhD intern working closely with (among others) me, Arwen Bradley, and David Berthelot on scientific aspects of diffusion & generative models. 1/

Caglar Gulcehre (@caglarml)'s Twitter Profile Photo

Great contribution from Meta to the research community with a very easy-to-read codebase for LLM development: github.com/facebookresear… Soham De and Samuel L Smith have implemented Hawk as well, which seems to have performance comparable to Mamba's.

Google DeepMind (@googledeepmind)'s Twitter Profile Photo

Today, we're open-sourcing our SynthID text watermarking tool through an updated Responsible Generative AI Toolkit. Available freely to developers and businesses, it will help them identify their AI-generated content. 🔍 Find out more → goo.gle/40apGQh
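For intuition, here is a toy sketch of generation-time text watermarking in general: bias sampling toward a keyed pseudorandom subset of the vocabulary, then detect the bias statistically. This is the simplest "green list" scheme, not SynthID-Text's actual tournament-sampling algorithm.

```python
# Toy generation-time watermark (generic "green list" scheme; SynthID-Text
# itself uses a different, tournament-sampling approach).
import torch

def watermarked_sample(logits, prev_token, vocab_size, key=12345, bias=2.0):
    # Seed a keyed green list from the previous token so a detector holding
    # the same key can recompute it.
    g = torch.Generator().manual_seed(key * 1000003 + prev_token)
    green = torch.rand(vocab_size, generator=g) < 0.5
    biased = logits + bias * green            # nudge green tokens upward
    probs = torch.softmax(biased, dim=-1)
    return torch.multinomial(probs, 1).item()

# Detection: with the key, count how often tokens land in their green lists;
# watermarked text shows a statistically significant excess over 50%.
```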

Lisa Schut (@miouantoinette)'s Twitter Profile Photo

Excited to share that our paper "Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero" is now out in PNAS! With Nenad Tomasev, Tom McGrath, Demis Hassabis, Ulrich Paquet, Been Kim 🎉 📄 doi.org/10.1073/pnas.2…

Soham De (@sohamde_)'s Twitter Profile Photo

Our new paper sheds light on the process of knowledge acquisition in language models, with implications for:
- data curricula
- the challenges of learning new knowledge when fine-tuning
- the emergence of hallucinations

Nicolas did a great job on the project! See his thread👇

Brendan O'Donoghue (@bodonoghue85)'s Twitter Profile Photo

Excited to share what my team has been working on lately - Gemini diffusion! We bring diffusion to language modeling, yielding more power and blazing speeds! 🚀🚀🚀 Gemini diffusion is especially strong at coding. In this example the model generates at 2000 tokens/sec,

Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:

→ LLMs are limited in creativity since they learn to predict the next token

→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
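For context, a toy sketch of multi-token prediction: instead of a single next-token head, several heads on a shared trunk predict the tokens at offsets 1..k. This is a generic illustration, not the paper's exact setup or its "seed-conditioning" variant.

```python
# Toy multi-token prediction: k linear heads over shared trunk states, each
# predicting the token i steps ahead. Generic illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    def __init__(self, dim, vocab, k=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(k))

    def loss(self, trunk_states, tokens):
        # trunk_states: (batch, seq, dim); tokens: (batch, seq) int64 ids
        total = 0.0
        for i, head in enumerate(self.heads, start=1):
            logits = head(trunk_states[:, :-i])   # state at t predicts token t+i
            target = tokens[:, i:]
            total = total + F.cross_entropy(logits.flatten(0, 1), target.flatten())
        return total / len(self.heads)
```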
Antonio Orvieto (@orvieto_antonio)'s Twitter Profile Photo

We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. 

It's surprising how much there is to delve into, and how beautiful it can become.

With (and only thanks to) the amazing Alexandre and Francis Bach

arxiv.org/pdf/2502.09287
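As a reminder of the object under study, a linear RNN unrolls into a convolution over the input, which is what makes its recall properties amenable to exact analysis. In generic state-space notation (not necessarily the paper's):

```latex
% Generic linear RNN / SSM and its unrolled, convolutional form.
h_t = A\,h_{t-1} + B\,x_t, \qquad y_t = C\,h_t
\quad\Longrightarrow\quad
y_t = \sum_{s=1}^{t} C A^{t-s} B\, x_s .
```

So recalling an input x_s after a delay of t - s steps amounts to asking how well the matrix power C A^{t-s} B preserves it.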
Jun Cheng (@s6juncheng)'s Twitter Profile Photo

Excited to share #AlphaGenome, the start of our journey to decipher the regulatory genome! The model matches or exceeds top-performing external models on 24 out of 26 variant evaluations, across a wide range of biological modalities. 1/6