Soham De (@sohamde_)'s Twitter Profile
Soham De

@sohamde_

Research Scientist at DeepMind. Previously PhD at the University of Maryland.

ID: 306050621

Link: https://sohamde.github.io/ · Joined: 27-05-2011 05:59:44

187 Tweets

2.2K Followers

1.1K Following

Vaibhav (VB) Srivastav (@reach_vb)'s Twitter Profile Photo

Welcome RecurrentGemma 9B 🔥

> Same performance as Gemma with more than 25% lower latency and 6-7x higher tokens/sec ⚡
> Base (9B) and Instruct (9B-IT) models released.
> MMLU 60.5, CommonSenseQA 73.2, AGIEval 39.3 - pretty strong base model to fine-tune further.
> Based on
Soham De (@sohamde_)'s Twitter Profile Photo

🔥 Introducing our 9B language model, trained on 2 trillion tokens! 🚀

Based on Griffin (arxiv.org/abs/2402.19427), it delivers:
💪 Powerful performance
⚡️ Lightning-fast inference

Pretrained and instruction-tuned models are now available on HF & Kaggle! Start building today! 🏗️
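For anyone who wants to try it, here is a minimal sketch of loading the instruction-tuned checkpoint via Hugging Face transformers. The model id "google/recurrentgemma-9b-it" is an assumption based on the announcement's naming; check the hub for the exact name.

```python
# Minimal sketch: text generation with RecurrentGemma through transformers.
# Assumes a transformers version with RecurrentGemma support; the checkpoint
# id below is an assumption based on the announcement, not verified here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-9b-it"  # assumed id; verify on the HF hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Why are linear recurrences fast at inference?",
                   return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```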

Preetum Nakkiran (@preetumnakkiran)'s Twitter Profile Photo

Our tutorial on diffusion & flows is out! We made every effort to simplify the math, while still being correct. Hope you enjoy! (Link below -- it's long but is split into 5 mostly-self-contained chapters).

lots of fun working with Arwen Bradley Hattie Zhou Madhu Advani on this
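For intuition about the kind of object the tutorial builds up to, here is a toy sketch of the standard denoising (DDPM-style) training objective; this is a generic illustration with an assumed noise schedule, not code from the tutorial.

```python
# Toy DDPM-style training step: corrupt x0 with Gaussian noise at a random
# timestep and regress the noise. Generic illustration only; `model` is any
# network mapping (x_t, t) -> predicted noise, and the schedule is assumed.
import torch

def diffusion_loss(model, x0, num_steps=1000):
    t = torch.randint(0, num_steps, (x0.shape[0],))             # per-sample timestep
    alpha_bar = torch.cos(0.5 * torch.pi * t / num_steps) ** 2  # cosine-ish schedule
    alpha_bar = alpha_bar.view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)                                  # noise to predict
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * eps  # noised input
    return ((model(x_t, t) - eps) ** 2).mean()                  # simple MSE loss
```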
Surya Bhupatiraju (@suryabhupa)'s Twitter Profile Photo

I am absolutely thrilled to announce the release of Gemma 2! Today, we're releasing both pre-trained-only and fully post-trained 9B and 27B models. The full technical report is here: goo.gle/gemma2report and it's live *right now* on aistudio.google.com.

Soham De (@sohamde_)'s Twitter Profile Photo

It was fun to moderate this discussion with a great group of panelists. Lots of interesting points made on how to approach the next gen of seq modelling architectures. Thanks for the invite Caglar Gulcehre Antonio Orvieto Razvan and others!

Armand Joulin (@armandjoulin)'s Twitter Profile Photo

Are small models still undertrained? 
We are releasing a 2B model that beats GPT-3.5. The crazy part is that it was distilled on only 2T tokens from a small model.
Distillation is the future of LLMs with the growing availability of large and efficient open models!
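For reference, a minimal sketch of the usual token-level distillation objective, where the student matches the teacher's next-token distribution; this is the generic technique, not the specific Gemma 2 recipe (which is described in the technical report).

```python
# Generic knowledge-distillation loss: KL divergence between teacher and
# student next-token distributions. Illustrative only; not the Gemma 2 recipe.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Both logits: (batch, seq_len, vocab). Flatten so "batchmean" averages
    # the KL over every token position.
    s = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    t = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```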
Gus (🤖🧠+🐍+🥑🗣️) (@gusthema)'s Twitter Profile Photo

A new blog post explaining the Gemma architecture! This time it's RecurrentGemma: developers.googleblog.com/en/gemma-expla… This is the Gemma model that is based not on the Transformer architecture but on recurrent neural networks! Is this the return of RNNs? #gemmaverse
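The core idea is to replace attention with a gated linear recurrence, so per-token inference cost stays constant in sequence length. A toy sketch of that flavor of layer, heavily simplified from Griffin's RG-LRU (the real model uses learned input-dependent gating and interleaves local attention):

```python
# Toy element-wise gated linear recurrence, the kind of layer RecurrentGemma
# uses instead of attention. Heavily simplified from Griffin's RG-LRU.
import torch

def linear_recurrence(x, a):
    # x: (seq_len, dim) inputs; a: (seq_len, dim) decay gates in (0, 1)
    h = torch.zeros(x.shape[1])
    states = []
    for t in range(x.shape[0]):
        # Decay the old state and mix in the input; the sqrt term keeps the
        # state magnitude roughly constant (as in the RG-LRU normalization).
        h = a[t] * h + (1 - a[t] ** 2).sqrt() * x[t]
        states.append(h)
    return torch.stack(states)  # constant memory per step at inference time

x = torch.randn(16, 8)
a = torch.sigmoid(torch.randn(16, 8))  # stands in for learned, input-dependent gates
print(linear_recurrence(x, a).shape)   # torch.Size([16, 8])
```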

Google DeepMind (@googledeepmind)'s Twitter Profile Photo

We're presenting AlphaProteo: an AI system for designing novel proteins that bind more successfully to target molecules. 🧬 It could help scientists better understand how biological systems function, save time in research, advance drug design and more. 🧵 dpmd.ai/3XuMqbX

Preetum Nakkiran (@preetumnakkiran)'s Twitter Profile Photo

We have an opening for a PhD intern working closely with (among others) me, Arwen Bradley, and David Berthelot on scientific aspects of diffusion & generative models. 1/

Caglar Gulcehre (@caglarml)'s Twitter Profile Photo

Great contribution from Meta to the research community with a very easy-to-read codebase for LLM development: github.com/facebookresear… Soham De and Samuel L Smith have implemented Hawk as well, which seems to have performance comparable to Mamba's.

Google DeepMind (@googledeepmind)'s Twitter Profile Photo

Today, we're open-sourcing our SynthID text watermarking tool through an updated Responsible Generative AI Toolkit. Available freely to developers and businesses, it will help them identify their AI-generated content. 🔍 Find out more → goo.gle/40apGQh
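For intuition, here is a toy sketch of generation-time text watermarking in general: bias sampling toward a keyed pseudorandom subset of the vocabulary, then detect the bias statistically. This is the simplest "green list" scheme, not SynthID-Text's actual tournament-sampling algorithm.

```python
# Toy generation-time watermark (generic "green list" scheme; SynthID-Text
# itself uses a different, tournament-sampling approach).
import torch

def watermarked_sample(logits, prev_token, vocab_size, key=12345, bias=2.0):
    # Seed a keyed green list from the previous token so a detector holding
    # the same key can recompute it.
    g = torch.Generator().manual_seed(key * 1000003 + prev_token)
    green = torch.rand(vocab_size, generator=g) < 0.5
    biased = logits + bias * green            # nudge green tokens upward
    probs = torch.softmax(biased, dim=-1)
    return torch.multinomial(probs, 1).item()

# Detection: with the key, count how often tokens land in their green lists;
# watermarked text shows a statistically significant excess over 50%.
```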

Lisa Schut (@miouantoinette)'s Twitter Profile Photo

Excited to share that our paper "Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero" is now out in PNAS! With Nenad Tomasev, Tom McGrath, Demis Hassabis, Ulrich Paquet, Been Kim 🎉 📄 doi.org/10.1073/pnas.2…

Soham De (@sohamde_)'s Twitter Profile Photo

Our new paper sheds light on the process of knowledge acquisition in language models, with implications for:
- data curricula
- the challenges of learning new knowledge when fine-tuning
- the emergence of hallucinations

Nicolas did a great job on the project! See his thread👇

Brendan O'Donoghue (@bodonoghue85)'s Twitter Profile Photo

Excited to share what my team has been working on lately - Gemini diffusion! We bring diffusion to language modeling, yielding more power and blazing speeds! 🚀🚀🚀 Gemini diffusion is especially strong at coding. In this example the model generates at 2000 tokens/sec,

Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:

→ LLMs are limited in creativity since they learn to predict the next token

→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
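For context, a toy sketch of multi-token prediction: instead of a single next-token head, several heads on a shared trunk predict the tokens at offsets 1..k. This is a generic illustration, not the paper's exact setup or its "seed-conditioning" variant.

```python
# Toy multi-token prediction: k linear heads over shared trunk states, each
# predicting the token i steps ahead. Generic illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    def __init__(self, dim, vocab, k=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, vocab) for _ in range(k))

    def loss(self, trunk_states, tokens):
        # trunk_states: (batch, seq, dim); tokens: (batch, seq) int64 ids
        total = 0.0
        for i, head in enumerate(self.heads, start=1):
            logits = head(trunk_states[:, :-i])   # state at t predicts token t+i
            target = tokens[:, i:]
            total = total + F.cross_entropy(logits.flatten(0, 1), target.flatten())
        return total / len(self.heads)
```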
Antonio Orvieto (@orvieto_antonio)'s Twitter Profile Photo

We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. 

It's surprising how much there is to delve into, and how beautiful it can become.

With (and only thanks to) the amazing Alexandre and Francis Bach

arxiv.org/pdf/2502.09287
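As a reminder of the object under study, a linear RNN unrolls into a convolution over the input, which is what makes its recall properties amenable to exact analysis. In generic state-space notation (not necessarily the paper's):

```latex
% Generic linear RNN / SSM and its unrolled, convolutional form.
h_t = A\,h_{t-1} + B\,x_t, \qquad y_t = C\,h_t
\quad\Longrightarrow\quad
y_t = \sum_{s=1}^{t} C A^{t-s} B\, x_s .
```

So recalling an input x_s after a delay of t - s steps amounts to asking how well the matrix power C A^{t-s} B preserves it.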
Jun Cheng (@s6juncheng)'s Twitter Profile Photo

Excited to share #AlphaGenome, the start of our journey to decipher the regulatory genome! The model matches or exceeds top-performing external models on 24 out of 26 variant evaluations, across a wide range of biological modalities. 1/6