Badr Youbi Idrissi (@byoubii) 's Twitter Profile
Badr Youbi Idrissi

@byoubii

ID: 3422816283

Joined: 14-08-2015 20:47:49

34 Tweets

290 Followers

130 Following

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Meta presents Better & Faster Large Language Models via Multi-token Prediction

- training language models to predict multiple future tokens at once results in higher sample efficiency
- up to 3x faster at inference

arxiv.org/abs/2404.19737
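
The training recipe behind that claim is easy to prototype: keep one shared trunk, attach several small output heads where head k is trained to predict the token k+1 positions ahead, and sum the per-head cross-entropy losses. A minimal PyTorch sketch of that loss (illustrative names only, with a toy GRU standing in for the transformer trunk; this is not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenLM(nn.Module):
    """Toy model: a shared trunk plus n_future small heads; head k is trained
    to predict the token k+1 positions ahead."""
    def __init__(self, vocab_size=256, d_model=128, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)   # stand-in for a transformer trunk
        self.heads = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_future))
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)  # shared unembedding

    def forward(self, tokens):                        # tokens: (B, T) int64
        h, _ = self.trunk(self.embed(tokens))         # (B, T, d_model)
        return [self.unembed(head(h)) for head in self.heads]  # n_future tensors of shape (B, T, V)

def multi_token_loss(model, tokens):
    """Sum of per-head cross-entropies; head k only scores positions that have
    a target k+1 steps ahead."""
    T = tokens.size(1)
    loss = 0.0
    for k, logits in enumerate(model(tokens)):
        offset = k + 1
        pred = logits[:, : T - offset]                # (B, T-offset, V)
        target = tokens[:, offset:]                   # (B, T-offset)
        loss = loss + F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))
    return loss

model = MultiTokenLM()
tokens = torch.randint(0, 256, (2, 32))
multi_token_loss(model, tokens).backward()
```

At inference time the extra heads can be dropped for plain next-token decoding, or used to draft several tokens per step for self-speculative decoding, which is where the up-to-3x speedup mentioned above comes from.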
AK (@_akhaliq) 's Twitter Profile Photo

Meta announces Better & Faster Large Language Models via Multi-token Prediction

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency.
Benjamin Lefaudeux (@bentheegg) 's Twitter Profile Photo

Multi-token prediction that works, really nice paper which I think will be foundational export.arxiv.org/abs/2404.19737 (1/N)

elvis (@omarsar0) 's Twitter Profile Photo

The most exciting LLM paper of the week was the one from Gloeckle et al. that aims to train better and faster LLMs via multi-token prediction.

It's an impressive research paper, so I had lots of thoughts as usual, especially because it attempts to push LLMs forward.

I enjoyed
AI at Meta (@aiatmeta) 's Twitter Profile Photo

Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and

Wassim (Wes) Bouaziz (@_vassim) 's Twitter Profile Photo

Want to know if a ML model was trained on your dataset? Introducing ✨Data Taggants✨! We use data poisoning to leave a harmless and stealthy signature on your dataset that radiates through trained models. Learn how to protect your dataset from unauthorized use... A 🧵
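
The verification side of this idea reduces to a hypothesis test. Purely as a generic illustration (not the Data Taggants protocol itself; the names and numbers below are made up), one can query the suspect model on secret key inputs and check whether it predicts the secret targets far more often than chance:

```python
import random
from scipy.stats import binomtest

def detect_signature(model_predict, key_inputs, key_targets, num_classes):
    """Count how often the suspect model returns the secret target on each key
    input, then compare against the chance rate 1/num_classes."""
    hits = sum(int(model_predict(x) == y) for x, y in zip(key_inputs, key_targets))
    test = binomtest(hits, n=len(key_inputs), p=1.0 / num_classes, alternative="greater")
    return hits, test.pvalue

# Toy usage with a random "model": the p-value stays large, i.e. no evidence
# that this model was trained on the tagged dataset.
keys = list(range(32))
targets = [random.randrange(1000) for _ in keys]
hits, p_value = detect_signature(lambda x: random.randrange(1000), keys, targets, 1000)
print(hits, p_value)
```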

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from Joelle Pineau. This work is another important step

AI at Meta (@aiatmeta) 's Twitter Profile Photo

We recently released Meta Lingua, a lightweight and self-contained codebase designed to train language models at scale. Lingua is designed for research and uses easy-to-modify PyTorch components in order to try new architectures, losses, data and more.

Mathurin Videau (@mathuvu_) 's Twitter Profile Photo

Meta Lingua: a minimal, fast LLM codebase for training and inference. By researchers, for researchers. Easily hackable, still reproducible. Built-in efficiency, profiling (CPU, GPU and memory) and interpretability (automatic activation and gradient statistics). Joint work w/ Badr Youbi Idrissi
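
As a rough sketch of what "automatic activation and gradient statistics" can look like in practice (generic PyTorch hooks, not Lingua's actual implementation; all names are illustrative):

```python
import torch
import torch.nn as nn

stats = {}

def attach_stat_hooks(model: nn.Module):
    """Record simple activation and gradient statistics for every leaf module."""
    for name, module in model.named_modules():
        if list(module.children()):                   # skip container modules
            continue
        def fwd_hook(mod, inputs, output, name=name):
            if isinstance(output, torch.Tensor):
                stats[f"{name}/act_mean"] = output.detach().float().mean().item()
                stats[f"{name}/act_std"] = output.detach().float().std().item()
        def bwd_hook(mod, grad_input, grad_output, name=name):
            if isinstance(grad_output[0], torch.Tensor):
                stats[f"{name}/grad_norm"] = grad_output[0].detach().float().norm().item()
        module.register_forward_hook(fwd_hook)
        module.register_full_backward_hook(bwd_hook)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
attach_stat_hooks(model)
model(torch.randn(8, 16)).sum().backward()
print(stats)
```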

Andrew Carr (e/🤸) (@andrew_n_carr) 's Twitter Profile Photo

A great example of FlexAttention used in a reasonably modern code base is Lingua, which is designed to reproduce Llama 2 7B overnight.

They have a great example of batched / sequence-stacked attention masking for within-document attention, which is then used in the mod function.
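
For reference, within-document (document-causal) masking with FlexAttention follows the pattern below. This is a minimal sketch assuming a recent PyTorch (2.5+) and a CUDA device; the packing layout is invented for illustration and is not Lingua's actual code:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# Hypothetical packing: three documents of lengths 96, 112 and 48 stacked into
# one sequence of length 256; document_id[i] says which document position i belongs to.
doc_lens = torch.tensor([96, 112, 48])
document_id = torch.repeat_interleave(torch.arange(len(doc_lens)), doc_lens).to("cuda")

def causal_document_mask(b, h, q_idx, kv_idx):
    # Attend only to earlier positions within the same document.
    return (document_id[q_idx] == document_id[kv_idx]) & (q_idx >= kv_idx)

B, H, S, D = 1, 4, int(doc_lens.sum()), 64
block_mask = create_block_mask(causal_document_mask, B=B, H=H, Q_LEN=S, KV_LEN=S)

q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.float16) for _ in range(3))
out = flex_attention(q, k, v, block_mask=block_mask)   # (B, H, S, D)
```

The mask mod is evaluated per (query, key) index pair, and create_block_mask precomputes which blocks are fully masked so the kernel can skip them, which is what keeps packed within-document attention cheap.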
TimDarcet (@timdarcet) 's Twitter Profile Photo

Want strong SSL, but not the complexity of DINOv2?

CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Presents an autoregressive U-Net that processes raw bytes and learns hierarchical token representations

Matches strong BPE baselines, with deeper hierarchies demonstrating promising scaling trends
Mathurin Videau (@mathuvu_) 's Twitter Profile Photo

We present an Autoregressive U-Net that incorporates tokenization inside the model, pooling raw bytes into words then word-groups. AU-Net focuses most of its compute on building latent vectors that correspond to larger units of meaning.
Joint work with Badr Youbi Idrissi 1/8
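
As a toy illustration of the pooling idea only (not the AU-Net architecture itself; the helper below is hypothetical): embed raw bytes, average the byte vectors of each whitespace-delimited word, then average consecutive words into coarser word-group vectors:

```python
import torch
import torch.nn as nn

def pool_bytes(text: str, d_model: int = 64, group_size: int = 4):
    """Toy pooling only: bytes -> words -> word-groups by simple averaging."""
    byte_ids = torch.tensor(list(text.encode("utf-8")))
    byte_vecs = nn.Embedding(256, d_model)(byte_ids)           # (num_bytes, d)

    # Word-level pooling: mean over the bytes of each whitespace-delimited word.
    word_vecs, start = [], 0
    for i, b in enumerate(byte_ids.tolist() + [32]):           # sentinel space at the end
        if b == 32:                                            # ASCII space = word boundary
            if i > start:
                word_vecs.append(byte_vecs[start:i].mean(dim=0))
            start = i + 1
    word_vecs = torch.stack(word_vecs)                         # (num_words, d)

    # Group-level pooling: mean over consecutive words.
    group_vecs = torch.stack([word_vecs[i:i + group_size].mean(dim=0)
                              for i in range(0, len(word_vecs), group_size)])
    return byte_vecs, word_vecs, group_vecs

b, w, g = pool_bytes("from bytes to ideas with autoregressive u nets")
print(b.shape, w.shape, g.shape)   # many bytes >> fewer words >> fewer word-groups
```

The coarser the level, the fewer vectors there are, so most compute can be spent on the group-level representations that correspond to larger units of meaning, as described in the tweet above.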
Nikola Jovanović @ ICLR 🇸🇬 (@ni_jovanovic) 's Twitter Profile Photo

There's a lot of work now on LLM watermarking. But can we extend this to transformers trained for autoregressive image generation? Yes, but it's not straightforward 🧵(1/10)
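
For context, the kind of next-token watermark this line of work starts from can be sketched as green-list logit biasing in the style of Kirchenbauer et al.; the code below is a generic illustration of that baseline, not the method from this thread:

```python
import torch

def greenlist(prev_token: int, vocab_size: int, gamma: float = 0.5, key: int = 42):
    """Keyed pseudo-random 'green' subset of the vocabulary, seeded by the previous token."""
    gen = torch.Generator().manual_seed(key * 1_000_003 + prev_token)
    return torch.randperm(vocab_size, generator=gen)[: int(gamma * vocab_size)]

def watermarked_sample(logits: torch.Tensor, prev_token: int, delta: float = 2.0) -> int:
    """Softly bias sampling toward the green tokens."""
    biased = logits.clone()
    biased[greenlist(prev_token, logits.numel())] += delta
    return torch.multinomial(torch.softmax(biased, dim=-1), 1).item()

def green_fraction(tokens, vocab_size):
    """Detection statistic: fraction of tokens landing in their green list
    (about gamma for unwatermarked text, much higher for watermarked text)."""
    hits = sum(int(t in set(greenlist(p, vocab_size).tolist()))
               for p, t in zip(tokens[:-1], tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# Toy usage on random logits with a hypothetical vocabulary size.
logits = torch.randn(50_000)
next_token = watermarked_sample(logits, prev_token=7)
```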