Badr Youbi Idrissi (@byoubii) 's Twitter Profile
Badr Youbi Idrissi

@byoubii

ID: 3422816283

Joined: 14-08-2015 20:47:49

34 Tweets

290 Followers

130 Following

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Meta presents Better & Faster Large Language Models via Multi-token Prediction

- training language models to predict multiple future tokens at once results in higher sample efficiency
- up to 3x faster at inference

arxiv.org/abs/2404.19737
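
The training recipe behind that claim is easy to prototype: keep one shared trunk, attach several small output heads where head k is trained to predict the token k+1 positions ahead, and sum the per-head cross-entropy losses. A minimal PyTorch sketch of that loss (illustrative names only, with a toy GRU standing in for the transformer trunk; this is not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenLM(nn.Module):
    """Toy model: a shared trunk plus n_future small heads; head k is trained
    to predict the token k+1 positions ahead."""
    def __init__(self, vocab_size=256, d_model=128, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)   # stand-in for a transformer trunk
        self.heads = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_future))
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)  # shared unembedding

    def forward(self, tokens):                        # tokens: (B, T) int64
        h, _ = self.trunk(self.embed(tokens))         # (B, T, d_model)
        return [self.unembed(head(h)) for head in self.heads]  # n_future tensors of shape (B, T, V)

def multi_token_loss(model, tokens):
    """Sum of per-head cross-entropies; head k only scores positions that have
    a target k+1 steps ahead."""
    T = tokens.size(1)
    loss = 0.0
    for k, logits in enumerate(model(tokens)):
        offset = k + 1
        pred = logits[:, : T - offset]                # (B, T-offset, V)
        target = tokens[:, offset:]                   # (B, T-offset)
        loss = loss + F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))
    return loss

model = MultiTokenLM()
tokens = torch.randint(0, 256, (2, 32))
multi_token_loss(model, tokens).backward()
```

At inference time the extra heads can be dropped for plain next-token decoding, or used to draft several tokens per step for self-speculative decoding, which is where the up-to-3x speedup mentioned above comes from.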
AK (@_akhaliq) 's Twitter Profile Photo

Meta announces Better & Faster Large Language Models via Multi-token Prediction

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency.
Benjamin Lefaudeux (@bentheegg) 's Twitter Profile Photo

Multi-token prediction that works, really nice paper which I think will be foundational export.arxiv.org/abs/2404.19737 (1/N)

elvis (@omarsar0) 's Twitter Profile Photo

The most exciting LLM paper of the week was the one from Gloeckle et al. that aims to train better and faster LLMs via multi-token prediction.

It's an impressive research paper, so I had lots of thoughts as usual, especially because it attempts to push LLMs forward.

I enjoyed
AI at Meta (@aiatmeta) 's Twitter Profile Photo

Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and

Wassim (Wes) Bouaziz (@_vassim) 's Twitter Profile Photo

Want to know if a ML model was trained on your dataset? Introducing ✨Data Taggants✨! We use data poisoning to leave a harmless and stealthy signature on your dataset that radiates through trained models. Learn how to protect your dataset from unauthorized use... A 🧵
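
The verification side of this idea reduces to a hypothesis test. Purely as a generic illustration (not the Data Taggants protocol itself; the names and numbers below are made up), one can query the suspect model on secret key inputs and check whether it predicts the secret targets far more often than chance:

```python
import random
from scipy.stats import binomtest

def detect_signature(model_predict, key_inputs, key_targets, num_classes):
    """Count how often the suspect model returns the secret target on each key
    input, then compare against the chance rate 1/num_classes."""
    hits = sum(int(model_predict(x) == y) for x, y in zip(key_inputs, key_targets))
    test = binomtest(hits, n=len(key_inputs), p=1.0 / num_classes, alternative="greater")
    return hits, test.pvalue

# Toy usage with a random "model": the p-value stays large, i.e. no evidence
# that this model was trained on the tagged dataset.
keys = list(range(32))
targets = [random.randrange(1000) for _ in keys]
hits, p_value = detect_signature(lambda x: random.randrange(1000), keys, targets, 1000)
print(hits, p_value)
```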

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from Joelle Pineau. This work is another important step

AI at Meta (@aiatmeta) 's Twitter Profile Photo

We recently released Meta Lingua, a lightweight and self-contained codebase designed to train language models at scale. Lingua is designed for research and uses easy-to-modify PyTorch components in order to try new architectures, losses, data and more.

Mathurin Videau (@mathuvu_) 's Twitter Profile Photo

Meta Lingua: a minimal, fast LLM codebase for training and inference. By researchers, for researchers. Easily hackable, still reproducible. Built-in efficiency, profiling (CPU, GPU and memory) and interpretability (automatic activation and gradient statistics). Joint work w/ Badr Youbi Idrissi
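
As a rough sketch of what "automatic activation and gradient statistics" can look like in practice (generic PyTorch hooks, not Lingua's actual implementation; all names are illustrative):

```python
import torch
import torch.nn as nn

stats = {}

def attach_stat_hooks(model: nn.Module):
    """Record simple activation and gradient statistics for every leaf module."""
    for name, module in model.named_modules():
        if list(module.children()):                   # skip container modules
            continue
        def fwd_hook(mod, inputs, output, name=name):
            if isinstance(output, torch.Tensor):
                stats[f"{name}/act_mean"] = output.detach().float().mean().item()
                stats[f"{name}/act_std"] = output.detach().float().std().item()
        def bwd_hook(mod, grad_input, grad_output, name=name):
            if isinstance(grad_output[0], torch.Tensor):
                stats[f"{name}/grad_norm"] = grad_output[0].detach().float().norm().item()
        module.register_forward_hook(fwd_hook)
        module.register_full_backward_hook(bwd_hook)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
attach_stat_hooks(model)
model(torch.randn(8, 16)).sum().backward()
print(stats)
```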

Andrew Carr (e/🤸) (@andrew_n_carr) 's Twitter Profile Photo

A great example of FlexAttention used in a reasonably modern code base is Lingua, which is designed to reproduce Llama 2 7B overnight.

They have a great example of batched / sequence-stacked attention masking for within-document attention, which is then used in the mod function.
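
For reference, within-document (document-causal) masking with FlexAttention follows the pattern below. This is a minimal sketch assuming a recent PyTorch (2.5+) and a CUDA device; the packing layout is invented for illustration and is not Lingua's actual code:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# Hypothetical packing: three documents of lengths 96, 112 and 48 stacked into
# one sequence of length 256; document_id[i] says which document position i belongs to.
doc_lens = torch.tensor([96, 112, 48])
document_id = torch.repeat_interleave(torch.arange(len(doc_lens)), doc_lens).to("cuda")

def causal_document_mask(b, h, q_idx, kv_idx):
    # Attend only to earlier positions within the same document.
    return (document_id[q_idx] == document_id[kv_idx]) & (q_idx >= kv_idx)

B, H, S, D = 1, 4, int(doc_lens.sum()), 64
block_mask = create_block_mask(causal_document_mask, B=B, H=H, Q_LEN=S, KV_LEN=S)

q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.float16) for _ in range(3))
out = flex_attention(q, k, v, block_mask=block_mask)   # (B, H, S, D)
```

The mask mod is evaluated per (query, key) index pair, and create_block_mask precomputes which blocks are fully masked so the kernel can skip them, which is what keeps packed within-document attention cheap.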
TimDarcet (@timdarcet) 's Twitter Profile Photo

Want strong SSL, but not the complexity of DINOv2?

CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Presents an autoregressive U-Net that processes raw bytes and learns hierarchical token representations

Matches strong BPE baselines, with deeper hierarchies demonstrating promising scaling trends
Mathurin Videau (@mathuvu_) 's Twitter Profile Photo

We present an Autoregressive U-Net that incorporates tokenization inside the model, pooling raw bytes into words then word-groups. AU-Net focuses most of its compute on building latent vectors that correspond to larger units of meaning.
Joint work with Badr Youbi Idrissi 1/8
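
As a toy illustration of the pooling idea only (not the AU-Net architecture itself; the helper below is hypothetical): embed raw bytes, average the byte vectors of each whitespace-delimited word, then average consecutive words into coarser word-group vectors:

```python
import torch
import torch.nn as nn

def pool_bytes(text: str, d_model: int = 64, group_size: int = 4):
    """Toy pooling only: bytes -> words -> word-groups by simple averaging."""
    byte_ids = torch.tensor(list(text.encode("utf-8")))
    byte_vecs = nn.Embedding(256, d_model)(byte_ids)           # (num_bytes, d)

    # Word-level pooling: mean over the bytes of each whitespace-delimited word.
    word_vecs, start = [], 0
    for i, b in enumerate(byte_ids.tolist() + [32]):           # sentinel space at the end
        if b == 32:                                            # ASCII space = word boundary
            if i > start:
                word_vecs.append(byte_vecs[start:i].mean(dim=0))
            start = i + 1
    word_vecs = torch.stack(word_vecs)                         # (num_words, d)

    # Group-level pooling: mean over consecutive words.
    group_vecs = torch.stack([word_vecs[i:i + group_size].mean(dim=0)
                              for i in range(0, len(word_vecs), group_size)])
    return byte_vecs, word_vecs, group_vecs

b, w, g = pool_bytes("from bytes to ideas with autoregressive u nets")
print(b.shape, w.shape, g.shape)   # many bytes >> fewer words >> fewer word-groups
```

The coarser the level, the fewer vectors there are, so most compute can be spent on the group-level representations that correspond to larger units of meaning, as described in the tweet above.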
Nikola Jovanović @ ICLR 🇸🇬 (@ni_jovanovic) 's Twitter Profile Photo

There's a lot of work now on LLM watermarking. But can we extend this to transformers trained for autoregressive image generation? Yes, but it's not straightforward 🧵(1/10)
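
For context, the kind of next-token watermark this line of work starts from can be sketched as green-list logit biasing in the style of Kirchenbauer et al.; the code below is a generic illustration of that baseline, not the method from this thread:

```python
import torch

def greenlist(prev_token: int, vocab_size: int, gamma: float = 0.5, key: int = 42):
    """Keyed pseudo-random 'green' subset of the vocabulary, seeded by the previous token."""
    gen = torch.Generator().manual_seed(key * 1_000_003 + prev_token)
    return torch.randperm(vocab_size, generator=gen)[: int(gamma * vocab_size)]

def watermarked_sample(logits: torch.Tensor, prev_token: int, delta: float = 2.0) -> int:
    """Softly bias sampling toward the green tokens."""
    biased = logits.clone()
    biased[greenlist(prev_token, logits.numel())] += delta
    return torch.multinomial(torch.softmax(biased, dim=-1), 1).item()

def green_fraction(tokens, vocab_size):
    """Detection statistic: fraction of tokens landing in their green list
    (about gamma for unwatermarked text, much higher for watermarked text)."""
    hits = sum(int(t in set(greenlist(p, vocab_size).tolist()))
               for p, t in zip(tokens[:-1], tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# Toy usage on random logits with a hypothetical vocabulary size.
logits = torch.randn(50_000)
next_token = watermarked_sample(logits, prev_token=7)
```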