Adrian Lancucki (@adrianlancucki)'s Twitter Profile
Adrian Lancucki

@adrianlancucki

Deep learning researcher/engineer @NVIDIA.

ID: 1543606289842032641

Joined: 03-07-2022 14:43:28

4 Tweets

71 Followers

69 Following

Edoardo Ponti (@pontiedoardo)

Can we increase the efficiency *and* performance of auto-regressive models? We introduce dynamic-pooling Transformers, which jointly perform language modelling and token segmentation.

Piotr Nawrot* Adrian Lancucki Jan Chorowski

📜arxiv.org/abs/2211.09761
🧑‍💻github.com/PiotrNawrot/dy…
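
The paper learns where to place the segment boundaries end-to-end; as a rough illustration only (not the authors' code, with boundary prediction omitted and boundaries supplied by hand), the pooling step can be sketched in PyTorch:

```python
import torch

def dynamic_pool(hidden, boundaries):
    """Average-pool consecutive token states into variable-length segments.

    hidden:     (seq_len, d_model) token representations
    boundaries: (seq_len,) 0/1 tensor; 1 marks the last token of a segment
    """
    # segment id of each token = number of boundaries strictly before it
    seg_ids = torch.cumsum(boundaries, dim=0) - boundaries
    num_segments = int(seg_ids.max().item()) + 1
    pooled = torch.zeros(num_segments, hidden.size(1))
    counts = torch.zeros(num_segments, 1)
    pooled.index_add_(0, seg_ids, hidden)                         # sum states per segment
    counts.index_add_(0, seg_ids, torch.ones(hidden.size(0), 1))  # tokens per segment
    return pooled / counts                                        # mean per segment

# Toy example: 6 tokens grouped into 2 segments of 3 tokens each
hidden = torch.randn(6, 8)
boundaries = torch.tensor([0, 0, 1, 0, 0, 1])
print(dynamic_pool(hidden, boundaries).shape)  # torch.Size([2, 8])
```

The shortened sequence is what buys the efficiency: subsequent layers attend over pooled segments rather than raw tokens.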

Piotr Nawrot (@p_nawrot)

Introducing *nanoT5*

Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget (1xA100 GPU, ~20 hours) in PyTorch

🧑‍💻github.com/PiotrNawrot/na…

<a href="/EdinburghNLP/">EdinburghNLP</a>
Piotr Nawrot (@p_nawrot)

The memory in Transformers grows linearly with the sequence length at inference time.

In SSMs it is constant, but often at the expense of performance.

We introduce Dynamic Memory Compression (DMC), where we retrofit LLMs to compress their KV cache while preserving performance.
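
The linear growth follows directly from the cache-size arithmetic. A back-of-the-envelope sketch, assuming a Llama-7B-like shape in fp16 (illustrative numbers, not figures from the paper):

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2, compression_ratio=1.0):
    """Approximate per-sequence KV-cache size.

    The factor 2 covers keys and values; compression_ratio > 1 models a
    DMC-style cache that keeps roughly seq_len / ratio entries.
    """
    entries = seq_len / compression_ratio
    return 2 * n_layers * n_kv_heads * head_dim * entries * bytes_per_elem

for seq_len in (1_024, 8_192, 65_536):
    plain = kv_cache_bytes(seq_len)
    dmc4x = kv_cache_bytes(seq_len, compression_ratio=4.0)
    print(f"{seq_len:>6} tokens: {plain / 2**30:5.2f} GiB plain, "
          f"{dmc4x / 2**30:5.2f} GiB at 4x compression")
```

At 65k tokens the uncompressed cache alone is roughly 32 GiB per sequence under these assumptions, which is why the cache, not the weights, becomes the bottleneck for long-context and batched inference.
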
Piotr Nawrot (@p_nawrot)

Tomorrow at ICML Conference, together with Edoardo Ponti and Adrian Lancucki, we'll present an updated version of "Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference".

You can find an updated paper at arxiv.org/abs/2403.09636. Among others - 1) We trained DMC to

Edoardo Ponti (@pontiedoardo)

🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget. 

This unlocks *inference-time hyper-scaling*

For the same runtime or memory load, we can boost LLM accuracy by pushing reasoning even further!
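
A toy budget calculation shows why compression translates into more reasoning tokens for the same memory load (the 0.5 MiB/token footprint is an assumption matching the sketch above; the numbers are illustrative only):

```python
BUDGET_GIB = 16        # fixed KV-cache memory budget at inference time
PER_TOKEN_MIB = 0.5    # assumed uncompressed cache footprint per generated token

for ratio in (1, 2, 4, 8):  # hypothetical cache compression ratios
    max_tokens = BUDGET_GIB * 1024 / (PER_TOKEN_MIB / ratio)
    print(f"{ratio}x compression -> ~{int(max_tokens):,} tokens in the same budget")
```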