Adrian Lancucki (@adrianlancucki)'s Twitter Profile
Adrian Lancucki

@adrianlancucki

Deep learning researcher/engineer @NVIDIA.

ID: 1543606289842032641

Joined: 03-07-2022 14:43:28

4 Tweets

71 Followers

69 Following

Edoardo Ponti (@pontiedoardo)

Can we increase the efficiency *and* performance of auto-regressive models? We introduce dynamic-pooling Transformers, which jointly perform language modelling and token segmentation.

Piotr Nawrot* Adrian Lancucki Jan Chorowski

📜arxiv.org/abs/2211.09761
🧑‍💻github.com/PiotrNawrot/dy…
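
The paper learns where to place the segment boundaries end-to-end; as a rough illustration only (not the authors' code, with boundary prediction omitted and boundaries supplied by hand), the pooling step can be sketched in PyTorch:

```python
import torch

def dynamic_pool(hidden, boundaries):
    """Average-pool consecutive token states into variable-length segments.

    hidden:     (seq_len, d_model) token representations
    boundaries: (seq_len,) 0/1 tensor; 1 marks the last token of a segment
    """
    # segment id of each token = number of boundaries strictly before it
    seg_ids = torch.cumsum(boundaries, dim=0) - boundaries
    num_segments = int(seg_ids.max().item()) + 1
    pooled = torch.zeros(num_segments, hidden.size(1))
    counts = torch.zeros(num_segments, 1)
    pooled.index_add_(0, seg_ids, hidden)                         # sum states per segment
    counts.index_add_(0, seg_ids, torch.ones(hidden.size(0), 1))  # tokens per segment
    return pooled / counts                                        # mean per segment

# Toy example: 6 tokens grouped into 2 segments of 3 tokens each
hidden = torch.randn(6, 8)
boundaries = torch.tensor([0, 0, 1, 0, 0, 1])
print(dynamic_pool(hidden, boundaries).shape)  # torch.Size([2, 8])
```

The shortened sequence is what buys the efficiency: subsequent layers attend over pooled segments rather than raw tokens.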

Piotr Nawrot (@p_nawrot)

Introducing *nanoT5*

Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget (1xA100 GPU, ~20 hours) in PyTorch

🧑‍💻github.com/PiotrNawrot/na…

<a href="/EdinburghNLP/">EdinburghNLP</a>
Piotr Nawrot (@p_nawrot)

The memory in Transformers grows linearly with the sequence length at inference time.

In SSMs it is constant, but often at the expense of performance.

We introduce Dynamic Memory Compression (DMC), where we retrofit LLMs to compress their KV cache while preserving performance.
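
The linear growth follows directly from the cache-size arithmetic. A back-of-the-envelope sketch, assuming a Llama-7B-like shape in fp16 (illustrative numbers, not figures from the paper):

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2, compression_ratio=1.0):
    """Approximate per-sequence KV-cache size.

    The factor 2 covers keys and values; compression_ratio > 1 models a
    DMC-style cache that keeps roughly seq_len / ratio entries.
    """
    entries = seq_len / compression_ratio
    return 2 * n_layers * n_kv_heads * head_dim * entries * bytes_per_elem

for seq_len in (1_024, 8_192, 65_536):
    plain = kv_cache_bytes(seq_len)
    dmc4x = kv_cache_bytes(seq_len, compression_ratio=4.0)
    print(f"{seq_len:>6} tokens: {plain / 2**30:5.2f} GiB plain, "
          f"{dmc4x / 2**30:5.2f} GiB at 4x compression")
```

At 65k tokens the uncompressed cache alone is roughly 32 GiB per sequence under these assumptions, which is why the cache, not the weights, becomes the bottleneck for long-context and batched inference.
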
Piotr Nawrot (@p_nawrot)

Tomorrow at ICML Conference, together with Edoardo Ponti and Adrian Lancucki, we'll present an updated version of "Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference".

You can find an updated paper at arxiv.org/abs/2403.09636. Among others - 1) We trained DMC to

Edoardo Ponti (@pontiedoardo)

🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget. 

This unlocks *inference-time hyper-scaling*

For the same runtime or memory load, we can boost LLM accuracy by pushing reasoning even further!
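
A toy budget calculation shows why compression translates into more reasoning tokens for the same memory load (the 0.5 MiB/token footprint is an assumption matching the sketch above; the numbers are illustrative only):

```python
BUDGET_GIB = 16        # fixed KV-cache memory budget at inference time
PER_TOKEN_MIB = 0.5    # assumed uncompressed cache footprint per generated token

for ratio in (1, 2, 4, 8):  # hypothetical cache compression ratios
    max_tokens = BUDGET_GIB * 1024 / (PER_TOKEN_MIB / ratio)
    print(f"{ratio}x compression -> ~{int(max_tokens):,} tokens in the same budget")
```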