Kelly Marchisio (St. Denis) (@cheeesio)'s Twitter Profile
Kelly Marchisio (St. Denis)

@cheeesio

Multilinguality Lead @cohere. Formerly: PhD @jhuclsp, Alexa Fellow @amazon, dev @Google, MPhil @cambridgenlp, EdM @hgse 🔑🔑¬🧀 (@kelvenmar20)

ID: 1134629909551308800

Link: http://kellymarchisio.github.io · Joined: 01-06-2019 01:17:05

658 Tweets

2.2K Followers

630 Following

Sebastian Ruder (@seb_ruder)

The Sparse Frontier

Efficient sparse attention methods are key to scaling LLMs to long contexts. We conduct the largest-scale empirical analysis that answers:
1. 🤏🔍 Are small dense models or large sparse models better?
2. ♾️What is the maximum permissible sparsity per task?
3.
Cohere Labs (@cohere_labs)

Our ML Efficiency group is looking forward to welcoming Piotr Nawrot next week on May 28th, for a session on "The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs"

Learn more: cohere.com/events/Cohere-…
Piotr Nawrot (@p_nawrot)

Tomorrow at 6pm CET I'm giving a talk about our latest work on Sparse Attention at Cohere Labs. I plan to describe the field as it is now, discuss our evaluation results, and share insights about what I believe is the future of Sparse Attention. See you!

Cohere Labs (@cohere_labs)

Over 7000 languages are spoken worldwide 🌐, but AI safety efforts focus on only a fraction of them. 

Our latest paper draws on our multi-year efforts with the wider research community to explore why this matters and how we can bridge the AI language gap.
Cohere Labs (@cohere_labs)

Here are key recommendations to make AI safer & more equitable for everyone:

🌐 Incentivize the creation of open-access multilingual datasets
🪟 Encourage transparency in model language coverage
🔬 Prioritise resources towards multilingual research
Edoardo Ponti (@pontiedoardo)

🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget. 

This unlocks *inference-time hyper-scaling*

For the same runtime or memory load, we can boost LLM accuracy by pushing reasoning even further!
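
As a rough back-of-the-envelope sketch of why cache compression buys more generated tokens for the same memory load (the model dimensions, 16 GB budget, and 4x compression ratio below are assumptions for illustration, not figures from the paper):

```python
# Illustrative only: assumed dimensions and an assumed 4x compression ratio.
def kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    # Each layer stores one key and one value vector per token and per KV head.
    return 2 * layers * kv_heads * head_dim * dtype_bytes

budget_bytes = 16 * 1024**3             # memory reserved for the KV cache (assumed)
dense_cost = kv_bytes_per_token()       # cache bytes per generated token
compressed_cost = dense_cost / 4        # hypothetical 4x cache compression

print(f"dense:      ~{budget_bytes / dense_cost:,.0f} tokens fit in the budget")
print(f"compressed: ~{budget_bytes / compressed_cost:,.0f} tokens fit in the budget")
```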
Piotr Nawrot (@p_nawrot)

We release a major improvement upon last year's Dynamic Memory Compression. DMS is better, easier, and faster to train. Future of Long Context is 1) KV Cache Compression + 2) Sparse Attention, both training-aware to avoid training-inference mismatch. Imho, DMS is SOTA for 1).

Kelly Marchisio (St. Denis) (@cheeesio)

Code release from our superstar intern, Piotr Nawrot!
• Write sparse attn patterns in 50 lines, not 5k
• Compatible with models supported by vLLM, with support for TP
• 6 SOTA baselines with optimized implementations + 9 eval tasks
• Research-grade extensibility = rapid prototyping
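
For flavour, a sparse attention pattern can be expressed very compactly. The sketch below is a generic causal sliding-window mask in plain PyTorch, written under my own assumptions (window size, tensor shapes); it is not the released library's API, and it materializes the full score matrix, so it shows the pattern rather than the efficiency gains.

```python
import torch

def sliding_window_mask(seq_len: int, window: int = 256) -> torch.Tensor:
    # True where a query may attend to a key: causal, and only the last `window` keys.
    q_pos = torch.arange(seq_len).unsqueeze(1)
    k_pos = torch.arange(seq_len).unsqueeze(0)
    return (k_pos <= q_pos) & (k_pos > q_pos - window)

def sliding_window_attention(q, k, v, window: int = 256):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    mask = sliding_window_mask(q.size(-2), window).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```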

Kelly Marchisio (St. Denis) (@cheeesio)

The Multilingual Team at Cohere is hiring! If this sounds like you, please apply:
- strong coding skills and a keen eye for detail
- experience working with the challenges & joys of multilingual data
Help us bring AI to the world! 🌏🌍🌎 jobs.ashbyhq.com/cohere/a87be94…

Wei-Yin Ko (@weiyinko_ml)

We're looking for a new member for the multilingual team with a focus on data engineering! Please apply at the link below:

David Ifeoluwa Adelani 🇳🇬 (@davlanade)

Excited to announce the call for papers for the Multilingual Representation Learning workshop #EMNLP2025 sigtyp.github.io/ws2025-mrl.html with Duygu Ataman, Catherine Arnett, Jiayi Wang, Fabian David Schmidt, Tyler Chang, Hila Gonen, and amazing speakers: Alice Oh, Kelly Marchisio, & Pontus Stenetorp

Cohere Labs (@cohere_labs)

Wei-Yin Ko was one of the earliest members of our Open Science Community and an early collaborator on our open science research. We’re proud to have been part of Wei-Yin’s journey from community collaborator to colleague, and grateful he took an early bet on working with us 🚀