Kelly Marchisio (St. Denis) (@cheeesio)'s Twitter Profile
Kelly Marchisio (St. Denis)

@cheeesio

Multilinguality Lead @cohere. Formerly: PhD @jhuclsp, Alexa Fellow @amazon, dev @Google, MPhil @cambridgenlp, EdM @hgse 🔑🔑¬🧀 (@kelvenmar20)

ID: 1134629909551308800

Link: http://kellymarchisio.github.io · Joined: 01-06-2019 01:17:05

658 Tweets

2.2K Followers

630 Following

Sebastian Ruder (@seb_ruder)

The Sparse Frontier

Efficient sparse attention methods are key to scaling LLMs to long contexts. We conduct the largest-scale empirical analysis that answers:
1. 🤏🔍 Are small dense models or large sparse models better?
2. ♾️What is the maximum permissible sparsity per task?
3.
Cohere Labs (@cohere_labs)

Our ML Efficiency group is looking forward to welcoming Piotr Nawrot next week on May 28th, for a session on "The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs"

Learn more: cohere.com/events/Cohere-…
Piotr Nawrot (@p_nawrot)

Tomorrow at 6pm CET I'm giving a talk about our latest work on Sparse Attention at Cohere Labs. I plan to describe the field as it is now, discuss our evaluation results, and share insights about what I believe is the future of Sparse Attention. See you!

Cohere Labs (@cohere_labs)

Over 7000 languages are spoken worldwide 🌐, but AI safety efforts focus on only a fraction of them. 

Our latest paper draws on our multi-year efforts with the wider research community to explore why this matters and how we can bridge the AI language gap.
Cohere Labs (@cohere_labs)

Here are key recommendations to make AI safer & more equitable for everyone:

🌐 Incentivize the creation of open-access multilingual datasets
🪟 Encourage transparency in model language coverage
🔬 Prioritise resources towards multilingual research
Edoardo Ponti (@pontiedoardo)

🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget. 

This unlocks *inference-time hyper-scaling*

For the same runtime or memory load, we can boost LLM accuracy by pushing reasoning even further!
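
As a rough back-of-the-envelope sketch of why cache compression buys more generated tokens for the same memory load (the model dimensions, 16 GB budget, and 4x compression ratio below are assumptions for illustration, not figures from the paper):

```python
# Illustrative only: assumed dimensions and an assumed 4x compression ratio.
def kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    # Each layer stores one key and one value vector per token and per KV head.
    return 2 * layers * kv_heads * head_dim * dtype_bytes

budget_bytes = 16 * 1024**3             # memory reserved for the KV cache (assumed)
dense_cost = kv_bytes_per_token()       # cache bytes per generated token
compressed_cost = dense_cost / 4        # hypothetical 4x cache compression

print(f"dense:      ~{budget_bytes / dense_cost:,.0f} tokens fit in the budget")
print(f"compressed: ~{budget_bytes / compressed_cost:,.0f} tokens fit in the budget")
```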
Piotr Nawrot (@p_nawrot)

We release a major improvement upon last year's Dynamic Memory Compression. DMS is better, easier, and faster to train. Future of Long Context is 1) KV Cache Compression + 2) Sparse Attention, both training-aware to avoid training-inference mismatch. Imho, DMS is SOTA for 1).

Kelly Marchisio (St. Denis) (@cheeesio)

Code release from our superstar intern, Piotr Nawrot!
• Write sparse attn patterns in 50 lines, not 5k
• Compatible with models supported by vLLM, with support for TP
• 6 SOTA baselines with optimized implementations + 9 eval tasks
• Research-grade extensibility = rapid prototyping
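
For flavour, a sparse attention pattern can be expressed very compactly. The sketch below is a generic causal sliding-window mask in plain PyTorch, written under my own assumptions (window size, tensor shapes); it is not the released library's API, and it materializes the full score matrix, so it shows the pattern rather than the efficiency gains.

```python
import torch

def sliding_window_mask(seq_len: int, window: int = 256) -> torch.Tensor:
    # True where a query may attend to a key: causal, and only the last `window` keys.
    q_pos = torch.arange(seq_len).unsqueeze(1)
    k_pos = torch.arange(seq_len).unsqueeze(0)
    return (k_pos <= q_pos) & (k_pos > q_pos - window)

def sliding_window_attention(q, k, v, window: int = 256):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    mask = sliding_window_mask(q.size(-2), window).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```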

Kelly Marchisio (St. Denis) (@cheeesio)

The Multilingual Team at Cohere is hiring! If this sounds like you, please apply:
- strong coding skills and a keen eye for detail
- experience working with the challenges & joys of multilingual data
Help us bring AI to the world! 🌏🌍🌎 jobs.ashbyhq.com/cohere/a87be94…

Wei-Yin Ko (@weiyinko_ml)

We're looking for a new member for the multilingual team with a focus on data engineering! Please apply at the link below:

David Ifeoluwa Adelani 🇳🇬 (@davlanade)

Excited to announce the call for papers for the Multilingual Representation Learning workshop #EMNLP2025 sigtyp.github.io/ws2025-mrl.html with Duygu Ataman, Catherine Arnett, Jiayi Wang, Fabian David Schmidt, Tyler Chang, Hila Gonen, and amazing speakers: Alice Oh, Kelly Marchisio, & Pontus Stenetorp

Cohere Labs (@cohere_labs)

Wei-Yin Ko was one of the earliest members of our Open Science Community and an early collaborator on our open science research. We’re proud to have been part of Wei-Yin’s journey from community collaborator to colleague, and grateful he took an early bet on working with us 🚀