Maximilian Mozes (@maximilianmozes) 's Twitter Profile
Maximilian Mozes

@maximilianmozes

Senior Research Scientist @cohere. PhD @UCL/@ucl_nlp. Previously: @GoogleAI/@SpotifyResearch. He/Him.

ID: 1521245972361363458

Website: https://mmozes.net · Joined: 02-05-2022 21:51:44

181 Tweets

259 Followers

575 Following

Max Bartolo (@max_nlp) 's Twitter Profile Photo

I really enjoyed my Machine Learning Street Talk chat with Tim at #NeurIPS2024 about some of the research we've been doing on reasoning, robustness and human feedback. If you have an hour to spare and are interested in some semi-coherent thoughts revolving around AI robustness, it may be worth…

Kyle Duffy (@kyduffy) 's Twitter Profile Photo

My team recently launched a best-in-class LLM specializing in English and Arabic. We just published a tech report explaining our methods. Check it out on arxiv: arxiv.org/abs/2503.14603

Max Bartolo (@max_nlp) 's Twitter Profile Photo

I'm excited to share the tech report for our @Cohere Cohere For AI Command A and Command R7B models. We highlight our novel approach to model training, including the use of self-refinement algorithms and model merging techniques at scale. Command A is an efficient, agent-optimised…

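One of the techniques named above, model merging, can be made concrete with a minimal sketch. The version below is the simplest form, uniform weight averaging across checkpoints that share an architecture; the report's actual merging recipe at scale is not detailed here, so treat this as an illustrative assumption rather than the Command A method.

```python
# Hedged sketch: model merging as uniform weight averaging across checkpoints.
# A generic technique for illustration, not the specific Command A recipe.
import torch

def average_merge(state_dicts: list[dict]) -> dict:
    """Average parameters elementwise across checkpoints with identical keys/shapes."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = torch.stack(
            [sd[name].float() for sd in state_dicts]
        ).mean(dim=0)
    return merged

# Usage: merged = average_merge([model_a.state_dict(), model_b.state_dict()])
```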
ICLR 2025 (@iclr_conf) 's Twitter Profile Photo

Announcing the keynote speakers for #ICLR2025! Speakers will cover topics ranging from foundational advances in language models to AI safety, open-ended learning, and the nature of intelligence itself. blog.iclr.cc/2025/04/11/ann…

Nick Frosst (@nickfrosst) 's Twitter Profile Photo

Today we are releasing Embed 4 – the new SOTA foundation for agentic enterprise search and retrieval applications! cohere.com/blog/embed-4 Check out the blog for similarly visually satisfying graphs :)

Nils Reimers (@nils_reimers) 's Twitter Profile Photo

Cohere Embed v4 - State-of-the-art text & image retrieval

Today we are releasing Embed v4, unlocking so many cool new features for retrieval.
🇺🇳 100+ languages
🖼️ Text & image capabilities
📜 128k context length
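For readers who want to try it, here is a minimal sketch of calling the endpoint through the Cohere Python SDK. The model identifier and parameters are assumptions inferred from the announcement, not confirmed API details.

```python
# Hedged sketch: embedding a document with Embed v4 via the Cohere Python SDK.
# "embed-v4.0" and the parameters below are assumed names, not confirmed ones.
import cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")  # placeholder key

resp = co.embed(
    model="embed-v4.0",            # assumed model identifier
    input_type="search_document",  # embeddings intended for indexing
    texts=["Embed v4 supports 100+ languages and a 128k context length."],
    embedding_types=["float"],
)
vector = resp.embeddings.float_[0]  # `float_` avoids shadowing the builtin
print(len(vector))                  # embedding dimensionality
```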

Eugene Choi (@221eugene) 's Twitter Profile Photo

Attending #ICLR2025 and interested in #LLM, #Alignment, or #SelfImprovement? Then come by and check out our work from cohere: "Self-Improving Robust Preference Optimization" - a new alignment method that unlocks self-refinement in LLMs! 📍 Poster Session 4 — Friday, 3–5:30 PM

Sara Hooker (@sarahookr) 's Twitter Profile Photo

Very proud of this work, which is being presented at ICLR 2026 later today. While I will not be there, catch up with Viraat Aryabumi and Ahmet Üstün, who are both fantastic and can share more about our work at both Cohere Labs and cohere. 🔥✨

Piotr Nawrot (@p_nawrot) 's Twitter Profile Photo

Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in LLMs. We performed the most comprehensive study on training-free sparse attention to date. Here is what we found:

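To make the term concrete, here is a minimal sketch of one common training-free sparse-attention pattern: a causal sliding window in which each query attends only to its most recent keys. It illustrates the general idea, not the specific methods the study compares.

```python
# Hedged sketch: sliding-window sparse attention (one training-free pattern).
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window=128):
    """Causal attention where each query sees only the last `window` keys."""
    seq_len, dim = q.shape
    scores = (q @ k.T) / dim ** 0.5                   # (seq_len, seq_len) scores
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]             # no attending to the future
    in_window = (idx[:, None] - idx[None, :]) < window
    scores = scores.masked_fill(~(causal & in_window), float("-inf"))
    return F.softmax(scores, dim=-1) @ v              # (seq_len, dim) outputs
```

Because each row attends to at most `window` keys, the softmax cost grows linearly with sequence length instead of quadratically, which is what makes such patterns attractive for long contexts.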
Sara Hooker (@sarahookr) 's Twitter Profile Photo

It is critical for scientific integrity that we trust our measure of progress. lmarena.ai has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty of maintaining fair evaluations on lmarena.ai, despite best intentions.

Cohere Labs (@cohere_labs) 's Twitter Profile Photo

How do we build multimodal systems that work effectively across the globe? 🌍 Today we release the Aya Vision Technical Report, the detailed recipe behind Aya Vision models, unifying state-of-the-art multilingual capabilities in multimodal and text tasks across 23 languages!

Max Bartolo (@max_nlp) 's Twitter Profile Photo

Can LLMs be incentivised to generate token sequences (in this case preambles) that condition downstream models to improve performance when judged by reward models? Yes! ✅

Jon Ander Campos (@jaa_campos) 's Twitter Profile Photo

We froze an LLM ❄️, trained a system prompt generator with RL to condition it, and got pretty cool results! This new work by Lisa Alazraki demonstrates that optimizing the system prompt alone can enhance downstream performance without updating the original model.
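A hedged sketch of the loop described in this and Max Bartolo's tweet above: a small generator proposes a preamble/system prompt, the frozen model answers conditioned on it, and a reward model supplies the learning signal. All function names below are hypothetical stand-ins, not the paper's code.

```python
# Hypothetical stand-ins sketching the loop: only `preamble_policy` is trained;
# the downstream model stays frozen and the reward model only scores outputs.

def preamble_policy(task: str) -> str: ...               # trainable prompt generator
def frozen_model(preamble: str, task: str) -> str: ...   # frozen LLM ❄️
def reward_model(task: str, answer: str) -> float: ...   # scores the answer

def rl_step(task: str) -> float:
    preamble = preamble_policy(task)       # generate a conditioning prefix
    answer = frozen_model(preamble, task)  # frozen model answers with it
    reward = reward_model(task, answer)    # reward signal for the policy
    # A policy-gradient update (e.g., REINFORCE/PPO) would use `reward` to
    # adjust preamble_policy's parameters; the frozen model is never updated.
    return reward
```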

cohere (@cohere) 's Twitter Profile Photo

We’re proud to partner with the governments of Canada and the UK to accelerate adoption of secure AI solutions in the public sector. Today, our CEO and co-founder @AidanGomez met with the Prime Minister of Canada and the UK Prime Minister to discuss the strategic importance of AI for national…

Daniel D'souza (@mrdanieldsouza) 's Twitter Profile Photo

🚨 Wait, adding simple markers 📌 during training unlocks outsized gains at inference time?! 🤔 🚨 Thrilled to share our latest work at Cohere Labs: “Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers” that explores this phenomenon! Details in 🧵 ⤵️

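To illustrate the idea in the tweet above: a hedged sketch of training-time markers, where a metadata tag is prepended to each training example so that long-tail behaviour can be requested explicitly at inference time. The tag format here is an assumption for illustration, not the paper's exact recipe.

```python
# Hedged sketch: prepend a marker tag to training examples so the model can be
# steered at inference time. The "<lang=...>" format is an illustrative choice.

def add_marker(example: dict) -> str:
    # e.g. {"lang": "sw", "text": "..."} -> "<lang=sw> ..."
    return f"<lang={example['lang']}> {example['text']}"

train = [
    {"lang": "en", "text": "The cat sat on the mat."},
    {"lang": "sw", "text": "Paka aliketi kwenye mkeka."},  # long-tail example
]
marked = [add_marker(ex) for ex in train]

# At inference, prepending "<lang=sw>" to the prompt requests the long-tail
# behaviour the model associated with that marker during training.
print(marked[1])
```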
Maximilian Mozes (@maximilianmozes) 's Twitter Profile Photo

We’re looking for a Research Engineer / Scientist with a focus on Data Analysis and Evaluation to join the post-training team at Cohere! More details and application here: jobs.ashbyhq.com/cohere/6170371… Feel free to reach out if you'd like to know more!