Kris Cao (@kroscoo)'s Twitter Profile
Kris Cao

@kroscoo

When lava pours out near the sea's surface, tremendous volcanic explosions sometimes occur | pretraining @cohere

ID: 732544572723695616

Link: http://kriscao.github.io · Joined: 17-05-2016 12:13:30

1.1K Tweets

1.1K Followers

704 Following

Kris Cao (@kroscoo)'s Twitter Profile Photo

Felix is directly responsible for me getting into NLP by responding to a cold email from a clueless maths grad and giving me a summer project. RIP Felix. Every subsequent opportunity I’ve had has been downstream of that one small act.

Command A(idan) (@aidangomez)'s Twitter Profile Photo

I’m so excited to share something we’ve been working on for a while: North is cohere’s AI workspace for enterprises. Today we’re releasing the platform for early access!

Kris Cao (@kroscoo)'s Twitter Profile Photo

As someone who briefly touched the transcendent (applied for PhDs in axiomatic set theory) this resonates strongly with me. I think that ‘genius’ is as much the effort of self-cultivation as it is birth, and that the route to mastery in any subject is surprisingly similar.

Ollie (@ollie575563753)'s Twitter Profile Photo

cohere is growing - we're hiring MLEs to build North in our London, New York and Toronto offices. We also support remote working, 'cos RTO mandates aren't our thing. Job spec is below in the 🧵, with more info on North available here: cohere.com/north

Robert Yang (@roberty970316)'s Twitter Profile Photo

Excited to share our new paper: "Rope to Nope and Back Again: A New Hybrid Attention Strategy", where we propose a novel architecture that outperforms RoPE-NTK-based approaches with full attention span. (1/8)
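
The paper thread isn't reproduced here, but the core idea is an attention stack that mixes RoPE layers with NoPE layers (no positional encoding at all). Below is a minimal PyTorch sketch of such an interleaving; the single-head attention, dimensions, and the every-fourth-layer-is-NoPE pattern are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

def rope(x, base=10000.0):
    # Rotary position embedding over the last dim of x: (batch, seq, dim).
    _, seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class CausalAttention(nn.Module):
    def __init__(self, dim, use_rope):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        self.use_rope = use_rope

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if self.use_rope:            # RoPE layer: rotate queries and keys
            q, k = rope(q), rope(k)  # NoPE layers skip the rotation entirely
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        return self.out(scores.softmax(dim=-1) @ v)

# Illustrative hybrid stack: every fourth layer is NoPE, the rest use RoPE.
layers = nn.ModuleList(
    CausalAttention(dim=64, use_rope=(i % 4 != 3)) for i in range(8)
)
```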

Kris Cao (@kroscoo)'s Twitter Profile Photo

once again the function-space view of neural networks leads to actionable insights. Gaussian processes should (once again) be required knowledge for ML.
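
For anyone wanting the refresher the tweet gestures at: Gaussian-process regression conditions a prior over functions directly on observed data. A minimal NumPy sketch of the standard posterior equations (textbook material, nothing specific to the tweet's context):

```python
import numpy as np

def rbf(xa, xb, length=1.0):
    # Squared-exponential kernel: prior covariance between function values.
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length ** 2)

x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 50)

K = rbf(x_train, x_train) + 1e-6 * np.eye(len(x_train))  # jitter for stability
K_s = rbf(x_train, x_test)
K_ss = rbf(x_test, x_test)

mean = K_s.T @ np.linalg.solve(K, y_train)            # posterior mean
cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)          # posterior covariance
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))       # pointwise uncertainty
```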

Acyr Locatelli (@acyr_l)'s Twitter Profile Photo

I'm hiring performance engineers for the pre-training team at Cohere. If you enjoy writing efficient kernels and working on hardware-aligned architecture design and optimisation, do reach out! Check out the live job posting here: jobs.ashbyhq.com/cohere/d42f5fd…

Michael Hu (@michahu8)'s Twitter Profile Photo

Training on a little 🤏 formal language BEFORE natural language can make pretraining more efficient! How and why does this work? The answer lies… Between Circuits and Chomsky. 🧵1/6👇

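The thread's details aren't reproduced here; as a toy illustration of the setup, the sketch below builds a two-phase data curriculum that warms up on a synthetic formal language before switching to natural text. Dyck-style balanced brackets are a common stand-in with nested dependencies; the paper's actual formal languages and mixing schedule may differ.

```python
import random

def dyck_sample(max_depth=8, p_open=0.6, p_stop=0.2):
    # Sample a balanced-bracket (Dyck) string: a simple formal language
    # whose nested dependencies make it plausible scaffolding for syntax.
    out, depth = [], 0
    while True:
        if depth < max_depth and (depth == 0 or random.random() < p_open):
            out.append("(")
            depth += 1
        else:
            out.append(")")
            depth -= 1
        if depth == 0 and random.random() < p_stop:
            return "".join(out)

def curriculum(formal_steps, natural_corpus):
    # Phase 1: formal-language warmup. Phase 2: ordinary natural-language text.
    for _ in range(formal_steps):
        yield dyck_sample()
    yield from natural_corpus

for doc in curriculum(3, ["the cat sat on the mat"]):
    print(doc)
```
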
Cohere Labs (@cohere_labs)'s Twitter Profile Photo

Introducing ✨ Aya Vision ✨ - an open-weights model to connect our world through language and vision. Aya Vision adds breakthrough multimodal capabilities to our state-of-the-art multilingual 8B and 32B models. 🌿

Kris Cao (@kroscoo)'s Twitter Profile Photo

we have a new model, it's pretty good and we like it; we think you'll like it too. (as an aside, this is the first model i contributed to at cohere!)

Kyle Duffy (@kyduffy)'s Twitter Profile Photo

My team recently launched a best-in-class LLM specializing in English and Arabic. We just published a tech report explaining our methods. Check it out on arXiv: arxiv.org/abs/2503.14603

Max Bartolo (@max_nlp)'s Twitter Profile Photo

I'm excited to share the tech report for our @Cohere @CohereForAI Command A and Command R7B models. We highlight our novel approach to model training, including the use of self-refinement algorithms and model merging techniques at scale. Command A is an efficient, agent-optimised…

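"Model merging at scale" is worth unpacking: in its simplest form, merging is just weighted parameter averaging across checkpoints. A minimal PyTorch sketch of that basic idea (the report's actual recipe is more involved; this shows only the simplest variant):

```python
import torch

def merge_state_dicts(state_dicts, weights=None):
    # Weighted parameter average across model checkpoints: the simplest
    # form of model merging (assumes identical architectures and keys).
    n = len(state_dicts)
    weights = weights if weights is not None else [1.0 / n] * n
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage: average two fine-tuned variants of the same base model.
# merged = merge_state_dicts([model_a.state_dict(), model_b.state_dict()])
# model.load_state_dict(merged)
```
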
omer goldman (@omernlp)'s Twitter Profile Photo

Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯

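The appeal of a weights-free benchmark is that it only needs generated answers. Below is a hypothetical sketch of a black-box cross-lingual probe in this spirit; `ask` is a stand-in for any chat-API client, and the scoring is deliberately simplified relative to the real benchmark's protocol.

```python
def ask(model: str, prompt: str) -> str:
    # Stand-in for an API call to a model whose weights are inaccessible.
    raise NotImplementedError("plug in your chat-completions client here")

def transfer_score(model, items):
    # items: (question_in_source_lang, question_in_target_lang, answer) triples,
    # where the underlying fact appears only in source-language training data.
    transferred = 0
    for src_q, tgt_q, answer in items:
        knows_src = answer.lower() in ask(model, src_q).lower()
        knows_tgt = answer.lower() in ask(model, tgt_q).lower()
        # Knowledge "transferred" if it is retrievable in both languages.
        transferred += int(knows_src and knows_tgt)
    return transferred / len(items)
```
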
Cohere Labs (@cohere_labs)'s Twitter Profile Photo

How can we make language models more flexible to adapt to new languages after pretraining? 🌏 🧠 Our latest work investigates whether a tokenizer trained on more languages than the pretraining target can improve language plasticity without compromising pretraining performance.

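The mechanism under study: train the tokenizer on a superset of the pretraining languages, so that later language adaptation doesn't fight the vocabulary. A minimal sketch with the Hugging Face tokenizers library; the file list, vocab size, and special tokens are illustrative, not the paper's actual recipe.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# BPE tokenizer trained on a corpus covering MORE languages than the
# pretraining mix, leaving vocabulary room for languages added later.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=32000,
    special_tokens=["[UNK]", "<s>", "</s>"],
)
# Illustrative file list: broader language coverage than the training data.
tokenizer.train(
    files=["en.txt", "fr.txt", "sw.txt", "hi.txt", "ja.txt"],
    trainer=trainer,
)
tokenizer.save("universal_tokenizer.json")
```
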
Diana Abagyan (@dianaabagyan)'s Twitter Profile Photo

🚨 New pretraining paper on multilingual tokenizers 🚨 Super excited to share my work with Cohere Labs: "One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers"
