Tom Hosking (@tomhosking) 's Twitter Profile
Tom Hosking

@tomhosking

Model merging lead for Command A @cohere. Prev: PhD student in NLP @EdinburghNLP @Edin_CDT_NLP, @BloomsburyAI @UCL @DRWTrading

ID: 30673001

Website: http://tomho.sk | Joined: 12-04-2009 16:16:43

1.1K Tweets

931 Followers

640 Following

Nick Frosst (@nickfrosst) 's Twitter Profile Photo

UPDATE: my numbers were off, external benchmarking actually shows we are faster and better. GPQA-diamond: 53%; milliseconds per token: 5.36. artificialanalysis.ai/providers/cohe…

Max Bartolo (@max_nlp) 's Twitter Profile Photo

I really enjoyed my Machine Learning Street Talk chat with Tim at #NeurIPS2024 about some of the research we've been doing on reasoning, robustness and human feedback. If you have an hour to spare and are interested in some semi-coherent thoughts revolving around AI robustness, it may be worth

Max Bartolo (@max_nlp) 's Twitter Profile Photo

I'm excited to share the tech report for our @Cohere Cohere For AI Command A and Command R7B models. We highlight our novel approach to model training including the use of self-refinement algorithms and model merging techniques at scale. Command A is an efficient, agent-optimised

Tom Hosking (@tomhosking) 's Twitter Profile Photo

I'm really proud to have led the model merging work that went into cohere Command A and R7B, all made possible by an amazing group of collaborators. Check out the report for loads of details on how we trained a GPT-4o level model that fits on 2xH100!

Cohere Labs (@cohere_labs) 's Twitter Profile Photo

Following the open-weight release of Command A and Command R7B models, we're excited to have collaborated with @Cohere colleagues on a tech report highlighting our novel approach to model training, including self-refinement algorithms and model merging techniques at scale.

cohere (@cohere) 's Twitter Profile Photo

We’re redefining what’s possible with AI. With the release of our latest model, Command A, optimized for real-world agentic and multilingual tasks, we’re demonstrating our commitment to bringing enterprises AI that goes beyond the ordinary, and offers security & efficiency.

aakanksha (@____aakanksha) 's Twitter Profile Photo

the complete cooking guide with all the ingredients, seasonings and garnishes for this soup of a model is here! 🍲🧂🌶️🔥

couldn’t be more proud to have continued my exploration of model merging and translated it to our A-class flagship model, Command A, with the best team! ✨
wh (@nrehiew_) 's Twitter Profile Photo

The most interesting part of their post-training is just how much they use model merging, both in SFT and RL. Their process is:
- Train an instruct model 
- Train 6 SFT models in 6 domains (Code, Safety, RAG, Math, Multilingual, and General Long-Context)
- Merge 
- Use this merge to train
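
A minimal sketch of what that expert-merging step could look like, assuming the six SFT experts were all fine-tuned from the same base model and are available as PyTorch state dicts; the file names and the `linear_merge` helper below are illustrative, not taken from the Cohere report:

```python
# Hedged sketch of a uniform linear merge over domain-expert checkpoints.
# Assumes every expert shares the base model's architecture, so all state
# dicts have identical keys and tensor shapes.
import torch


def linear_merge(state_dicts, weights=None):
    """Return a parameter-wise weighted average of expert state dicts."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged


# Illustrative usage: one expert checkpoint per domain (paths are assumptions).
domains = ["code", "safety", "rag", "math", "multilingual", "long_context"]
experts = [torch.load(f"sft_expert_{d}.pt") for d in domains]
merged_sft = linear_merge(experts)            # the "Merge" step in the list above
torch.save(merged_sft, "merged_sft.pt")       # starting point for further training
```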
wh (@nrehiew_) 's Twitter Profile Photo

The next section on Merging is the most interesting imo. 

As a summary of what we discussed earlier, they used expert merging: merge the SFT experts, then merge the preference-tuned models.
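
Read that way, the two stages compose roughly as below. This is a sketch under the assumption that both stages use a simple parameter average; every checkpoint name is made up for illustration:

```python
import torch


def average(state_dicts):
    """Uniform parameter average over checkpoints with identical keys."""
    n = len(state_dicts)
    return {k: sum(sd[k].float() for sd in state_dicts) / n for k in state_dicts[0]}


domains = ["code", "safety", "rag", "math", "multilingual", "long_context"]

# Stage 1: merge the domain-expert SFT checkpoints into one SFT model.
merged_sft = average([torch.load(f"sft_{d}.pt") for d in domains])

# Stage 2: after preference-tuning per-domain copies of that merged SFT model
# (done elsewhere), merge the resulting preference-tuned checkpoints as well.
merged_pref = average([torch.load(f"pref_{d}.pt") for d in domains])
torch.save(merged_pref, "final_merged_model.pt")
```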
wh (@nrehiew_) 's Twitter Profile Photo

They find that linear merging is pretty interpretable, i.e. upweighting an expert leads to better performance in that domain. However, the corresponding drop in performance elsewhere is unpredictable.

Interestingly, they add cross-domain data for each expert as a form of regularization.
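
To make the interpretability claim concrete, here is a hedged, self-contained sketch of a non-uniform linear merge; the weights are invented for illustration and are not the ones used for Command A:

```python
import torch


def weighted_merge(state_dicts, weights):
    """Weighted linear combination of expert parameters (weights sum to 1)."""
    assert abs(sum(weights) - 1.0) < 1e-6
    return {k: sum(w * sd[k].float() for w, sd in zip(weights, state_dicts))
            for k in state_dicts[0]}


# Upweighting the code expert should lift code benchmarks; per the tweet, the
# size of the off-domain drop is harder to predict. The cross-domain data mixed
# into each expert's training set acts as a regulariser against that drop.
domains = ["code", "safety", "rag", "math", "multilingual", "long_context"]
weights = [0.30, 0.14, 0.14, 0.14, 0.14, 0.14]   # code expert upweighted (illustrative)
experts = [torch.load(f"expert_{d}.pt") for d in domains]  # assumed file names
merged_code_heavy = weighted_merge(experts, weights)
```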
wh (@nrehiew_) 's Twitter Profile Photo

Some thoughts: first, the paper is pretty well-written and easy to follow. They have so many benchmarks and results. I think it's similar in spirit to the Llama 3 paper, but they are more complementary imo, as this paper focuses heavily on post-training while Llama 3 didn't do a good

Tom Hosking (@tomhosking) 's Twitter Profile Photo

Now feels like a good time to plug cohere Command A:
- model evaled on lmarena.ai is the same as hosted on Hugging Face
- claimed performance is reproducible
- not trained on the test set
- uses the cohere hybrid attention architecture for long context
- fits on 2xH100, not 8x

Douwe Kiela (@douwekiela) 's Twitter Profile Photo

When we came up with RAG five years ago, we weren't creating a workaround for small context windows—we were designing a principled approach to augment models with external knowledge.

The core challenges RAG addresses remain unsolved with just larger context windows:
• Accessing
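
For readers new to the pattern, a toy retrieve-then-generate loop looks roughly like this; the word-overlap scorer and prompt template are stand-ins for illustration, not any particular library's API:

```python
# Toy retrieval-augmented generation: score documents against the query,
# keep the top-k, and prepend them to the prompt before calling a generator.
# Real systems would use dense embeddings or BM25 instead of word overlap.
def score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]


def build_prompt(query: str, corpus: list[str]) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


corpus = [
    "RAG augments a model with external knowledge retrieved at inference time.",
    "Command A is optimised for agentic and multilingual enterprise tasks.",
    "Sparse attention changes the scaling trade-offs of transformer LLMs.",
]
print(build_prompt("How does RAG give a model external knowledge?", corpus))
# The assembled prompt would then be sent to whichever generator model you use.
```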
Matthias Gallé (@mgalle) 's Twitter Profile Photo

You like Code, you like LLMs, you are looking for a leadership position? We are searching for somebody who can support our amazing team and bring code agents for enterprises to new heights! jobs.ashbyhq.com/cohere/758e544…

Arduin Findeis @ ICLR2025 (@arduinfindeis) 's Twitter Profile Photo

How exactly was the initial Chatbot Arena version of Llama 4 Maverick different from the public HuggingFace version?🕵️

I used our Feedback Forensics app to quantitatively analyse how exactly these two models differ. An overview…👇🧵
Cohere Labs (@cohere_labs) 's Twitter Profile Photo

How does sparse attention reshape LLM scaling? 🔍 We’re excited to share this work by former @Cohere intern Piotr Nawrot, “The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs.”