
Tom Hosking
@tomhosking
Model merging lead for Command A @cohere. Prev: PhD student in NLP @EdinburghNLP @Edin_CDT_NLP, @BloomsburyAI @UCL @DRWTrading
ID: 30673001
http://tomho.sk
Joined: 12-04-2009 16:16:43
1.1K Tweets
931 Followers
640 Following


I really enjoyed my Machine Learning Street Talk chat with Tim at #NeurIPS2024 about some of the research we've been doing on reasoning, robustness and human feedback. If you have an hour to spare and are interested in some semi-coherent thoughts revolving around AI robustness, it may be worth a watch.


I'm excited to share the tech report for our @Cohere and Cohere For AI Command A and Command R7B models. We highlight our novel approach to model training, including the use of self-refinement algorithms and model merging techniques at scale. Command A is an efficient, agent-optimised model.
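
The report credits model merging at scale as part of the training recipe. As a rough illustration only, here is a minimal sketch of linear parameter averaging across fine-tuned checkpoints; the `merge_state_dicts` helper and the uniform weighting are assumptions for this sketch, not the method from the report.

```python
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Linearly average parameters across checkpoints (uniform by default).

    Illustrative only: the Command A report's actual merging recipe
    may differ from simple linear averaging.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float()
                           for w, sd in zip(weights, state_dicts))
    return merged

# Usage (hypothetical checkpoint paths):
# experts = [torch.load(p, map_location="cpu") for p in ["sft.pt", "rlhf.pt"]]
# torch.save(merge_state_dicts(experts), "merged.pt")
```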


Now feels like a good time to plug @Cohere Command A:
- the model evaluated on lmarena.ai is the same as the one hosted on Hugging Face
- claimed performance is reproducible
- not trained on the test set
- uses the Cohere hybrid attention architecture for long context (see the sketch below)
- fits on 2xH100, not 8x
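
For intuition on the hybrid attention point, here is a minimal sketch assuming an interleave of sliding-window (local) layers with an occasional full-attention (global) layer; the window size and the local:global ratio below are illustrative assumptions, not the actual Command A configuration.

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask, True where attention is allowed: causal and
    restricted to the most recent `window` positions."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j < window)

def layer_kinds(n_layers: int, global_every: int = 4) -> list[str]:
    """Interleave local (sliding-window) layers with one full-attention
    ("global") layer every `global_every` layers."""
    return ["global" if (idx + 1) % global_every == 0 else "local"
            for idx in range(n_layers)]

print(layer_kinds(8))                  # ['local', 'local', 'local', 'global', ...]
print(sliding_window_mask(6, 3).int())
```

Local layers keep the KV cache bounded by the window size, which is one way a long-context model can fit in less GPU memory than full attention at every layer would require.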


How does sparse attention reshape LLM scaling? 🔍 We’re excited to share this work by former @Cohere intern Piotr Nawrot, “The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs.”
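
As a toy illustration of one family of methods in this space, here is a sketch of per-query top-k sparse attention; the function name and the choice of top-k key selection are assumptions for this sketch (causal masking omitted for brevity), not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_keep: int):
    """Attend only to the `k_keep` highest-scoring keys per query."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    # k-th largest score per query acts as the sparsity threshold
    threshold = scores.topk(k_keep, dim=-1).values[..., -1:]
    scores = scores.masked_fill(scores < threshold, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 128, 64)             # (batch, heads, seq, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)
out = topk_sparse_attention(q, k, v, k_keep=32)
print(out.shape)                           # torch.Size([1, 8, 128, 64])
```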