harsha
@sree_harsha_n
Applied Scientist intern @ Amazon | efficient DL | MSc | prev @cvml_mpiinf, @cispa, @medialab. Community lead @CohereForAI (views my own.)
ID: 408563023
09-11-2011 15:54:16
2.2K Tweets
470 Followers
533 Following
🚨 Wait, adding simple markers 📌 during training unlocks outsized gains at inference time?! 🤔 🚨 Thrilled to share our latest work at Cohere Labs: “Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers”, which explores this phenomenon! Details in 🧵 ⤵️
What if we rethought distributed AI training from the ground up for Apple Silicon? Tycho van der Ouderaa and Matt Beton present KPOP at Cohere Labs ML efficiency group. KPOP is an optimizer that leverages the high memory:FLOPS ratio on Apple Silicon. youtu.be/1DTSdYy2RcU?fe…
A deep dive on KPOP at Cohere Labs ML efficiency group. KPOP is an optimizer designed specifically for the hardware constraints of Apple Silicon. We're doubling the number of Apple Silicon macs that can train together coherently every 2 months. In 12 months we'll have rebuilt
PSA: Franz Srambical (not at NeurIPS cuz no capacity) (who has capacity now) will be presenting at the ML efficiency group at Cohere Labs :). Amazing work and excited to hear all about it, you should be too! September 10, 1600 GMT/1800 CEST.
My Cohere Labs talk is online. We outline research directions that embrace the bitter lesson, and state roadblocks on the path to AGI that need to be addressed even in a regime of absolute energy- and compute-abundance. youtube.com/watch?v=6wraMn…