
Sameera Ramasinghe
@sameeraramasin1
ID: 1289903657899552769
02-08-2020 12:39:43
34 Tweets
120 Followers
24 Following

Just released a detailed deep dive on decentralized training. We cover a lot in there, but here's a quick brain dump while my thoughts are fresh: so much has happened in the past 3 months that it's hard not to get excited - Nous Research pre-trained a 15B model in a distributed fashion

Here’s an accessible breakdown of Pluralis Research’s incredible paper. When we train large models on decentralized networks, the idea is to break them into pieces and have different nodes process different pieces. There are a few ways to do this. One way is low hanging
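
Below is a minimal sketch of that idea (my own toy illustration, not Pluralis's actual scheme): a model is split into consecutive pieces, and each simulated "node" holds and computes only its piece, passing activations onward. The model, split point, and node count here are made up for illustration.

```python
# Toy illustration of model splitting for decentralized training: each
# "node" holds one consecutive piece of the model and computes only that
# piece. The nodes are simulated in one process; a real network would place
# each stage on a different machine and send activations between them.
import torch
import torch.nn as nn

# A small made-up model, used only for illustration.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# Break the model into consecutive pieces ("stages"), one per node.
layers = list(model)
stage_0 = nn.Sequential(*layers[:2])   # node 0 holds the first piece
stage_1 = nn.Sequential(*layers[2:])   # node 1 holds the rest

# Forward pass: node 0 runs its piece, then hands activations to node 1.
x = torch.randn(4, 32)
hidden = stage_0(x)        # computed on node 0
out = stage_1(hidden)      # computed on node 1
print(out.shape)           # torch.Size([4, 10])
```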

Alexander Long becoming the Vitalik of deAI, quietly solving the Decentralised Training Trilemma:
- Scaling: Balancing the network's ability to scale with its performance and efficiency.
- Model Efficiency: Ensuring the model runs effectively across diverse hardware setups,


📢 Come check out our poster (#270) on Variance Informed Initialization today (Saturday morning / Poster Session 3) 📢 We generalize Xavier/Kaiming initialization for any activation function! #CVPR2025 Yizhak Ben-Shabat (Itzik) 💔 Sameera Ramasinghe Stephen Gould 📄 Project Page: chumbyte.github.io/vi3nr-site/
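
For intuition, here is a rough sketch of one way to generalize Xavier/Kaiming to an arbitrary activation (my own simplified illustration, not necessarily the paper's exact procedure; see the project page for the real method): estimate by sampling how much signal power the activation passes through, then size the weight variance so activations keep roughly unit variance from layer to layer.

```python
# Hedged sketch of variance-informed initialization for an arbitrary
# activation. This illustrates the general principle behind generalizing
# Xavier/Kaiming, not the VI3NR paper's exact formulas: estimate the
# activation's second moment under Gaussian pre-activations, then choose
# the weight std so pre-activation variance is preserved layer to layer.
import torch
import torch.nn as nn

def activation_second_moment(activation, num_samples=1_000_000):
    """Estimate E[f(z)^2] for z ~ N(0, 1) by Monte Carlo sampling."""
    z = torch.randn(num_samples)
    return activation(z).pow(2).mean().item()

def variance_informed_init_(linear: nn.Linear, activation):
    """Set weights so the next layer's pre-activation variance stays ~1."""
    fan_in = linear.in_features
    m2 = activation_second_moment(activation)   # e.g. ~0.5 for ReLU
    std = (1.0 / (fan_in * m2)) ** 0.5          # recovers Kaiming for ReLU
    nn.init.normal_(linear.weight, mean=0.0, std=std)
    nn.init.zeros_(linear.bias)

# Works for activations with no textbook gain, e.g. sine (as used in
# SIREN-style implicit neural representations).
layer = nn.Linear(256, 256)
variance_informed_init_(layer, torch.sin)
```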