Sameera Ramasinghe (@sameeraramasin1) 's Twitter Profile
Sameera Ramasinghe

@sameeraramasin1

ID: 1289903657899552769

Joined: 02-08-2020 12:39:43

34 Tweets

120 Followers

24 Following

Casey Flint (@flintcasey) 's Twitter Profile Photo

A team to watch! I’m sure there will be people in AI who dismiss this either because they don’t believe this style of training is possible at scale or because of the crypto aspect, but all I can say rn is that would be a mistake… :)

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Let’s take AI predictions from blog posts, podcasts and tweets and move them to betting markets, our state of the art in truth. My struggle has been coming up with good, concrete, resolvable predicates. Ideally, predicates related to industry metrics and macroeconomics. Eg

Alexander Long (@_alexanderlong) 's Twitter Profile Photo

The holy grail of decentralized AI is Model-Parallel training over low-bandwidth interconnects. This is what lets you pool small devices over the internet to train giant models. This is all Pluralis cares about.

Pluralis Research (@pluralishq) 's Twitter Profile Photo

The last of our three ICLR workshop papers: Compression in pipeline parallel training has struggled to go beyond 10% compression without hurting model performance. We get 90%. blog.pluralis.ai/p/beyond-top-k…
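For context on what "compression" means here: in pipeline-parallel training each stage ships its output activations (and, on the backward pass, gradients) to the next stage over the network, and a common baseline is to keep only the largest-magnitude entries (Top-K). The sketch below illustrates that generic baseline only, not the method described in the linked post; the tensor shapes and the keep_ratio value are invented for the example.

```python
import torch

def topk_compress(activations: torch.Tensor, keep_ratio: float = 0.1):
    """Keep only the largest-magnitude entries of an activation tensor.

    Returns (values, indices, shape): the payload a pipeline stage would
    send to the next stage instead of the dense tensor.
    """
    flat = activations.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices, activations.shape

def topk_decompress(values, indices, shape):
    """Scatter the retained entries back into a dense (mostly zero) tensor."""
    flat = torch.zeros(shape, dtype=values.dtype).flatten()
    flat[indices] = values
    return flat.reshape(shape)

# Toy example: a batch of activations leaving one pipeline stage.
acts = torch.randn(8, 1024)                    # hypothetical stage output
values, indices, shape = topk_compress(acts, keep_ratio=0.1)
recovered = topk_decompress(values, indices, shape)
print(f"kept {values.numel()} of {acts.numel()} entries")
```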

Alexander Long (@_alexanderlong) 's Twitter Profile Photo

Probably biggest week in Decentralized Training to date off back of ICLR and more about to come out. Summary of situation as it stands today: 1. Decentralized RL post-training is clearly working. gensyn the latest with great results here. This process takes a strong base

Sam Lehman (@splehman) 's Twitter Profile Photo

Two things in DeAI are crystalizing: 1) the infrastructure and techniques being developed are working and 2) industry insiders believe that decentralized AI is worth spending time on. Training runs are scaling up, papers are being accepted into leading conferences, and

Alexander Long (@_alexanderlong) 's Twitter Profile Photo

When I started Pluralis I thought it was going to take 2-3 years and several papers to get to this point. I was considered borderline delusional to hold that view. Well: computational graph itself is split over nodes. Nodes are physically in different continents. 8B model. No

Sameera Ramasinghe (@sameeraramasin1) 's Twitter Profile Photo

In model-parallel training, compressing the signals that flow between stages has generally been considered ineffective, as they are too information-dense to be compressed without harming convergence. For instance, prior work (e.g., Rudakov et al. arxiv.org/pdf/2401.07788) demonstrated that compression

Teng Yan - Championing Crypto AI (@0xprismatic) 's Twitter Profile Photo

Just released a detailed deep dive on decentralized training. We cover a lot in there, but a quick brain dump while my thoughts are fresh: So much has happened in the past 3 months and it's hard not to get excited - Nous Research pre-trained a 15B model in a distributed fashion

Lucas 🛡️ (@onchainlu) 's Twitter Profile Photo

decentralized training generally falls into two buckets -- i think 1 of them has a better chance at beating big tech. bucket 1: data-parallel. everyone gets a full copy of the model, trains on different data, and shares learnings. this has major benefits: well-understood, low
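The data-parallel bucket can be written down in a few lines: every worker keeps a full copy of the model, computes gradients on its own shard of data, and the gradients are averaged before each update (in real systems, via an all-reduce). The NumPy sketch below is only an illustration of that loop; the toy linear model, shard sizes, and learning rate are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a linear model y = X @ w trained by 4 simulated workers,
# each holding its own shard of data ("trains on different data").
n_workers, n_features = 4, 8
w = np.zeros(n_features)                       # every worker holds a full copy
shards = [(rng.normal(size=(64, n_features)), rng.normal(size=64))
          for _ in range(n_workers)]

def local_gradient(w, X, y):
    """Mean-squared-error gradient on one worker's shard."""
    residual = X @ w - y
    return 2 * X.T @ residual / len(y)

lr = 0.01
for step in range(100):
    # Each worker computes a gradient on its own data...
    grads = [local_gradient(w, X, y) for X, y in shards]
    # ...the gradients are averaged ("shares learnings" -- what an
    # all-reduce does in a real data-parallel run)...
    avg_grad = np.mean(grads, axis=0)
    # ...and every worker applies the same update, keeping the copies in sync.
    w -= lr * avg_grad
```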

Jake Brukhman 🚀 deAI Summer 2025 (@jbrukh) 's Twitter Profile Photo

Here’s an accessible breakdown of Pluralis Research’s incredible paper. When we train large models on decentralized networks, the idea is to break them down into pieces and have different nodes process the different pieces. There are a few ways to do this. One way is low hanging
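To make "break them down into pieces" concrete, here is a bare-bones sketch of the pipeline-style split: the layers are partitioned into stages, each stage would live on a different node, and only activations cross the boundary between them. The two-stage toy model and the in-process hand-off below are illustrative stand-ins, not the scheme from the Pluralis paper.

```python
import torch
import torch.nn as nn

# A toy 4-layer model split into two stages. In a real decentralized run,
# stage0 and stage1 would sit on different machines, and the tensor handed
# between them would travel over the network; here a plain function call
# stands in for that link.
stage0 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
stage1 = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

def forward_pipeline(x: torch.Tensor) -> torch.Tensor:
    acts = stage0(x)        # node A computes the first piece
    # --- network boundary: only `acts`, never the weights, crosses it ---
    return stage1(acts)     # node B computes the second piece

x = torch.randn(16, 32)
logits = forward_pipeline(x)
print(logits.shape)         # torch.Size([16, 10])
```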

Yash Gandhi (@hash_dd) 's Twitter Profile Photo

Alexander Long becoming the Vitalik of deAI, quietly solving the Decentralised Training Trilemma: - Scaling: Balancing the network's ability to scale with its performance and efficiency. - Model Efficiency: Ensuring the model runs effectively across diverse hardware setups,

Chamin Hewa Koneputugodage (@chaminhewa) 's Twitter Profile Photo

📢Come check out our poster (#270) on Variance Informed Initialization today (Saturday morning / Poster Session 3) 📢 We generalize Xavier/Kaiming initialization for any activation function! #CVPR2025 Yizhak Ben-Shabat (Itzik) 💔 Sameera Ramasinghe Stephen Gould 📄Project Page: chumbyte.github.io/vi3nr-site/
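The idea behind generalizing Xavier/Kaiming initialization is to pick the weight scale so that activation variance is preserved from layer to layer for whatever nonlinearity is used. The sketch below conveys that principle by estimating the required gain for an arbitrary activation with a quick Monte Carlo pass; it is a simplified illustration, not the exact procedure from the VI3NR paper.

```python
import torch

def empirical_gain(activation, n_samples: int = 1_000_000) -> float:
    """Estimate 1 / E[f(z)^2] for z ~ N(0, 1).

    Scaling fan-in-normalized weights by the square root of this keeps the
    activation variance roughly constant across layers; for ReLU it recovers
    the familiar Kaiming gain of ~2.
    """
    z = torch.randn(n_samples)
    second_moment = activation(z).pow(2).mean().item()
    return 1.0 / second_moment

def init_layer(weight: torch.Tensor, activation) -> None:
    """Fill `weight` with N(0, gain / fan_in) entries, in place."""
    fan_in = weight.shape[1]
    std = (empirical_gain(activation) / fan_in) ** 0.5
    with torch.no_grad():
        weight.normal_(0.0, std)

# Example: an activation for which no closed-form gain is tabulated.
layer = torch.nn.Linear(256, 256)
init_layer(layer.weight, torch.nn.functional.silu)
print(empirical_gain(torch.relu))   # ~2.0, matching Kaiming for ReLU
```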

Alexander Long (@_alexanderlong) 's Twitter Profile Photo

Using beautiful Grafana dashboards for everything internally, so much nicer than Tensorboard. Wandb still good but doesn't really work with decentralised training. Makes me wonder what the internal vis tooling is like in openai - must be incredible.

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly
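The "this happened to go well (/poorly), let me slightly…" description is, in essence, the vanilla policy-gradient (REINFORCE) update: nudge the probability of the sampled action up or down in proportion to the reward it happened to receive. A toy sketch of that update on a made-up three-armed bandit, purely to illustrate the mechanism being paraphrased:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-armed bandit with unknown mean rewards per action.
true_means = np.array([0.1, 0.5, 0.9])
logits = np.zeros(3)            # policy parameters
lr = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(2000):
    probs = softmax(logits)
    action = rng.choice(3, p=probs)
    reward = rng.normal(true_means[action], 0.1)   # "this happened to go well (/poorly)"
    # REINFORCE: the gradient of log pi(action) w.r.t. the logits is
    # one_hot(action) - probs; scale it by the observed reward.
    grad_logp = -probs
    grad_logp[action] += 1.0
    logits += lr * reward * grad_logp

print(softmax(logits))   # probability mass should concentrate on the best arm
```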

Pluralis Research (@pluralishq) 's Twitter Profile Photo

Pluralis has a main track paper at ICML this week and the team is in Vancouver running several events. The best will be the Open Source Mixer we're running with gensyn and Akash Network Thursday night. For anyone interested in decentralised training it should be a great evening.

Alexander Long (@_alexanderlong) 's Twitter Profile Photo

For people not familiar with AI publishing; there are 3 main conferences every year. ICML, ICLR and NeurIPS. These are technical conferences and the equivalent of journals in other disciplines - they are the main publishing venue for AI. The competition to have papers at these