Alexander Long (@_alexanderlong)'s Twitter Profile
Alexander Long

@_alexanderlong

Founder @PluralisHQ | prev. Applied Scientist at Amazon | ML PhD |
Protocol Learning: Multi-participant, low-bandwidth model parallel.

ID: 1684127558563151873

Link: http://Pluralis.ai | Joined: 26-07-2023 09:05:01

289 Tweets

1.1K Followers

715 Following

7213 | Ejaaz (@cryptopunk7213)

For those of you who haven't been keeping up: Prime Intellect completed training a 32B AI model (100% decentralized), then Gensyn announced they're training a 72B model the next day, more than 2X-ing the bar. But not before Zuckerberg went on the Dwarkesh Patel pod espousing how…

Sofian Faiz (@sofianxyz)

A handful of centralized AI models will soon steer what you know, think, and choose. Alexander Long walked out of Amazon and pulled eight PhDs with him to stop that future by betting on decentralized AI and founding Pluralis Research.

Greg Osuri 🇺🇸 deAI Summer 2025 (@gregosuri)

Amazing to see the pipeline parallelism paper by Pluralis accepted into ICML. ICML is one of the biggest and most reputable AI conferences in the world, and it will have major DeAI representation this year. DeAI summer will be epic.

Alexander Long (@_alexanderlong)

Why Model Parallelism Matters in Decentralized AI. A lot of prior work has focused on data parallelism (DP), and for good reason. It's simple to scale, relatively well supported by infrastructure, and compatible with many gradient compression techniques. But there's a critical…
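The thread is cut off, but the point about DP being compatible with gradient compression is concrete. A minimal sketch, assuming a generic top-k sparsification scheme (illustrative only, not anything Pluralis has described): because every data-parallel worker holds identical weights, each gradient can be sparsified before it is exchanged and averaged.

```python
import torch

def topk_compress(grad, k_frac=0.01):
    """Keep only the largest-magnitude ~1% of gradient entries."""
    flat = grad.flatten()
    k = max(1, int(k_frac * flat.numel()))
    idx = flat.abs().topk(k).indices
    return idx, flat[idx]

def topk_decompress(idx, vals, shape):
    """Rebuild a dense gradient from the sparse (index, value) pairs."""
    flat = torch.zeros(shape).flatten()
    flat[idx] = vals
    return flat.reshape(shape)

# Each DP worker sends (idx, vals) instead of the full 1024x1024 gradient;
# peers reconstruct the dense tensor locally and average as usual.
grad = torch.randn(1024, 1024)
idx, vals = topk_compress(grad)
restored = topk_decompress(idx, vals, grad.shape)
```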

Alexander Long (@_alexanderlong)

Probably the biggest week in Decentralized Training to date, off the back of ICLR, and more about to come out. Summary of the situation as it stands today: 1. Decentralized RL post-training is clearly working. Gensyn is the latest with great results here. This process takes a strong base…

Alexander Long (@_alexanderlong)

“It will not be our frontier model… something like a generation behind.” This seems pretty incompatible with the worldview that the best open-weight models == the best closed models.

Alexander Long (@_alexanderlong)

Contrarian view: why is X doing this if base models will be commoditized and roughly equivalent to open-weight versions? You don't need a terawatt to do inference. The fundamental question is: are we going to perpetually have competitive open-weight models? The answer is we're not.

Alexander Long (@_alexanderlong)

My view has always been that if you give every node a copy of the full model weights, it's not clear how you can introduce rational economics to decentralized training (since anyone can take the model). If you solve model parallel, no one sees a full weight set.
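A minimal sketch of the structural claim in this tweet, assuming a simple contiguous pipeline split (illustrative, not the actual Pluralis protocol): each participant instantiates only its own slice of layers, so no single node ever materializes a complete, extractable weight set.

```python
import torch.nn as nn

def partition_layers(layers, num_participants):
    """Split a layer list into contiguous shards, one shard per participant."""
    shard_size = (len(layers) + num_participants - 1) // num_participants
    return [nn.Sequential(*layers[i:i + shard_size])
            for i in range(0, len(layers), shard_size)]

# Hypothetical 12-layer model split across 4 nodes: each node holds 3 layers,
# so reconstructing the full model requires cooperation from every shard holder.
layers = [nn.TransformerEncoderLayer(d_model=512, nhead=8) for _ in range(12)]
shards = partition_layers(layers, num_participants=4)
assert len(shards) == 4 and all(len(shard) == 3 for shard in shards)
```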

Alexander Long (@_alexanderlong)

Saw a tweet the other day that said AI research is a max-performance game. I think that's right. Either you have the ability to solve the problem when you're firing at your peak, or you don't. You either clear the wall or you don't. This was Sameera Ramasinghe's wall to clear. I…

Arthur Douillard (@ar_douillard)

Really impressive work on compressing activations & gradients for pipeline parallelism.
* Constrains projection matrices (Wp1, Wp2) to a shared low-rank subspace
* 100× compression of activations & gradients
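A hedged sketch of the general idea as described above, not the paper's actual parameterization: activations crossing a pipeline boundary are projected onto a shared low-rank basis, only the small coefficient tensor travels over the network, and the next stage lifts it back up. The fixed basis Q and the 4096/64 shapes below are assumptions for illustration; the paper constrains the learned projections (Wp1, Wp2) themselves.

```python
import torch

d_model, rank = 4096, 64                            # illustrative sizes only
Q, _ = torch.linalg.qr(torch.randn(d_model, rank))  # shared orthonormal basis

def compress(acts):
    """Sending stage: project activations onto the shared low-rank subspace."""
    return acts @ Q                                  # (batch, seq, rank)

def decompress(coeffs):
    """Receiving stage: lift coefficients back to the model dimension."""
    return coeffs @ Q.T                              # (batch, seq, d_model)

acts = torch.randn(2, 128, d_model)
sent = compress(acts)
print(acts.numel() / sent.numel())                  # 64x fewer floats on the wire
```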

Sameera Ramasinghe (@sameeraramasin1)

In model-parallel training, compressing the signal flow has been established as not very useful: the signals are too information-dense to be compressed without harming convergence. For instance, prior work (e.g., Rudakov et al., arxiv.org/pdf/2401.07788) demonstrated that compression…

Jake Brukhman 🚀 deAI Summer 2025 (@jbrukh)

Here’s an accessible breakdown of Pluralis Research’s incredible paper. When we train large models on decentralized networks, the idea is to break them down into pieces and have different nodes process the different pieces. There are a few ways to do this. One way is low-hanging…

Alexander Long (@_alexanderlong)

In Dec 2023 I spent like 3 days non-stop compulsively reading all of Max's papers. SWARM had 1 citation at that point. No one I talked to had ever heard of it. Decided to start Pluralis after reading. The only other time I ever felt like that was 2015, when I learned about RL.