Max Zhdanov (@maxxxzdn)'s Twitter Profile
Max Zhdanov

@maxxxzdn

busy scaling on a single GPU at @amlabuva with @wellingmax and @jwvdm

ID: 1502348291052392455

Link: http://maxxxzdn.github.io · Joined: 11-03-2022 18:19:00

384 Tweets

1.1K Followers

324 Following

Ji-Ha (@ji_ha_kim):

Blog post: Rethinking Probability - Mass, Averages, and Granularity

Developing an intuition for probability using analogies from physics, exploring both the standard measure-theoretic and the expectation-first foundations.
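
A minimal sketch (mine, not from the blog post) of the expectation-first idea: take expectation as the primitive and recover the probability of an event A as E[1_A], the expected value of its indicator, estimated here by Monte Carlo.

```python
# Hedged illustration, not from the blog post: probability as the
# expectation of an indicator, P(A) = E[1_A], estimated by sampling.
import numpy as np

rng = np.random.default_rng(42)
samples = rng.normal(size=1_000_000)        # X ~ N(0, 1)

indicator = (samples > 1.0).astype(float)   # 1_A with A = {X > 1}
prob_as_expectation = indicator.mean()      # Monte Carlo estimate of E[1_A]

print(prob_as_expectation)                  # ~0.1587 = P(X > 1) under N(0, 1)
```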
Jean-Philip Piquemal (@jppiquem):

#compchem Second preprint linked to the FeNNix-Bio1 #machinelearning foundation model. FeNNix-Bio1's inference is already pretty fast with a few GPUs, but "what if" we were able to push it to the #Exascale? Let's have a glimpse into the future (1/3): "Pushing the Accuracy Limit

Neel Nanda (@neelnanda5):

After supervising 20+ papers, I have highly opinionated views on writing great ML papers. When I entered the field I found this all frustratingly opaque

So I wrote a guide on turning research into high-quality papers with scientific integrity! Hopefully still useful for NeurIPS
Max Zhdanov (@maxxxzdn):

LLMs are a classic example of emergence, and it is not the first time we have looked into a non-living system showing life-like properties; Prigogine got a Nobel for that. I totally see Anthropic going in this direction and getting another one.

Brandon Wood (@bwood_m):

We released the Open Molecules 2025 (OMol25) Dataset last week! 🚀🧪 OMol25 is a large (100M+) and diverse molecular DFT dataset for training machine learning models. It was a massive collaborative and interdisciplinary effort and I’m super proud of the whole team! 🙌

1/7
Nabil Iqbal (@nblqbl):

The arxiv preprint on our conformally equivariant neural network -- named AdS-GNN due to its secret origins in AdS/CFT -- is now out!

arxiv.org/abs/2505.12880 

🧵 explaining it below. Joint work with the amazing team of Max Zhdanov, Erik Bekkers and Patrick Forre.
Maurice Weiler (@maurice_weiler):

New preprint! We extend Taco Cohen's theory of equivariant CNNs on homogeneous spaces to the non-linear setting. Beyond convolutions, this covers equivariant attention, implicit kernel MLPs and more general message passing layers.

More details in Oscar Carlsson's thread 👇
Katie Everett (@_katieeverett):

1. We often observe power laws between loss and compute: loss = a * flops^b + c
2. Models are rapidly becoming more efficient, i.e. use less compute to reach the same loss.

But: which innovations actually change the exponent in the power law (b) vs change only the constant (a)?
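
A toy numerical sketch of the distinction being asked about, with made-up parameter values (not from the thread): an innovation that only rescales the constant a shifts the whole curve, while one that steepens the exponent b yields savings that grow with compute.

```python
# Hedged sketch: comparing the saturating power law  loss = a * flops**b + c
# under two hypothetical innovations. All numbers are invented for illustration.
import numpy as np

def loss(flops, a, b, c):
    return a * flops**b + c

flops = np.logspace(18, 24, 7)                    # hypothetical compute budgets (FLOPs)
base    = loss(flops, a=1e3,   b=-0.05, c=1.8)    # baseline curve
innov_a = loss(flops, a=0.5e3, b=-0.05, c=1.8)    # A: halves the constant -> uniform shift
innov_b = loss(flops, a=1e3,   b=-0.06, c=1.8)    # B: steeper exponent -> growing advantage

for f, l0, la, lb in zip(flops, base, innov_a, innov_b):
    print(f"{f:.0e} FLOPs  base={l0:.2f}  A(a/2)={la:.2f}  B(steeper b)={lb:.2f}")
```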

Gabriele Cesa (@_gabrielecesa_):

Excited to be giving a talk at the Cambridge Wednesday Seminar today at 3pm. Looking forward to sharing ideas and great discussion about equivariance and beyond. Thanks Pietro Lio' and Riccardo Ali for inviting me! cst.cam.ac.uk/seminars/list/…

Artem Moskalev @ ICLR2025 🇸🇬 (@artemmoskalev):

ICML Spotlight 🚨 Equivariance is too slow and expensive, especially when you need global context. It makes us wonder whether it is even worth the cost, especially in high-dimensional problems. We present Geometric Hyena Networks — a simple equivariant model orders of magnitude more
Marco Fumero @ ICLR25 (@marco_fumero):

Neural networks implicitly define a latent vector field on the data manifold, via autoencoding iterations🌀

This representation retains properties of the model, revealing memorization and generalization regimes, and characterizing distribution shifts

📜: arxiv.org/abs/2505.22785
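
A minimal sketch of one way to read that construction (my own interpretation, not the paper's code): an autoencoder f = decode ∘ encode assigns to each point the displacement f(x) - x, and iterating f follows this field toward its fixed points. The toy model below is an untrained linear autoencoder, used only to show the mechanics.

```python
# Hedged sketch, assuming the "latent vector field" is the displacement of one
# autoencoding pass, v(x) = f(x) - x, with f = decode(encode(.)).
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(2, 8)) / np.sqrt(8)    # toy linear encoder: 8D data -> 2D latent
W_dec = rng.normal(size=(8, 2)) / np.sqrt(2)    # toy linear decoder: 2D latent -> 8D data
# keep the toy map contractive so the iteration below converges (demo only)
W_dec /= 1.2 * np.abs(np.linalg.eigvals(W_enc @ W_dec)).max()

def f(x):
    """One autoencoding pass: decode(encode(x))."""
    return W_dec @ (W_enc @ x)

def vector_field(x):
    """Displacement induced by one pass; zero exactly at fixed points of f."""
    return f(x) - x

x = rng.normal(size=8)
for t in range(5):                              # iterating f follows the field
    print(t, round(float(np.linalg.norm(vector_field(x))), 4))
    x = f(x)                                    # field magnitude shrinks toward the attractor
```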
Erik Bekkers (@erikjbekkers):

Great discussion, Chaitanya K. Joshi! We also explored this with extensive experiments in our recent paper: arxiv.org/abs/2501.01999. We find, among other things, that equivariant models in a sense scale even better than non-equivariant ones. Going more or less completely against the vibes from your post 😅 1/5