Alireza Makhzani (@alimakhzani) 's Twitter Profile
Alireza Makhzani

@alimakhzani

Research Scientist at @GoogleDeepMind, Associate Professor (status-only) @UofT

ID: 1276295754

Link: http://alireza.ai · Joined: 17-03-2013 23:51:05

96 Tweets

2.2K Followers

963 Following

Hannes Stärk (@hannesstaerk) 's Twitter Profile Photo

Monday in the reading group - flow matching? neigh: "Action Matching: Learning Stochastic Dynamics from Samples" arxiv.org/abs/2210.06662 with Kirill Neklyudov and Alireza Makhzani! One of the most interesting ICML papers. 👌

On Zoom at 11am EDT / 3pm UTC: m2d2.io/talks/logg/abo
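
For readers skimming past the title: Action Matching fits a function s_t(x) whose gradient field transports samples along a given curve of distributions q_t. Schematically (my paraphrase, up to the paper's exact conventions and signs), the variational objective is

$$
\mathcal{L}_{\mathrm{AM}}(s) = \mathbb{E}_{q_0}[s_0(x)] - \mathbb{E}_{q_1}[s_1(x)] + \int_0^1 \mathbb{E}_{q_t}\!\left[\tfrac{1}{2}\|\nabla s_t(x)\|^2 + \partial_t s_t(x)\right] dt,
$$

after which the learned velocity field $\nabla s_t$ can be integrated to simulate the dynamics.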

Alireza Makhzani (@alimakhzani) 's Twitter Profile Photo

Introducing “Wasserstein Lagrangian Flows”: A novel computational approach for solving Optimal Transport and its variants.

Paper: arxiv.org/abs/2310.10649
Led by Kirill Neklyudov and Rob Brekelmans
With: Alex Tong, Lazar Atanackovic, Qiang Liu

The solution of Optimal Transport (OT) and…
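
For background (a standard identity, not a claim from the paper): in its dynamical Benamou-Brenier form, the squared 2-Wasserstein distance is the least kinetic energy of any density path joining the marginals,

$$
W_2^2(\mu,\nu) = \min_{(\rho_t, v_t)} \int_0^1 \mathbb{E}_{\rho_t}\!\left[\|v_t(x)\|^2\right] dt
\quad \text{s.t.} \quad \partial_t \rho_t = -\nabla \cdot (\rho_t v_t),\; \rho_0 = \mu,\; \rho_1 = \nu,
$$

and, as the title suggests, the paper's variants arise from swapping this kinetic-energy Lagrangian for other Lagrangians.
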
Alireza Makhzani (@alimakhzani) 's Twitter Profile Photo

Check out Rob Brekelmans's thread comparing Action Matching and its extension, Wasserstein Lagrangian Flows, with Flow / Bridge Matching and their extensions.

Alireza Makhzani (@alimakhzani) 's Twitter Profile Photo

Yibo interned with me last summer at Vector, and was exceptional! Don't miss the chance to meet and hire him at #NeurIPS2023!

Wu Lin (@linyorker) 's Twitter Profile Photo

For the first time, we (with Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani) propose a sparse 2nd-order method for large NN training with BFloat16 and show its advantages over AdamW. Also at the @NeurIPS workshop on Opt for ML: arxiv.org/abs/2312.05705 /1
Agustinus Kristiadi (@akristiadi7) 's Twitter Profile Photo

Large NNs like transformers (i) need fp16 to train, so matrix inversion in 2nd-order methods is unstable, and (ii) are expensive to store the preconditioner for 😩 Our work solves both by exploiting the Riemannian geometry of preconditioning matrices---it's as efficient as AdamW! 🌐
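
On point (i), a generic illustration (not the paper's algorithm): an inverse can be approximated with matrix multiplies alone via the Newton-Schulz iteration, which stays usable in bfloat16 where explicit factorization-based inversion is numerically fragile.

```python
# Generic illustration, not the paper's method: Newton-Schulz approximates
# A^{-1} using only matrix multiplies, which remain well-behaved in bfloat16.
import torch

def newton_schulz_inverse(A: torch.Tensor, steps: int = 20) -> torch.Tensor:
    # X0 = A^T / (||A||_1 * ||A||_inf) guarantees the iteration converges.
    X = A.T / (A.abs().sum(dim=0).max() * A.abs().sum(dim=1).max())
    I = torch.eye(A.shape[0], dtype=A.dtype)
    for _ in range(steps):
        X = X @ (2 * I - A @ X)  # X_{k+1} = X_k (2I - A X_k)
    return X

# Well-conditioned SPD test matrix in bfloat16 (a stand-in for a preconditioner).
A = torch.randn(64, 64)
A = (A @ A.T + 64 * torch.eye(64)).to(torch.bfloat16)
A_inv = newton_schulz_inverse(A)
print((A @ A_inv - torch.eye(64, dtype=torch.bfloat16)).abs().max())
```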

Alireza Makhzani (@alimakhzani) 's Twitter Profile Photo

I'm excited to be in New Orleans for #NeurIPS2023! Looking forward to catching up with old friends and meeting new folks. My group will be presenting [Spotlight] Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation

Alireza Makhzani (@alimakhzani) 's Twitter Profile Photo

It was very fun to present the "Wasserstein Quantum Monte Carlo" poster, next to Max Welling at #NeurIPS2023. This work was led by my exceptional postdoc Kirill Neklyudov, who unfortunately couldn't attend the conference.

Daniel Severo (@_dsevero) 's Twitter Profile Photo

In the next few weeks I'll be wrapping up my PhD and joining FAIR AI at Meta full-time in Montréal 🇨🇦! Looking forward to contributing to the AI space through open-source research. Very grateful to all who helped me get here. It truly does take a village to advise a PhD student!

Kirill Neklyudov (@k_neklyudov) 's Twitter Profile Photo

I'm going to Montréal! This June I'm starting a new position as an assistant professor at Université de Montréal and as a core academic member of Mila - Institut québécois d'IA. Drop me a line if you're interested in working together on problems in AI4Science, Optimal Transport, and Generative Modeling.

Alireza Makhzani (@alimakhzani) 's Twitter Profile Photo

Introducing “Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo”

Many capability and safety techniques of LLMs—such as RLHF, automated red-teaming, prompt engineering, and infilling—can be viewed from a probabilistic inference perspective, specifically…
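
As a toy illustration of the twisted SMC mechanics (everything below, the uniform "model", the hand-set twists, and the potential, is a stand-in I made up, not the paper's learned components or code): particles extend sequences via a twist-reweighted proposal, accumulate incremental weights, and get resampled toward high-potential continuations.

```python
# Toy twisted SMC targeting sigma(s) ∝ p(s) * phi(s) for an autoregressive p.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
V, T, N = 5, 8, 64  # vocab size, sequence length, number of particles

def lm_logprobs(prefix):
    # Stand-in "language model": uniform next-token distribution.
    return np.full(V, -np.log(V))

def log_phi(seq):
    # Terminal potential phi(s): here, reward sequences that avoid token 0.
    return 0.0 if 0 not in seq else -4.0

def log_twist(prefix, t):
    # Twist psi_t approximating the future potential; a hand-set heuristic
    # here (learned in the paper). At t = T it must coincide with phi.
    return log_phi(prefix) if t == T else (-4.0 if 0 in prefix else 0.0)

particles = [[] for _ in range(N)]
logw = np.zeros(N)
for t in range(1, T + 1):
    for i in range(N):
        s = particles[i]
        # Twisted proposal: q_t(x | s) ∝ p(x | s) * psi_t(s + [x]).
        lp = lm_logprobs(s) + np.array([log_twist(s + [x], t) for x in range(V)])
        p = np.exp(lp - logsumexp(lp))
        x = rng.choice(V, p=p / p.sum())
        # Incremental weight Z_t(s) / psi_{t-1}(s): the proposal's normalizer
        # over the previous twist.
        logw[i] += logsumexp(lp) - log_twist(s, t - 1)
        particles[i] = s + [x]
    # Multinomial resampling focuses particles on high-weight continuations.
    probs = np.exp(logw - logsumexp(logw))
    particles = [list(particles[j]) for j in rng.choice(N, N, p=probs / probs.sum())]
    logw[:] = logsumexp(logw) - np.log(N)

print(sum(0 not in s for s in particles), "of", N, "particles avoid token 0")
```
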
Alireza Makhzani (@alimakhzani) 's Twitter Profile Photo

Very cool to see OpenAI is using "k-Sparse Autoencoders" (my ICLR 2014 paper) to extract interpretable features from GPT-4, and showing that it outperforms other methods on the sparsity-reconstruction frontier: arxiv.org/abs/1312.5663 If you are interested in sparse autoencoders,…
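
The mechanism is simple enough to sketch (my own minimal PyTorch illustration, not the original code; details such as weight tying in the paper differ): a linear encoder, a top-k selection that zeroes all but the k largest hidden activations, and a linear decoder trained for reconstruction.

```python
import torch
import torch.nn as nn

class KSparseAutoencoder(nn.Module):
    def __init__(self, d_in: int, d_hidden: int, k: int):
        super().__init__()
        self.k = k
        self.enc = nn.Linear(d_in, d_hidden)
        self.dec = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        z = self.enc(x)
        # Keep only the k largest activations per example; zero the rest.
        topk = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter(-1, topk.indices, topk.values)
        return self.dec(z_sparse), z_sparse

model = KSparseAutoencoder(d_in=784, d_hidden=1024, k=32)
x = torch.randn(8, 784)
x_hat, z = model(x)
loss = ((x_hat - x) ** 2).mean()  # reconstruction objective
```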

Kirill Neklyudov (@k_neklyudov) 's Twitter Profile Photo

Wasserstein Lagrangian Flows explain many different dynamics on the space of distributions from a single perspective. arxiv.org/abs/2310.10649 I made a video explaining our (with Rob Brekelmans) #icml2024 paper about WLF. Like, subscribe, share, lol. youtu.be/kkddiLegc3s?si


Wu Lin (@linyorker) 's Twitter Profile Photo

#ICML2024
Can We Remove the Square-Root in Adaptive Methods?
arxiv.org/abs/2402.03496

Root-free (RF) methods are better on CNNs and competitive on Transformers compared to root-based methods (AdamW)

Removing the root makes matrix methods faster: Root-free Shampoo in BFloat16 /1
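
In the diagonal case the contrast is one line (my own simplification to show what the title asks, not the paper's matrix, Shampoo-style methods; note the learning rate has different units with and without the root).

```python
import numpy as np

def adaptive_step(theta, grad, m, v, lr, b1=0.9, b2=0.999, eps=1e-8, root=True):
    # Exponential moving averages of the gradient and its square.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # Root-based (Adam-style) divides by sqrt(v); root-free divides by v
    # itself, i.e. preconditions with the second moment directly.
    denom = (np.sqrt(v) if root else v) + eps
    return theta - lr * m / denom, m, v

theta, m, v = np.ones(3), np.zeros(3), np.zeros(3)
grad = np.array([0.1, -0.2, 0.3])
theta, m, v = adaptive_step(theta, grad, m, v, lr=1e-3, root=False)  # root-free
```
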
Hannes Stärk (@hannesstaerk) 's Twitter Profile Photo

Come discuss an ICML Conference best paper award with the author Rob Brekelmans in our reading group on Monday!
"Probabilistic Inference in LMs via Twisted Sequential Monte Carlo" arxiv.org/abs/2404.17546

On zoom Mon 9am PT / 12pm ET / 5pm CEST. Links: portal.valencelabs.com/logg
David Pfau (@pfau) 's Twitter Profile Photo

In some sense, there’s nothing in this paper that we couldn’t have done in 2018 (and I wish we had! I’d be famous!) But the inspiration for this paper actually came from the fantastic recent work on Wasserstein QMC by Kirill Neklyudov and others. Good research should be timeless.
