Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile
Elvis Dohmatob

@dohmatobelvis

Professor of CSC at @Concordia (CRC chair) & @Mila_Quebec. Visiting prof @AIatMeta. Previously @AIatMeta, @criteo, @inria. Interested in the principles of ML.

ID: 576412169

Link: https://www.concordia.ca/faculty/elvis-dohmatob.html | Joined: 10-05-2012 17:24:45

2.2K Tweets

3.3K Followers

530 Following

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Job Alert: Are you thinking of doing a PhD in ML, with a theoretical/algorithmic flavor? I'm hiring talented and passionate students to work with me on: ML theory; Neural scaling laws; Synthetic data (the good, the bad and the ugly); Explainable and trustworthy AI (adversarial

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

We refused to cite the paper due to severe misconduct of the authors of that paper: plagiarism of our own prior work, predominantly AI-generated content (ya, the authors plugged our paper into an LLM and generated another paper), IRB violations, etc. Revealed during a long

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

1) New preprint! arxiv.org/abs/2504.10754 Most of ML theory (i.e., fine-grained analysis to explain / reveal interesting phenomena) can be automated, e.g., via free probability theory. In our recent work with Arjun Subramonian, we provide a small, lightweight tool to do just this

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

I'll also be giving an invited talk at the associative memory workshop tomorrow (Saturday) at 9:30 (nfam.vizhub.ai/schedule/), on provable scaling laws for a toy LLM built around associative memories

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Our work on beating scaling laws via deliberate practice, a modified synthetic data-generation scheme wherein harder / more entropic examples are favored, has been accepted at the ICML Conference (#ICML2025).
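
A minimal caricature of the "favor harder / more entropic examples" idea, not the paper's actual algorithm: score a pool of synthetic candidates by the entropy of a model's predictive distribution and keep only the top fraction. Here `generate`-side details are omitted, and `model_predict_proba` is a hypothetical placeholder.

```python
# Hedged sketch: keep the most entropic (hardest) synthetic candidates.
# `model_predict_proba` is a hypothetical placeholder, not the paper's pipeline.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of an (n_examples, n_classes) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_hard_examples(candidates, model_predict_proba, keep_frac=0.25):
    """Keep the top `keep_frac` fraction of candidates ranked by predictive entropy."""
    probs = model_predict_proba(candidates)       # shape (n, n_classes)
    scores = predictive_entropy(probs)            # higher = harder / more entropic
    k = max(1, int(keep_frac * len(candidates)))
    keep_idx = np.argsort(scores)[-k:]            # indices of the hardest candidates
    return [candidates[i] for i in keep_idx]
```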

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

[Bored] I came across a YT video where someone said the probability that a needle of length 1 unit, dropped on an infinitely wide piece of paper with parallel lines uniformly spaced 1 unit apart, crosses a line is 2/pi. Here is my quick and dirty proof. Unfortunately, it makes use of calculus :/
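
For reference, a sketch of the standard calculus argument (my reconstruction, not necessarily the steps in the proof attached to the tweet). Let x in [0, 1/2] be the distance from the needle's midpoint to the nearest line and theta in [0, pi/2] the acute angle between the needle and the lines, independent and uniform; the needle crosses a line iff x <= (1/2) sin(theta).

```latex
% Buffon's needle, length 1, line spacing 1.
\[
\Pr[\text{crossing}]
= \Pr\!\Big[x \le \tfrac{1}{2}\sin\theta\Big]
= \int_{0}^{\pi/2} \frac{2}{\pi}\left(\int_{0}^{1/2} 2\,\mathbf{1}\!\left\{x \le \tfrac{1}{2}\sin\theta\right\} dx\right) d\theta
= \frac{2}{\pi}\int_{0}^{\pi/2} \sin\theta \, d\theta
= \frac{2}{\pi}.
\]
```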

Yizhou Liu (@yizhouliu0) 's Twitter Profile Photo

Superposition means that models represent more features than dimensions they have, which is true for LLMs since there are too many things to represent in language. We find that superposition leads to a power-law loss with width, leading to the observed neural scaling law. (1/n)
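
A hedged toy illustration of superposition itself (not the thread's scaling-law analysis): random unit vectors in R^d let you assign far more than d features to nearly orthogonal directions, at the cost of small O(1/sqrt(d)) interference between them.

```python
# Toy illustration of superposition: n >> d features mapped to random
# near-orthogonal directions in R^d. Sketches the premise only, not the
# power-law loss analysis described in the thread.
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 1024                                   # 1024 "features" in only 64 dimensions
W = rng.standard_normal((n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)     # unit-norm feature directions

G = W @ W.T                                       # pairwise overlaps (interference)
off_diag = G[~np.eye(n, dtype=bool)]
rms = np.sqrt((off_diag ** 2).mean())
print(f"rms overlap   = {rms:.3f}")               # ~ 1/sqrt(d) = 0.125
print(f"max |overlap| = {np.abs(off_diag).max():.3f}")  # worst case grows only slowly with n
```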

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Stein's Lemma: E[f'(x)] = E[f(x)x] for x ~ N(0,1). Corollary (Gaussian integration by parts): E[g'(x)h(x)] = E[g(x)(xh(x) - h'(x))]. Proof. Take f(x):=g(x)*h(x) and observe that f'(x)=g(x)h'(x)+g'(x)h(x). We deduce: E[g(x)h'(x)+g'(x)h(x)]=E[g(x)h(x)x] and the result follows. QED
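
A quick Monte Carlo sanity check of both identities, with arbitrary smooth test functions (f = tanh, g = sin, h = x^2 are my choices here, not from the tweet):

```python
# Monte Carlo check of Stein's lemma E[f'(x)] = E[x f(x)] and the
# Gaussian integration-by-parts corollary E[g'(x)h(x)] = E[g(x)(x h(x) - h'(x))],
# for x ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)

# Stein's lemma with f = tanh, f' = 1 - tanh^2
f, df = np.tanh(x), 1.0 - np.tanh(x) ** 2
print(df.mean(), (x * f).mean())                    # the two means should nearly agree

# Corollary with g = sin, g' = cos and h = x^2, h' = 2x
g, dg = np.sin(x), np.cos(x)
h, dh = x ** 2, 2.0 * x
print((dg * h).mean(), (g * (x * h - dh)).mean())   # should also nearly agree
```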

Simon Willison (@simonw) 's Twitter Profile Photo

If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta: any time you combine access to private data with exposure to untrusted content and the ability to externally communicate, an attacker can trick the system into stealing your data!

Mathieu Blondel (@mblondel_ml) 's Twitter Profile Photo

Back from MLSS Senegal πŸ‡ΈπŸ‡³, where I had the honor of giving lectures on differentiable programming. Really grateful for all the amazing people I got to meet πŸ™ My slides are here github.com/diffprog/slide…

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Mathematical foundations of neural scaling laws: Here are slides for lectures I gave at the recent MLSS 2025 summer school in Dakar. Every LLM theorist and practitioner should know these things! The summer school: mlss-senegal.github.io My slides: drive.google.com/drive/folders/…

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Simple identities with deep consequences: (1) ReLU(x) - ReLU(-x) = x for all x in R. (2) sum_{i=1}^j i*(-1)^{j-i} = floor((j+1)/2). Using these, it can be shown that the k-sparse parity function is exactly representable by a 2-layer ReLU network of width k+3 = O(k).
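
Quick numerical checks of both identities, plus a hedged width-k "triangle wave" construction of k-sparse parity on {0,1} inputs. This is a standard construction I'm sketching for illustration, not necessarily the exact width-(k+3) network the tweet refers to.

```python
# Verify identities (1) and (2), then build a 2-layer ReLU net of width O(k)
# computing k-sparse parity on {0,1}^n (standard triangle-wave construction,
# not necessarily the tweet's exact width-(k+3) one).
import numpy as np
from itertools import product

relu = lambda z: np.maximum(z, 0.0)

# (1) ReLU(x) - ReLU(-x) = x
x = np.random.default_rng(0).standard_normal(1000)
assert np.allclose(relu(x) - relu(-x), x)

# (2) sum_{i=1}^{j} i * (-1)^{j-i} = floor((j+1)/2)
for j in range(1, 21):
    assert sum(i * (-1) ** (j - i) for i in range(1, j + 1)) == (j + 1) // 2

# k-sparse parity: with s = sum_{i in S} x_i in {0,...,k}, the triangle wave
# T(s) = ReLU(s) + sum_{i=1}^{k-1} 2(-1)^i ReLU(s - i) equals s mod 2 at
# integer s, so a single hidden layer of k ReLUs suffices.
n, k = 8, 4
S = list(range(k))                                # parity over the first k coordinates

def parity_net(x_bits):
    s = sum(x_bits[i] for i in S)
    hidden = [relu(s - i) for i in range(k)]
    coeffs = [1.0] + [2.0 * (-1) ** i for i in range(1, k)]
    return sum(c * h for c, h in zip(coeffs, hidden))

for bits in product((0, 1), repeat=n):
    assert parity_net(bits) == sum(bits[i] for i in S) % 2
```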