Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile
Elvis Dohmatob

@dohmatobelvis

Professor of CSC at @Concordia (CRC chair) & @Mila_Quebec. Visiting prof @AIatMeta. Previously @AIatMeta, @criteo, @inria. Interested in the principles of ML.

ID: 576412169

Link: https://www.concordia.ca/faculty/elvis-dohmatob.html | Joined: 10-05-2012 17:24:45

2.2K Tweets

3.3K Followers

530 Following

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Job Alert: Are you thinking of doing a PhD in ML, with a theoretical/algorithmic flavor? I'm hiring talented and passionate students to work with me on: ML theory; Neural scaling laws; Synthetic data (the good, the bad and the ugly); Explainable and trustworthy AI (adversarial

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

We refused to cite the paper due to severe misconduct of the authors of that paper: plagiarism of our own prior work, predominantly AI-generated content (ya, the authors plugged our paper into an LLM and generated another paper), IRB violations, etc. Revealed during a long

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

1) New preprint! arxiv.org/abs/2504.10754 Most of ML theory (i.e., fine-grained analysis to explain / reveal interesting phenomena) can be automated, e.g., via free probability theory. In our recent work with Arjun Subramonian, we provide a small, lightweight tool to do just this

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

I'll also be giving an invited talk at the associative memory workshop tomorrow (Saturday) at 9:30 (nfam.vizhub.ai/schedule/), on provable scaling laws for a toy LLM built around associative memories

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Our work on beating scaling laws via deliberate practice, a modified synthetic data-generation scheme wherein harder / more entropic examples are favored, has been accepted at the ICML Conference (#ICML2025).
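
A minimal caricature of the "favor harder / more entropic examples" idea, not the paper's actual algorithm: score a pool of synthetic candidates by the entropy of a model's predictive distribution and keep only the top fraction. Here `generate`-side details are omitted, and `model_predict_proba` is a hypothetical placeholder.

```python
# Hedged sketch: keep the most entropic (hardest) synthetic candidates.
# `model_predict_proba` is a hypothetical placeholder, not the paper's pipeline.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of an (n_examples, n_classes) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_hard_examples(candidates, model_predict_proba, keep_frac=0.25):
    """Keep the top `keep_frac` fraction of candidates ranked by predictive entropy."""
    probs = model_predict_proba(candidates)       # shape (n, n_classes)
    scores = predictive_entropy(probs)            # higher = harder / more entropic
    k = max(1, int(keep_frac * len(candidates)))
    keep_idx = np.argsort(scores)[-k:]            # indices of the hardest candidates
    return [candidates[i] for i in keep_idx]
```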

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

[Bored] I came across a YT video where someone said the probability that a needle of length 1 unit, dropped on an infinitely wide piece of paper with parallel lines uniformly spaced 1 unit apart, crosses a line is 2/pi. Here is my quick and dirty proof. Unfortunately, it makes use of calculus :/
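
For reference, a sketch of the standard calculus argument (my reconstruction, not necessarily the steps in the proof attached to the tweet). Let x in [0, 1/2] be the distance from the needle's midpoint to the nearest line and theta in [0, pi/2] the acute angle between the needle and the lines, independent and uniform; the needle crosses a line iff x <= (1/2) sin(theta).

```latex
% Buffon's needle, length 1, line spacing 1.
\[
\Pr[\text{crossing}]
= \Pr\!\Big[x \le \tfrac{1}{2}\sin\theta\Big]
= \int_{0}^{\pi/2} \frac{2}{\pi}\left(\int_{0}^{1/2} 2\,\mathbf{1}\!\left\{x \le \tfrac{1}{2}\sin\theta\right\} dx\right) d\theta
= \frac{2}{\pi}\int_{0}^{\pi/2} \sin\theta \, d\theta
= \frac{2}{\pi}.
\]
```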

Yizhou Liu (@yizhouliu0) 's Twitter Profile Photo

Superposition means that models represent more features than dimensions they have, which is true for LLMs since there are too many things to represent in language. We find that superposition leads to a power-law loss with width, leading to the observed neural scaling law. (1/n)
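
A hedged toy illustration of superposition itself (not the thread's scaling-law analysis): random unit vectors in R^d let you assign far more than d features to nearly orthogonal directions, at the cost of small O(1/sqrt(d)) interference between them.

```python
# Toy illustration of superposition: n >> d features mapped to random
# near-orthogonal directions in R^d. Sketches the premise only, not the
# power-law loss analysis described in the thread.
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 1024                                   # 1024 "features" in only 64 dimensions
W = rng.standard_normal((n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)     # unit-norm feature directions

G = W @ W.T                                       # pairwise overlaps (interference)
off_diag = G[~np.eye(n, dtype=bool)]
rms = np.sqrt((off_diag ** 2).mean())
print(f"rms overlap   = {rms:.3f}")               # ~ 1/sqrt(d) = 0.125
print(f"max |overlap| = {np.abs(off_diag).max():.3f}")  # worst case grows only slowly with n
```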

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Stein's Lemma: E[f'(x)] = E[f(x)x] for x ~ N(0,1). Corollary (Gaussian integration by parts): E[g'(x)h(x)] = E[g(x)(xh(x) - h'(x))]. Proof. Take f(x):=g(x)*h(x) and observe that f'(x)=g(x)h'(x)+g'(x)h(x). We deduce: E[g(x)h'(x)+g'(x)h(x)]=E[g(x)h(x)x] and the result follows. QED
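
A quick Monte Carlo sanity check of both identities, with arbitrary smooth test functions (f = tanh, g = sin, h = x^2 are my choices here, not from the tweet):

```python
# Monte Carlo check of Stein's lemma E[f'(x)] = E[x f(x)] and the
# Gaussian integration-by-parts corollary E[g'(x)h(x)] = E[g(x)(x h(x) - h'(x))],
# for x ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)

# Stein's lemma with f = tanh, f' = 1 - tanh^2
f, df = np.tanh(x), 1.0 - np.tanh(x) ** 2
print(df.mean(), (x * f).mean())                    # the two means should nearly agree

# Corollary with g = sin, g' = cos and h = x^2, h' = 2x
g, dg = np.sin(x), np.cos(x)
h, dh = x ** 2, 2.0 * x
print((dg * h).mean(), (g * (x * h - dh)).mean())   # should also nearly agree
```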

Simon Willison (@simonw) 's Twitter Profile Photo

If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta: any time you combine access to private data with exposure to untrusted content and the ability to externally communicate, an attacker can trick the system into stealing your data!

Mathieu Blondel (@mblondel_ml) 's Twitter Profile Photo

Back from MLSS Senegal πŸ‡ΈπŸ‡³, where I had the honor of giving lectures on differentiable programming. Really grateful for all the amazing people I got to meet πŸ™ My slides are here github.com/diffprog/slide…

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Mathematical foundations of neural scaling laws: Here are slides for lectures I gave at the recent MLSS 2025 summer school in Dakar. Every LLM theorist and practitioner should know these things! The summer school: mlss-senegal.github.io My slides: drive.google.com/drive/folders/…

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Simple identities with deep consequences: (1) ReLU(x) - ReLU(-x) = x for all x in R. (2) sum_{i=1}^j i*(-1)^{j-i} = floor((j+1)/2). Using these, it can be shown that the k-sparse parity function is exactly representable by a 2-layer ReLU network of width k+3 = O(k).
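
Quick numerical checks of both identities, plus a hedged width-k "triangle wave" construction of k-sparse parity on {0,1} inputs. This is a standard construction I'm sketching for illustration, not necessarily the exact width-(k+3) network the tweet refers to.

```python
# Verify identities (1) and (2), then build a 2-layer ReLU net of width O(k)
# computing k-sparse parity on {0,1}^n (standard triangle-wave construction,
# not necessarily the tweet's exact width-(k+3) one).
import numpy as np
from itertools import product

relu = lambda z: np.maximum(z, 0.0)

# (1) ReLU(x) - ReLU(-x) = x
x = np.random.default_rng(0).standard_normal(1000)
assert np.allclose(relu(x) - relu(-x), x)

# (2) sum_{i=1}^{j} i * (-1)^{j-i} = floor((j+1)/2)
for j in range(1, 21):
    assert sum(i * (-1) ** (j - i) for i in range(1, j + 1)) == (j + 1) // 2

# k-sparse parity: with s = sum_{i in S} x_i in {0,...,k}, the triangle wave
# T(s) = ReLU(s) + sum_{i=1}^{k-1} 2(-1)^i ReLU(s - i) equals s mod 2 at
# integer s, so a single hidden layer of k ReLUs suffices.
n, k = 8, 4
S = list(range(k))                                # parity over the first k coordinates

def parity_net(x_bits):
    s = sum(x_bits[i] for i in S)
    hidden = [relu(s - i) for i in range(k)]
    coeffs = [1.0] + [2.0 * (-1) ** i for i in range(1, k)]
    return sum(c * h for c, h in zip(coeffs, hidden))

for bits in product((0, 1), repeat=n):
    assert parity_net(bits) == sum(bits[i] for i in S) % 2
```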