Ananda Theertha Suresh (@th33rtha)'s Twitter Profile
Ananda Theertha Suresh

@th33rtha

Researcher in machine learning and information theory.

ID: 3059466984

Website: http://theertha.info · Joined: 03-03-2015 08:50:01

120 Tweets

900 Followers

136 Following

Ahmad Beirami @ ICLR 2025 (@abeirami)'s Twitter Profile Photo

Are you interested in theoretical aspects of sampling from language models? 

These tutorial slides should have good pointers to get started:
theertha.info/papers/isit_20…

p.s. The slides include my "Language Model Alignment: Theory & Practice" talk
Ahmad Beirami @ ICLR 2025 (@abeirami)'s Twitter Profile Photo

I am giving a talk on theory & algorithms for safety alignment at this exciting symposium this afternoon!

Hossein Mobahi (@thegradient)'s Twitter Profile Photo

The Workshop on Theory & Practice of Foundation Models (organized by Vahab Mirrokni and myself) will take place this week at Google AI in Mountain View. Due to limited space, attendance is by invitation only, but we'll make video recordings available to everyone after the event (stay tuned).

Ananda Theertha Suresh (@th33rtha)'s Twitter Profile Photo

We are hiring! Our team at Google Research, NY is seeking a Research Scientist! Our recent research efforts include developing algorithms for improving inference efficiency and alignment of LLMs. If you are interested, please consider applying! google.com/about/careers/…

Ahmad Beirami @ ICLR 2025 (@abeirami)'s Twitter Profile Photo

Very interesting paper by Ananda Theertha Suresh et al.

For categorical/Gaussian distributions, they derive the rate at which a sample is forgotten to be 1/k after k rounds of recursive training (hence model collapse happens more slowly than intuitively expected).
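
The 1/k rate has a simple branching-process flavor: when each round refits the empirical distribution and resamples from it, the count of a symbol seen once behaves like a (near-)critical Galton-Watson process, whose survival probability decays polynomially rather than exponentially. Below is a minimal Monte Carlo sketch of that simplified categorical setup (my illustration, not the paper's construction; the sample size n, round counts, and trial counts are arbitrary):

```python
# Illustrative sketch (not the paper's exact setup): survival of a symbol seen
# once under recursive training on categorical data. Each round refits the
# empirical distribution over n samples and draws n fresh samples from it, so the
# tracked symbol's count evolves as Binomial(n, count / n), a near-critical
# branching process whose survival probability decays roughly like 1/k.
import numpy as np

def survival_probability(k: int, n: int = 1000, trials: int = 20000, seed: int = 0) -> float:
    """Estimate P(the tracked symbol is still present after k resampling rounds)."""
    rng = np.random.default_rng(seed)
    count = np.ones(trials, dtype=np.int64)  # the tracked symbol appears once in round 0
    for _ in range(k):
        count = rng.binomial(n, count / n)   # resample n points from the current empirical distribution
    return float((count > 0).mean())

if __name__ == "__main__":
    for k in (4, 8, 16, 32, 64):
        p = survival_probability(k)
        # If survival decays like c/k, the product k * P(survive) should hover near a constant.
        print(f"k={k:3d}   P(survive) ≈ {p:.4f}   k * P(survive) ≈ {k * p:.2f}")
```
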
Ahmad Beirami @ ICLR 2025 (@abeirami)'s Twitter Profile Photo

Excited to share InfAlign!

The alignment optimization objective implicitly assumes sampling from the resulting aligned model. But we are increasingly using different and sometimes sophisticated inference-time compute algorithms.

How to resolve this discrepancy? 🧵
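
For reference, the discrepancy can be written generically: the standard KL-regularized alignment objective scores the policy on its own samples, whereas at deployment the output comes from an inference-time procedure applied to the policy. The sketch below uses generic notation of mine (T for the inference-time procedure), not the paper's exact formulation:

```latex
% Standard KL-regularized alignment: the reward is evaluated on direct samples
% y ~ pi, i.e. the objective implicitly assumes sampling from the aligned policy itself.
\max_{\pi}\; \mathbb{E}_{x \sim \mu,\; y \sim \pi(\cdot \mid x)}\big[r(x, y)\big]
  \;-\; \beta\, \mathrm{KL}\!\left(\pi \,\|\, \pi_{\mathrm{ref}}\right)

% Inference-aware variant (generic sketch): the reward is evaluated on the output of
% an inference-time procedure T (e.g. best-of-n) applied to pi, while the KL term
% still regularizes the trained policy toward the reference model.
\max_{\pi}\; \mathbb{E}_{x \sim \mu,\; y \sim T(\pi)(\cdot \mid x)}\big[r(x, y)\big]
  \;-\; \beta\, \mathrm{KL}\!\left(\pi \,\|\, \pi_{\mathrm{ref}}\right)
```
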
Virginia Smith (@gingsmith)'s Twitter Profile Photo

There are a few updates to the review process at #ICML2025. These updates are all described on the ICML website, but we also released a blog post explaining our decisions (links & summary 🧵 below):

Ahmad Beirami @ ICLR 2025 (@abeirami)'s Twitter Profile Photo

best-of-n is a strong baseline for
- improving agents
- scaling inference-time compute
- preference alignment 
- jailbreaking models

How does BoN work? And why is it so strong?
Find some answers in the paper we wrote over two Christmas breaks! 🧵
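
For context, best-of-n (BoN) is the simplest of these baselines: draw n candidate responses from the base policy, score each with a reward model, and return the highest-scoring one. A minimal sketch (the `generate` and `reward` callables are illustrative placeholders, not any specific library's API):

```python
# A minimal best-of-n (BoN) sampling sketch. `generate` and `reward` are
# placeholder callables (an LLM sampler and a reward model); they are
# illustrative assumptions, not any specific library's API.
import random
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int = 8) -> str:
    """Draw n candidate responses and return the one the reward model scores highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda y: reward(prompt, y))

if __name__ == "__main__":
    # Toy usage with dummy stand-ins: this "reward" simply prefers longer responses.
    best = best_of_n(
        "Explain KL divergence in one sentence.",
        generate=lambda p: "token " * random.randint(1, 5),
        reward=lambda p, y: float(len(y)),
        n=4,
    )
    print(best)
```
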
Ahmad Beirami @ ICLR 2025 (@abeirami)'s Twitter Profile Photo

We proposed CoDe -- a simple extension of blockwise controlled decoding to denoising diffusion models. CoDe offers a cheap, simple, & strong baseline for inference-time alignment of diffusion models!
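
For intuition, blockwise controlled decoding applies a best-of-n style selection per block rather than per full sample: propose several candidate continuations for the next block, score them with a value or reward estimate, and keep the best before moving on. A rough sketch of that generic loop for sequence decoding (the `sample_block` and `value` callables are hypothetical stand-ins; CoDe applies the analogous selection over groups of denoising steps, which this sketch does not implement):

```python
# A rough sketch of generic blockwise controlled decoding (best-of-n per block).
# `sample_block` extends a partial output by one block; `value` scores a partial
# output. Both are hypothetical stand-ins for illustration, not CoDe's actual code.
from typing import Callable

def blockwise_controlled_decode(prompt: str,
                                sample_block: Callable[[str], str],
                                value: Callable[[str], float],
                                num_blocks: int = 8,
                                candidates_per_block: int = 4) -> str:
    output = prompt
    for _ in range(num_blocks):
        # Propose several candidate continuations for the next block and keep
        # the one the value function prefers (greedy block-level selection).
        proposals = [output + sample_block(output) for _ in range(candidates_per_block)]
        output = max(proposals, key=value)
    return output
```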

Ziteng Sun (@sziteng)'s Twitter Profile Photo

Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to the recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch.

Can we align our model to better suit a given inference-time…
Pranav Nair (@pranavn1008)'s Twitter Profile Photo

Announcing Matryoshka Quantization! A single Transformer can now be served at any integer precision!! In addition, our (sliced) int2 models outperform the baseline by 10%. Work co-led w/ Puranjay Datta, in collab w/ Jeff Dean, Prateek Jain & Aditya Kusupati.

1/7
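
The core idea can be illustrated with a toy numpy example: quantize weights once to int8, then obtain lower-precision models by keeping only the most significant bits of each code and rescaling. The sketch below uses plain min-max quantization with no training, so it only illustrates the nested slicing layout, not MatQuant's training recipe or its accuracy numbers:

```python
# A toy numpy sketch of nested, most-significant-bit slicing: quantize once to
# uint8 codes, then serve int4/int2 models by keeping only the top bits of each
# code. Illustrative only (untrained min-max quantization), not the MatQuant method.
import numpy as np

def quantize_uint8(w):
    """Asymmetric min-max quantization of a float tensor to uint8 codes."""
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / 255.0
    codes = np.clip(np.round((w - lo) / scale), 0, 255).astype(np.uint8)
    return codes, scale, lo

def slice_to_bits(codes, scale, lo, bits):
    """Keep the top `bits` most significant bits of each uint8 code and dequantize."""
    shift = 8 - bits
    sliced = codes >> shift                  # e.g. bits=2 keeps codes in {0, 1, 2, 3}
    return sliced.astype(np.float64) * (scale * (1 << shift)) + lo

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=10_000)
    codes, scale, lo = quantize_uint8(w)
    for bits in (8, 4, 2):
        w_hat = slice_to_bits(codes, scale, lo, bits)
        err = np.sqrt(np.mean((w - w_hat) ** 2))
        print(f"int{bits}: RMS reconstruction error ≈ {err:.4f}")
```

This untrained sketch mainly shows why one set of weights can be served at multiple precisions; the announced method additionally optimizes the representation so the sliced low-bit models stay accurate, which the toy above does not attempt.
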
Arya Mazumdar (@mountainofmoon)'s Twitter Profile Photo

And the next EnCORE Institute workshop will be on **Theoretical Perspectives on LLMs**: sites.google.com/ucsd.edu/encor… We have a great lineup of participants - and an incredible set of talks. The registration link will be active soon.

Zico Kolter (@zicokolter)'s Twitter Profile Photo

Excited about this work with Asher Trockman, Yash Savani (and others) on antidistillation sampling. It uses a nifty trick to efficiently generate samples that make student models _worse_ when you train on them. I spoke about it at Simons this past week. Links below.
