Ahmad Beirami @ ICLR 2025 (@abeirami) 's Twitter Profile
Ahmad Beirami @ ICLR 2025

@abeirami

Generative AI post-training @GoogleDeepMind/@GoogleResearch | ex-{@AIatMeta, @EA, @MIT, @Harvard, @DukeU} | @GeorgiaTech PhD | Woman, Life, Freedom | opinions my own

ID: 1073974163704958976

Website: http://www.mit.edu/~beirami/ | Joined: 15-12-2018 16:12:48

2.2K Tweets

6.6K Followers

2.2K Following

Aaron Roth (@aaroth) 's Twitter Profile Photo

The United States has had a tremendous advantage in science and technology because it has been the consensus gathering point: the best students worldwide want to study and work in the US because that is where the best students are studying and working. 1/

Amrit Singh Bedi (@amritsinghbedi3) 's Twitter Profile Photo

Can decades-old ideas from #psychology help fix critical issues in modern LLM alignment? 🤔 We're tapping into #BoundedRationality & 'satisficing principles' to build an alternate way to align LLMs. Our new #ICML2025 paper 👇 🧵 arxiv.org/pdf/2505.23729

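The thread above only announces the work; the linked paper contains the actual algorithm. As a rough illustration of the generic satisficing idea (accept the first response that is "good enough" instead of searching for the best one), here is a minimal Python sketch. The generate and reward_fn functions are hypothetical placeholders, and the threshold loop is not claimed to be the paper's method.

```python
import random

# Minimal sketch of a generic "satisficing" decoder (illustration only; NOT the
# method from the linked ICML 2025 paper). Instead of best-of-n reward
# maximization, we accept the FIRST sample whose reward clears an aspiration
# threshold, falling back to the best sample seen so far if none does.
# `generate` and `reward_fn` are hypothetical placeholders.

def generate(prompt: str) -> str:
    """Placeholder: one sampled LM response to `prompt`."""
    return f"response-{random.randint(0, 9999)} to {prompt!r}"

def reward_fn(prompt: str, response: str) -> float:
    """Placeholder: a scalar reward / alignment score for (prompt, response)."""
    return random.random()

def satisficing_decode(prompt: str, threshold: float = 0.8, max_samples: int = 16) -> str:
    best_response, best_reward = None, float("-inf")
    for _ in range(max_samples):
        response = generate(prompt)
        reward = reward_fn(prompt, response)
        if reward >= threshold:          # "good enough": stop searching
            return response
        if reward > best_reward:         # keep the best so far as a fallback
            best_response, best_reward = response, reward
    return best_response

if __name__ == "__main__":
    print(satisficing_decode("Explain bounded rationality in one sentence."))
```
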
Aryeh Kontorovich (@aryehazan) 's Twitter Profile Photo

my off-the-cuff take, after playing around with Gemini 2.5 a bit, is that solving Olympiad-style math problems gives it the insight and serendipity to also make significant strides in (and occasionally solve) research-level math

Diyi Yang (@diyi_yang) 's Twitter Profile Photo

🤝 Humans + AI = Better together? Our #ACL2025 tutorial offers an interdisciplinary overview of human-AI collaboration to explore its goals, evaluation, and societal impacts 🤖

Ahmad Beirami @ ICLR 2025 (@abeirami) 's Twitter Profile Photo

A language model is a probability mass function* p(y|x) where x is any sequence of tokens that belong to alphabet A, and y can be any sequence of tokens that belong to alphabet A. * I.e. sum_{y \in A*} p(y|x) = 1 for all x \in A*.
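
As a toy illustration of this definition (not part of the original tweet), the sketch below builds a made-up autoregressive model over a two-letter alphabet with an EOS symbol, defines p(y|x) as the product of next-token conditionals terminated by EOS, and checks numerically that sum_{y in A*} p(y|x) approaches 1 as the maximum sequence length grows.

```python
import itertools

# Toy check of the definition above: an autoregressive model over the alphabet
# A = {"a", "b"} plus an EOS symbol. p(y|x) is the product of next-token
# conditionals ending in EOS, and summing it over all terminated sequences y
# approaches 1 as the maximum length grows. The conditional distribution is an
# arbitrary made-up example, not any real LM.

A = ["a", "b"]
EOS = "<eos>"

def next_token_probs(context: tuple) -> dict:
    """A made-up conditional distribution over A + [EOS] given the context."""
    # Longer contexts make EOS more likely, so mass concentrates on finite strings.
    p_eos = min(0.9, 0.2 + 0.1 * len(context))
    rest = (1.0 - p_eos) / len(A)
    return {EOS: p_eos, **{t: rest for t in A}}

def seq_prob(y: tuple, x: tuple = ()) -> float:
    """p(y|x): product of next-token conditionals, terminated by EOS."""
    prob, context = 1.0, x
    for token in y + (EOS,):
        prob *= next_token_probs(context)[token]
        context = context + (token,)
    return prob

# Sum p(y|x) over all sequences y of length <= max_len; it should approach 1.
x = ("a",)
for max_len in (2, 5, 10):
    total = sum(
        seq_prob(y, x)
        for n in range(max_len + 1)
        for y in itertools.product(A, repeat=n)
    )
    print(f"max_len={max_len:2d}  sum_y p(y|x) = {total:.6f}")
```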

Tiago Pimentel (@tpimentelms) 's Twitter Profile Photo

A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our #acl2025nlp paper proposes an observational method to estimate this causal effect! Longer thread soon!

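As a toy illustration of the effect described above (the numbers below are made up and are not from the paper, which estimates the effect on LMs trained from scratch under each tokenisation), the probability a model assigns to a string under a given tokenisation is the product of its per-token conditionals, so ⟨hello⟩ and ⟨he, llo⟩ can receive very different mass:

```python
import math

# Made-up next-token log-probs given an empty context / given "he".
# These numbers are illustrative only; they are not outputs of any real model.
logp = {
    ("<bos>",): {"hello": math.log(0.017), "he": math.log(0.100)},
    ("<bos>", "he"): {"llo": math.log(0.010)},
}

def string_logprob(tokens: list) -> float:
    """log p(string) = sum of next-token conditional log-probs along the tokenisation."""
    context, total = ("<bos>",), 0.0
    for tok in tokens:
        total += logp[context][tok]
        context = context + (tok,)
    return total

p_one = math.exp(string_logprob(["hello"]))      # tokenised as <hello>
p_two = math.exp(string_logprob(["he", "llo"]))  # tokenised as <he, llo>
print(f"p(<hello>)   = {p_one:.6f}")
print(f"p(<he, llo>) = {p_two:.6f}")
print(f"ratio        = {p_one / p_two:.1f}x")    # 17x with these made-up numbers
```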