Richard Antonello (@neurorj)'s Twitter Profile
Richard Antonello

@neurorj

Postdoc in the Mesgarani Lab at Columbia University. Studying how the brain processes language by using LLMs. (Formerly @HuthLab at UT Austin)

ID: 1260656805669191680

Joined: 13-05-2020 19:43:07

219 Tweets

356 Followers

224 Following

Katie Kang (@katie_kang_)'s Twitter Profile Photo

LLMs excel at fitting finetuning data, but are they learning to reason or just parroting🦜?

We found a way to probe a model's learning process to reveal *how* each example is learned. This lets us predict model generalization using only training data, amongst other insights: 🧵
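The thread doesn't spell out the probing method here, but one common way to characterize *how* an example is learned is to record its training loss at every finetuning checkpoint and classify the shape of the trajectory. A minimal sketch of that idea — the function name, thresholds, and labels below are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def per_example_learning_curves(losses):
    """Classify how each training example was learned from its loss trajectory.

    losses: array of shape (n_checkpoints, n_examples), the per-example
    training loss recorded at each finetuning checkpoint.

    Heuristic: examples whose loss drops early look "generalized"; examples
    that stay high until late in training look "memorized". The halving
    threshold and the midpoint cutoff are illustrative choices.
    """
    n_ckpt, n_ex = losses.shape
    labels = []
    for j in range(n_ex):
        curve = losses[:, j]
        # first checkpoint at which loss falls below half its initial value
        below = np.where(curve < 0.5 * curve[0])[0]
        t_learn = below[0] if below.size else n_ckpt
        labels.append("early/generalized" if t_learn < n_ckpt // 2
                      else "late/memorized")
    return labels
```

The point of such a probe is that it uses only training data — no held-out set — which is what would let it predict generalization ahead of time.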
Marianne Arriola @ ICLR’25 (@mariannearr)'s Twitter Profile Photo

🚨Announcing our #ICLR2025 Oral! 🔥Diffusion LMs are on the rise for parallel text generation! But unlike autoregressive LMs, they struggle with quality, fixed-length constraints & lack of KV caching. 🚀Introducing Block Diffusion—combining autoregressive and diffusion models

Ruimin Gao (@ruimin_g)'s Twitter Profile Photo

Excited to introduce funROI: A Python package for functional ROI analyses of fMRI data!

funroi.readthedocs.io/en/latest/

#fMRI #Neuroimaging #Python #OpenScience

Work w Anna Ivanova

🧵👇
Karan Dalal (@karansdalal)'s Twitter Profile Photo

Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency. Every video below is produced directly by

Richard Antonello (@neurorj)'s Twitter Profile Photo

For those attending NAACL, today I'll be presenting recent work on how we can use language encoding models to identify functional specialization throughout cortex. Stop by my talk at 10:30 at the CMCL workshop!

Yufan Zhuang (@yufan_zhuang)'s Twitter Profile Photo

🤯Your LLM just threw away 99.9% of what it knows.

Standard decoding samples one token at a time and discards the rest of the probability mass. 

Mixture of Inputs (MoI) rescues that lost information, feeding it back for more nuanced expressions.

It is a brand new
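As the tweet describes it, the core idea is to feed the decoder something richer than the single sampled token. A minimal sketch of one way to read that, assuming access to the model's embedding matrix — the mixing rule and the `alpha` weight below are illustrative assumptions, not the paper's actual MoI formulation:

```python
import numpy as np

def mixture_of_inputs_embedding(probs, embed_matrix, sampled_id, alpha=0.5):
    """Blend the usual one-hot input with the full output distribution.

    Standard decoding feeds only embed_matrix[sampled_id] back into the
    model, discarding the rest of the probability mass. Here we instead
    feed a convex mixture of that embedding and the expectation of the
    embedding under the output distribution.

    probs:        (vocab,) output distribution at this step
    embed_matrix: (vocab, d) token embedding table
    """
    expected = probs @ embed_matrix        # (d,) expected embedding over vocab
    sampled = embed_matrix[sampled_id]     # (d,) the standard one-hot input
    return alpha * sampled + (1.0 - alpha) * expected
```

With `alpha=1.0` this reduces to standard decoding; lowering `alpha` lets more of the otherwise-discarded probability mass influence the next step.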
Guy Gaziv (@ggaziv)'s Twitter Profile Photo

Can we precisely and noninvasively modulate deep brain activity just by riding the natural visual feed? 👁️🧠 In our new preprint, we use brain models to craft subtle image changes that steer deep neural populations in primate IT cortex. Just pixels. 📝arxiv.org/abs/2506.05633

Chandan Singh (@csinva)'s Twitter Profile Photo

New paper: Ask 35 simple questions about sentences in a story and use the answers to predict brain responses. Interpretable. Compact. Surprisingly high performance in both fMRI and ECoG. biorxiv.org/content/10.110…

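The recipe in the tweet — 35 yes/no answers per sentence, used to predict brain responses — is at its core a linear encoding model on a 35-dimensional binary feature matrix. A hedged sketch using ridge regression; the function names and the `lam` regularizer are assumptions for illustration, not the paper's code:

```python
import numpy as np

def fit_question_encoding_model(answers, responses, lam=1.0):
    """Ridge regression from question answers to brain responses.

    answers:   (n_sentences, n_questions) 0/1 matrix, e.g. 35 yes/no
               answers per sentence in a story.
    responses: (n_sentences, n_voxels) measured brain responses
               (fMRI voxels or ECoG electrodes).
    Returns weights of shape (n_questions, n_voxels).
    """
    X = answers.astype(float)
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ responses)

def predict(answers, weights):
    """Predicted brain response for new sentences."""
    return answers.astype(float) @ weights
```

The interpretability claim follows from the setup: each of the 35 weights per voxel directly says how much that question's answer drives that voxel's response.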
Simone Scardapane (@s_scardapane)'s Twitter Profile Photo

*Harnessing the Universal Geometry of Embeddings*
by Rishi Jha, Jack Morris, Vitaly Shmatikov

With the proper set of losses, text embeddings from different models can be aligned with no paired data (what they call the "strong" Platonic hypothesis).

arxiv.org/abs/2505.12540
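For contrast, the classical *paired* baseline for aligning two embedding spaces is orthogonal Procrustes, which requires embeddings of the same texts from both models; the thread's point is that the paper manages alignment without such pairs. A minimal sketch of that paired baseline, for reference only:

```python
import numpy as np

def procrustes_align(A, B):
    """Orthogonal map R minimizing ||A @ R - B||_F, given PAIRED rows.

    A, B: (n, d) embeddings of the same n texts from two different models
    (dimensions assumed already matched). The solution is the orthogonal
    factor of the polar decomposition of A.T @ B.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt
```

The "strong" Platonic hypothesis is precisely that such a map can be recovered even when no row correspondence between A and B is available.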
Rajvi Agravat (@rajviagravat)'s Twitter Profile Photo

If you're at #SNL2025 and curious about speech and music perception or its representation in developing brains, stop by my poster in Session C – #77! :)

Sam Norman-Haignere (@samnormanh)'s Twitter Profile Photo

Human auditory cortex integrates information in speech across absolute time (e.g., 200 ms), not phonemes, syllables, words, or any other time-varying speech structure: nature.com/articles/s4159…

Anna Ivanova (@neuranna)'s Twitter Profile Photo

As our lab started to build encoding 🧠 models, we were trying to figure out best practices in the field. So Taha Binhuraib 🦉 built a library to easily compare design choices & model features across datasets! We hope it will be useful to the community & plan to keep expanding it! 1/