William Merrill (@lambdaviking)'s Twitter Profile
William Merrill

@lambdaviking

Will irl - Ph.D. student @NYUDataScience

ID: 391600060

Website: https://lambdaviking.com/ · Joined: 15-10-2011 20:39:17

1.1K Tweets

2.2K Followers

633 Following

Michael Hu (@michahu8):

Training on a little formal language BEFORE natural language can make pretraining more efficient!

How and why does this work? The answer lies… Between Circuits and Chomsky.

🧵 1/6 👇
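
The thread itself contains no code, but here is a minimal sketch of the "formal language first" idea, under the assumption that the warm-up data is something like balanced brackets (1-Dyck); the actual languages, tokenization, and mixing schedule in the paper may differ, and `sample_dyck1` / `build_warmup_corpus` are hypothetical names for illustration only.

```python
import random

def sample_dyck1(target_len=64, p_open=0.6):
    """Sample a balanced-bracket (1-Dyck) string of roughly target_len symbols."""
    s, depth = [], 0
    while len(s) + depth < target_len:
        if depth > 0 and random.random() > p_open:
            s.append(")")
            depth -= 1
        else:
            s.append("(")
            depth += 1
    s.extend(")" * depth)  # close whatever is still open
    return " ".join(s)

def build_warmup_corpus(n_docs=1000, seed=0):
    """A small formal-language corpus to train on BEFORE the natural-language data."""
    random.seed(seed)
    return [sample_dyck1() for _ in range(n_docs)]

if __name__ == "__main__":
    for doc in build_warmup_corpus(n_docs=3):
        print(doc)
    # Two-stage recipe: pretrain the LM on this corpus first, then continue
    # pretraining on the natural-language corpus as usual.
```
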
Kimon Fountoulakis (@kfountou):

Computational Capability and Efficiency of Neural Networks: A Repository of Papers

I compiled a list of theoretical papers related to the computational capabilities of Transformers, recurrent networks, feedforward networks, and graph neural networks.

Link:
Tal Linzen (@tallinzen):

International students, and Chinese students in particular, are essential to the AI research ecosystem in the US. You can't say you support AI research in this country and then threaten to revoke Chinese students' visas.

Byung-Doh Oh (@byungdoh):

Have reading-time corpora been leaked into LM pre-training corpora? Should you be cautious about using pre-trained LM surprisal as a consequence? We identify the longest overlapping token sequences and conclude that the leakage is mostly not severe. In Findings of #ACL2025 #ACL2025NLP
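
The tweet names the method only at a high level; one way to measure such overlap (a sketch, not necessarily the authors' implementation) is to binary-search the length of the longest contiguous token sequence shared by the two corpora, using n-gram sets for membership checks.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as hashable tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def longest_overlap(eval_tokens, pretrain_tokens):
    """Length of the longest contiguous token sequence shared by both corpora.

    Binary-searches the overlap length; feasibility is monotone, since any
    shared k-gram contains shared (k-1)-grams.
    """
    lo, hi, best = 1, min(len(eval_tokens), len(pretrain_tokens)), 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if ngrams(eval_tokens, mid) & ngrams(pretrain_tokens, mid):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best

if __name__ == "__main__":
    # Toy whitespace "tokens"; real corpora would use the LM's own tokenizer.
    reading_time = "the old man the boats near the shore".split()
    pretraining = "reports that the old man the boats confused readers".split()
    print(longest_overlap(reading_time, pretraining))  # 5 ("the old man the boats")
```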

Tal Linzen (@tallinzen):

Slides from my talk at Apple (thanks for hosting!) on our recent work on formal languages for LLM pretraining and evaluation: drive.google.com/file/d/1EtsyQ-…

Jackson Petty (@jowenpetty):

How well can LLMs understand tasks with complex sets of instructions? We investigate through the lens of RELIC: REcognizing (formal) Languages In-Context, finding a significant overhang between what LLMs are able to do theoretically and how well they put this into practice.
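
Assuming the eval is roughly "show the model a grammar in-context and ask whether a string belongs to its language" (the actual RELIC grammars, prompts, and metrics may differ), the gold labels can come from a standard CYK membership check; the toy grammar below is illustrative, not one taken from the paper.

```python
# Toy context-free grammar in Chomsky normal form for balanced brackets
# (hypothetical example, not a grammar from RELIC):
#   S -> S S | L T | L R     T -> S R     L -> "("     R -> ")"
RULES = {
    "S": [("S", "S"), ("L", "T"), ("L", "R")],
    "T": [("S", "R")],
}
LEXICON = {"(": {"L"}, ")": {"R"}}

def cyk_accepts(string, start="S"):
    """CYK membership test: the ground truth an LLM's in-context answer is scored against."""
    n = len(string)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals deriving string[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(string):
        table[i][0] = set(LEXICON.get(ch, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for head, bodies in RULES.items():
                    if any(b1 in left and b2 in right for b1, b2 in bodies):
                        table[i][span - 1].add(head)
    return start in table[0][n - 1]

if __name__ == "__main__":
    for s in ["()", "(())()", "(()", ")("]:
        print(s, cyk_accepts(s))  # True, True, False, False
```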

William Merrill (@lambdaviking):

A fun project with really thorough analysis of how LLMs try and often fail to implement parsing algorithms. Bonus: find out what this all has to do with the Kalamang language from New Guinea.

William Merrill (@lambdaviking):

I'll be defending my dissertation at NYU next Monday, June 16 at 4pm ET!

I've definitely missed inviting some people who might be interested, so please email me if you'd like to attend (NYC or Zoom)
Tal Linzen (@tallinzen):

So, on the topic of the Apple puzzle reasoning paper: we got pretty similar results in our recent paper on recognizing context-free languages as an LLM eval, a task that also requires the model to follow an algorithm (which I think is what LLM folks mean by "reasoning").

David Chiang (@davidweichiang):

New on arXiv: Knee-Deep in C-RASP, by Andy J Yang, Michael Cadilhac and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
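
For readers unfamiliar with it, C-RASP is, roughly, a RASP-style programming language built around counting operations. The snippet below is only a flavor-of illustration of count-and-compare programs written in ordinary Python, not C-RASP's formal syntax; whether a given language (e.g. majority or Dyck-1) is actually C-RASP-expressible is exactly the kind of question the paper's theory answers.

```python
def prefix_counts(seq, symbol):
    """counts[i] = occurrences of `symbol` in seq[: i + 1] (a counting term)."""
    out, c = [], 0
    for tok in seq:
        c += (tok == symbol)
        out.append(c)
    return out

def majority_a(seq):
    """MAJORITY: accept iff #a > #b; a single comparison of two final counts."""
    if not seq:
        return False
    return prefix_counts(seq, "a")[-1] > prefix_counts(seq, "b")[-1]

def dyck1(seq):
    """Balanced brackets via counts: every prefix has #'(' >= #')' and totals match."""
    if not seq:
        return True
    opens, closes = prefix_counts(seq, "("), prefix_counts(seq, ")")
    return all(o >= c for o, c in zip(opens, closes)) and opens[-1] == closes[-1]

if __name__ == "__main__":
    print(majority_a("aabab"))  # True  (3 a's vs. 2 b's)
    print(dyck1("(()())"))      # True
    print(dyck1("())("))        # False
```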

TTIC (@ttic_connect):

We're proud to announce three new tenure-track assistant professors joining TTIC in Fall 2026: Yossi Gandelsman (@YGandelsman), Will Merrill (@lambdaviking), and Nick Tomlin (@NickATomlin). Meet them here: buff.ly/JH1DFtT
