Atula Tejaswi (@atu_tej) Twitter Tweets • TwiCopy

Qing Yao

7 months ago

LMs learn argument-based preferences for dative constructions (preferring the recipient first when it's shorter), much as humans do. Is this just from memorizing the preferences in their training data? New paper w/ Kanishka Misra 🌊, Leonie Weissweiler, Kyle Mahowald

21 likes · 1 reply · 5 reposts

David Heineman

@heinemandavidj

3 months ago

Evaluating language models is tricky: how do we know whether our results are real or due to random chance? We find an answer with two simple metrics: signal, a benchmark's ability to separate models, and noise, a benchmark's random variability between training steps 🧵

209 likes · 3 replies · 38 reposts

Alex Dimakis

@alexgdimakis

3 months ago

We are hiring in Bespoke Labs for a new role: Member of Technical Staff: AI Data and RL Environments. Work on data curation strategies with the team that created OpenThoughts. Invent novel data recipes, strategies of curating datasets, environments, tasks and verifiers. (My

143 likes · 6 replies · 15 reposts

Shiwei Liu

@shiwei_liu66

2 months ago

I have finally started my PI position at ELLIS Institute Tübingen this week! We are actively hiring interns at ELLIS Institute Tübingen. Apply if you are interested! institute-tue.ellis.eu/en/jobs/intern…

169 likes · 13 replies · 9 reposts

Owain Evans

@owainevans_uk

2 months ago

Check out our paper on internal reasoning in LLMs...

127 likes · 0 replies · 10 reposts

Wei Xu

@cocoweixu

2 months ago

Paper accepted to #NeurIPS2025! 🎉 "Probabilistic Reasoning with LLMs" by my PhD student Jonathan Zheng → arxiv.org/abs/2503.09674 Super exciting! It moves beyond rigid math/logic reasoning into probabilistic reasoning, reflecting how people tackle many real-world problems.

738 likes · 11 replies · 100 reposts

Alan Ritter

@alan_ritter

2 months ago

What if LLMs could predict their own accuracy on a new task before running a single experiment? We introduce PRECOG, built from real papers, to study description→performance forecasting. On both static and streaming tasks, GPT-5 beats human NLP researchers and simple baselines.

41 likes · 2 replies · 7 reposts

Kanishka Misra 🌊

@kanishkamisra

a month ago

The compling group at UT Austin (sites.utexas.edu/compling/) is looking for PhD students! Come join me, Kyle Mahowald, and Jessy Li as we tackle interesting research questions at the intersection of ling, cogsci, and ai! Some topics I am particularly interested in:

115 likes · 1 reply · 32 reposts

Litu Rout

@litu_rout_

a month ago

Continuous diffusion had a good run—now it’s time for Discrete diffusion! Introducing Anchored Posterior Sampling (APS) APS outperforms discrete and continuous baselines in terms of performance & scaling on inverse problems, stylization, and text-guided editing.

428 likes · 1 reply · 70 reposts

Atula Tejaswi

@atu_tej

a month ago

Come check out our poster at #COLM2025 in Montreal! Poster Session 2, Tue Oct 7th, 4:30-6:30 PM.

12 likes · 0 replies · 2 reposts

Anirudh Khatry

@anirudhkhatry

a month ago

CRUST-bench will be presented as a #spotlight paper at #COLM2025 this Thursday (10/9) in session 5! Come check out our poster after the talk to know more about challenges LLMs face in the C to Rust transpilation task.

9 likes · 0 replies · 4 reposts

Jessy Li

@jessyjli

a month ago

🚨 Does your LLM really understand code -- or is it just really good at remembering it? We built **PLSemanticsBench**, to find out. The results: a wild mix. ✅The Brilliant: Top reasoning models can execute complex, fuzzer-generated programs -- even with 5+ levels of nested

88 likes · 3 replies · 14 reposts

Aayush Karan

@aakaran31

a month ago

We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.

1.1K likes · 48 replies · 152 reposts

Shiwei Liu

@shiwei_liu66

22 days ago

Super excited to have CPAL 2026 in Tübingen. At least two reasons to submit to CPAL: (1) professional reviewers in your field, and (2) the chance to see how this beautiful old city is being reborn in the age of AI.

8 likes · 0 replies · 2 reposts

Shiwei Liu

@shiwei_liu66

16 days ago

🚀 We are hiring! Fully funded PhD position @ MPI-IS / ELLIS Institute Tübingen. Focusing on efficient algorithms and systems for machine learning — bridging theory and infrastructure to make large-scale AI faster, more stable, and sustainable. Experience with ML systems or

297 likes · 7 replies · 69 reposts

Alex Dimakis

@alexgdimakis

6 days ago

Just announced: Terminal-Bench 2.0 launches tomorrow. 89 new realistic tasks, more than 300 hours of manual review. Congratulations to the Terminal-Bench team!

72 likes · 3 replies · 11 reposts

Ravid Shwartz Ziv

@ziv_ravid

5 days ago

🚨New paper! "Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training" We found something surprising about how LLMs get better at math: the critical layers for mathematical reasoning are forged during pre-training and stay

151 likes · 2 replies · 17 reposts

Alex Shaw

@alexgshaw

5 days ago

Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification

321 likes · 21 replies · 67 reposts

Jessy Li

@jessyjli

5 days ago

Incredibly honored to serve as #EMNLP 2026 Program Chair along with Sunipa Dev and Hung-yi Lee (李宏毅), and General Chair Andre Martins. Looking forward to Budapest!! (With thanks to Li Chuyuan who took this photo in Suzhou!)

124 likes · 10 replies · 11 reposts

Alex Dimakis

@alexgdimakis

2 days ago

UT Austin is doubling its supercomputing cluster to more than 1000 GPUs. This cluster has been key for open-source AI. Datacomp, DCLM, OpenThoughts and many other open-source projects by researchers in Austin and at many other universities and labs around the world critically

123 likes · 2 replies · 11 reposts