Atula Tejaswi (@atu_tej)'s Twitter Profile
Atula Tejaswi

@atu_tej

MS CS ➡️ Incoming CS PhD @UTCompSci

ID: 785849216971239424

Link: http://atutej.github.io

Joined: 11-10-2016 14:27:08

136 Tweets

176 Followers

376 Following

Qing Yao (@qyao23) 's Twitter Profile Photo

LMs learn argument-based preferences for dative constructions (preferring recipient first when it’s shorter), being quite consistent with humans. Is this from just memorizing the preferences in their training data? New paper w/ Kanishka Misra 🌊, Leonie Weissweiler, Kyle Mahowald

David Heineman (@heinemandavidj) 's Twitter Profile Photo

Evaluating language models is tricky, how do we know if our results are real, or due to random chance? We find an answer with two simple metrics: signal, a benchmark’s ability to separate models, and noise, a benchmark’s random variability between training steps 🧵

Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

We are hiring in Bespoke Labs for a new role: Member of Technical Staff: AI Data and RL Environments. Work on data curation strategies with the team that created OpenThoughts. Invent novel data recipes, strategies of curating datasets, environments, tasks and verifiers. (My

Shiwei Liu (@shiwei_liu66) 's Twitter Profile Photo

I have finally started my PI position at ELLIS Institute Tübingen this week! ELLIS Institute Tübingen is actively hiring interns. Apply if you are interested! institute-tue.ellis.eu/en/jobs/intern…

Wei Xu (@cocoweixu) 's Twitter Profile Photo

Paper accepted to #NeurIPS2025! 🎉 "Probabilistic Reasoning with LLMs" by my PhD student Jonathan Zheng → arxiv.org/abs/2503.09674 Super exciting! It moves beyond rigid math/logic reasoning into probabilistic reasoning, reflecting how people tackle many real-world problems.

Alan Ritter (@alan_ritter) 's Twitter Profile Photo

What if LLMs could predict their own accuracy on a new task before running a single experiment? We introduce PRECOG, built from real papers, to study description→performance forecasting. On both static and streaming tasks, GPT-5 beats human NLP researchers and simple baselines.

Kanishka Misra 🌊 (@kanishkamisra) 's Twitter Profile Photo

The compling group at UT Austin (sites.utexas.edu/compling/) is looking for PhD students! Come join me, Kyle Mahowald, and Jessy Li as we tackle interesting research questions at the intersection of ling, cogsci, and ai! Some topics I am particularly interested in:

Litu Rout (@litu_rout_) 's Twitter Profile Photo

Continuous diffusion had a good run—now it’s time for Discrete diffusion! Introducing Anchored Posterior Sampling (APS) APS outperforms discrete and continuous baselines in terms of performance & scaling on inverse problems, stylization, and text-guided editing.

Anirudh Khatry (@anirudhkhatry) 's Twitter Profile Photo

CRUST-bench will be presented as a #spotlight paper at #COLM2025 this Thursday (10/9) in session 5! Come check out our poster after the talk to know more about challenges LLMs face in the C to Rust transpilation task.

Jessy Li (@jessyjli) 's Twitter Profile Photo

🚨 Does your LLM really understand code -- or is it just really good at remembering it? We built **PLSemanticsBench**, to find out. The results: a wild mix. ✅The Brilliant: Top reasoning models can execute complex, fuzzer-generated programs -- even with 5+ levels of nested

Aayush Karan (@aakaran31) 's Twitter Profile Photo

We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.

Shiwei Liu (@shiwei_liu66) 's Twitter Profile Photo

Super excited to have CPAL 2026 at Tübingen. At least two reasons to submit to CPAL: (1) professional reviewers in your fields, (2) come to see how this beautiful old city is being reborn in the age of AI.

Shiwei Liu (@shiwei_liu66) 's Twitter Profile Photo

🚀 We are hiring! Fully funded PhD position @ MPI-IS / ELLIS Institute Tübingen. Focusing on efficient algorithms and systems for machine learning — bridging theory and infrastructure to make large-scale AI faster, more stable, and sustainable. Experience with ML systems or

Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

Just announced: Terminal-Bench 2.0 launching tomorrow. 89 new realistic tasks, more than 300 hours of manual reviewing. Congratulations to the terminal-bench team!

Ravid Shwartz Ziv (@ziv_ravid) 's Twitter Profile Photo

🚨New paper! "Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training" We found something surprising about how LLMs get better at math: the critical layers for mathematical reasoning are forged during pre-training and stay

Alex Shaw (@alexgshaw) 's Twitter Profile Photo

Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification

Jessy Li (@jessyjli) 's Twitter Profile Photo

Incredibly honored to serve as #EMNLP 2026 Program Chair along with Sunipa Dev and Hung-yi Lee (李宏毅), and General Chair Andre Martins. Looking forward to Budapest!! (With thanks to Li Chuyuan who took this photo in Suzhou!)

Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

UT Austin is doubling its supercomputing cluster to more than 1000 GPUs. This cluster has been key for open source AI. Datacomp, DCLM, OpenThoughts and many other open source projects by researchers in Austin and many other universities and labs around the world critically