Manya Wadhwa (@manyawadhwa1) 's Twitter Profile
Manya Wadhwa

@manyawadhwa1

PhD Student @UTCompSci | #NLProc | she/her

ID: 1049023881296658434

Link: https://manyawadhwa.github.io/ · Joined: 07-10-2018 19:49:18

189 Tweets

363 Followers

903 Following

Yizhong Wang (@yizhongwyz) 's Twitter Profile Photo

Thrilled to announce that I will be joining UT Austin Computer Science as an assistant professor in fall 2026!

I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
Kanishka Misra 🌊 (@kanishkamisra) 's Twitter Profile Photo

News🗞️

I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘

Excited to develop ideas about linguistic and conceptual generalization! Recruitment details soon
Sebastian Joseph (@sebajoed) 's Twitter Profile Photo

How good are LLMs at 🔭 scientific computing and visualization 🔭?

AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.

SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
Fangcong Yin (@fangcong_y10593) 's Twitter Profile Photo

Solving complex problems with CoT requires combining different skills.

We can do this by:
🧩Modifying the CoT data format to be “composable” with other skills
🔥Training models on each skill
📌Combining those models

This leads to better 0-shot reasoning on tasks involving skill composition!
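As a toy illustration of the "combine those models" step — assuming, purely for illustration, that combination is simple parameter averaging (the paper's actual recipe may well differ):

```python
# Toy sketch: combine two skill-specific "models" (represented here as
# plain parameter dictionaries) by averaging their weights key by key.
# Hypothetical illustration only -- not the paper's actual method.

def average_weights(model_a: dict, model_b: dict) -> dict:
    """Average two models' parameters, assuming identical architectures."""
    assert model_a.keys() == model_b.keys()
    return {k: 0.5 * (model_a[k] + model_b[k]) for k in model_a}

# Hypothetical one-parameter "models", each trained on one skill:
math_skill = {"w": 2.0}
code_skill = {"w": 4.0}

combined = average_weights(math_skill, code_skill)
print(combined)  # {'w': 3.0}
```

In practice each entry would be a tensor of an LLM checkpoint, but the merging logic is the same shape of operation.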
Asher Zheng (@asher_zheng00) 's Twitter Profile Photo

Language is often strategic, but LLMs tend to play nice. How strategic are they really? Probing into that is key for future safety alignment.🛟

👉Introducing CoBRA🐍, a framework that assesses strategic language.

Work with my amazing advisors Jessy Li and David Beaver!
🧵👇
Chau Minh Pham (@chautmpham) 's Twitter Profile Photo

🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
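One simple way to sanity-check a "90% copied" constraint like this is verbatim n-gram overlap between the output and the source paragraphs — a hypothetical metric for illustration, not necessarily the measurement used in the paper:

```python
# Sketch: fraction of word n-grams in an output that appear verbatim in
# a pool of source paragraphs. Hypothetical copy-rate check, purely
# illustrative of the "90% copied" constraint.

def ngrams(text: str, n: int = 5) -> set:
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def copy_rate(output: str, sources: list, n: int = 5) -> float:
    src_grams = set().union(*(ngrams(s, n) for s in sources))
    out_grams = ngrams(output, n)
    return len(out_grams & src_grams) / max(len(out_grams), 1)

sources = ["the quick brown fox jumps over the lazy dog"]
print(copy_rate("the quick brown fox jumps over the lazy dog", sources))  # 1.0
```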
Chaitanya Malaviya (@cmalaviya11) 's Twitter Profile Photo

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses?

Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓
Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🤔 Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval?
📣 Introducing QRHeads (query-focused retrieval heads) that enhance retrieval

Main contributions:
 🔍 Better head detection: we find a
Fangcong Yin (@fangcong_y10593) 's Twitter Profile Photo

Check out our new work on query-focused retrieval heads of LLMs! It is cool to see how interpretability insights can be used to improve zero-shot reasoning and re-ranking over long context.

Leo Liu (@zeyuliu10) 's Twitter Profile Photo

LLMs trained to memorize new facts can’t use those facts well.🤔

We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡

Our approach, PropMEND, extends MEND with a new objective for propagation.
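The gradient-editing idea can be sketched in a few lines — here the hypernetwork is simplified to a learned diagonal scaling of the raw gradient, whereas MEND-style hypernetworks actually operate on a low-rank decomposition of it; this is an assumption-laden toy, not the paper's implementation:

```python
import numpy as np

# Toy sketch of gradient editing: a hypernetwork g_phi maps the raw
# gradient of a fact-injection loss to an edited gradient before the
# weight update. Here g_phi is just a diagonal scaling (hypothetical;
# real MEND/PropMEND hypernetworks work on low-rank gradient factors).

rng = np.random.default_rng(0)
theta = rng.normal(size=4)            # model parameters
raw_grad = rng.normal(size=4)         # gradient from memorizing a new fact
phi = np.array([0.5, 2.0, 0.0, 1.0])  # hypernetwork params (learned scale)

edited_grad = phi * raw_grad          # hypernetwork "edits" the gradient
theta_new = theta - 0.1 * edited_grad # standard gradient-descent update

print(theta_new.shape)  # (4,)
```

The interesting part is the training objective for `phi`: it would be optimized so the updated model answers *propagated* questions about the new fact, not just verbatim recall.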
Manya Wadhwa (@manyawadhwa1) 's Twitter Profile Photo

Happy to share that EvalAgent has been accepted to #COLM2025 Conference on Language Modeling 🎉🇨🇦 We introduce a framework to identify implicit and diverse evaluation criteria for various open-ended tasks! 📜 arxiv.org/pdf/2504.15219

Leqi Liu (@leqi_liu) 's Twitter Profile Photo

What if you could understand and control an LLM by studying its *smaller* sibling? Our new paper proposes the Linear Representation Transferability Hypothesis: internal representations of different-sized models can be translated via a simple linear (affine) map.
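A minimal version of the hypothesis can be demonstrated with synthetic stand-ins: fit an affine map from "small-model" to "large-model" representations by least squares. The data here is random and the exact relation is planted, so this only illustrates the fitting procedure, not the empirical claim:

```python
import numpy as np

# Sketch: fit an affine map H_small -> H_large by least squares.
# Representations are synthetic stand-ins; in practice they would be
# hidden states of two differently sized models on the same inputs.

rng = np.random.default_rng(0)
n, d_small, d_large = 200, 8, 16

H_small = rng.normal(size=(n, d_small))
# Planted ground-truth affine relation (for this synthetic demo only):
W_true = rng.normal(size=(d_small, d_large))
b_true = rng.normal(size=d_large)
H_large = H_small @ W_true + b_true

# Augment with a bias column, then solve min ||X @ A - H_large||^2.
X = np.hstack([H_small, np.ones((n, 1))])
A, *_ = np.linalg.lstsq(X, H_large, rcond=None)

pred = X @ A
print(np.allclose(pred, H_large, atol=1e-6))  # True
```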

Anisha Gunjal (@anisha_gunjal) 's Twitter Profile Photo

🤔 How do we train LLMs on real-world tasks where it’s hard to define a single verifiable answer?

Our work at Scale AI introduces Rubrics as Rewards (RaR) — a framework for on-policy post-training that uses structured, checklist-style rubrics as interpretable reward signals. 🧵
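A checklist-style rubric reward can be sketched as a weighted fraction of satisfied items — the rubric items below are hypothetical examples invented for illustration, not the paper's actual rubrics or scoring rule:

```python
# Sketch: a rubric is a list of (weight, check_fn) items; the reward is
# the weighted fraction of items the response satisfies. Items here are
# hypothetical stand-ins for illustration.

def rubric_reward(response: str, rubric: list) -> float:
    total = sum(w for w, _ in rubric)
    earned = sum(w for w, check in rubric if check(response))
    return earned / total

rubric = [
    (2.0, lambda r: "dosage" in r.lower()),   # mentions dosage
    (1.0, lambda r: len(r.split()) <= 100),   # stays concise
    (1.0, lambda r: "consult" in r.lower()),  # advises consulting a doctor
]

resp = "Typical dosage is 200 mg; consult your doctor first."
print(rubric_reward(resp, rubric))  # 1.0
```

In an RL loop, this scalar would stand in for a verifier on tasks with no single checkable answer.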
Jessy Li (@jessyjli) 's Twitter Profile Photo

Thursday at #ACL2025: honored to be giving a keynote at the Linguistic Annotation Workshop (LAW) 
Time: 4pm CEST
Location: Room 1.15-16

sigann.github.io/LAW-XIX-2025/i…
Greg Durrett (@gregd_nlp) 's Twitter Profile Photo

📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall!

I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more!

I’m also looking to build connections in the NYC area more broadly. Please
Jessy Li (@jessyjli) 's Twitter Profile Photo

The Echoes in AI paper showed quite the opposite, also with a story continuation setup.
Additionally, we present evidence that both *syntactic* and *discourse* diversity measures show strong homogenization that the lexical and cosine-similarity measures used in this paper do not capture.
Amy Pavel (@amypavel) 's Twitter Profile Photo

📣I've joined @BerkeleyEECS as an Assistant Professor! My lab will join me soon to continue our research in accessibility, HCI, and supporting communication!

I'm so excited to make new connections at UC Berkeley and in the Bay Area more broadly, so please reach out to chat!
Yuhan Liu (@yuhanliu_nlp) 's Twitter Profile Photo

👀Have you asked an LLM to provide a more detailed answer after inspecting its initial output? Users often provide such implicit feedback during interaction.

✨We study implicit user feedback found in LMSYS and WildChat. #EMNLP2025
Orion Weller @ ICLR 2025 (@orionweller) 's Twitter Profile Photo

Instructions/reasoning are now everywhere in retrieval - we want embeddings to do it all! 🚀

But... is it even possible? 🤔

Turns out, it's not possible for single-vector models 😱 theoretically and empirically! To make it obvious, we open-source a simple eval that SoTA models flop on!

🧵
Mina Huh (@mina1004h) 's Twitter Profile Photo

What if a how-to video could become your personal task assistant -- that truly understands the task and adapts to your context? 
We present Vid2Coach, a system that transforms instructional videos into a wearable, camera-based assistant. 🎥✨🕶️ #UIST2025