Manya Wadhwa (@manyawadhwa1) 's Twitter Profile
Manya Wadhwa

@manyawadhwa1

PhD Student @UTCompSci | #NLProc | she/her

ID: 1049023881296658434

Link: https://manyawadhwa.github.io/ · Joined: 07-10-2018 19:49:18

189 Tweets

363 Followers

903 Following

Yizhong Wang (@yizhongwyz) 's Twitter Profile Photo

Thrilled to announce that I will be joining UT Austin Computer Science as an assistant professor in fall 2026!

I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
Kanishka Misra 🌊 (@kanishkamisra) 's Twitter Profile Photo

News🗞️

I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘

Excited to develop ideas about linguistic and conceptual generalization! Recruitment details soon
Sebastian Joseph (@sebajoed) 's Twitter Profile Photo

How good are LLMs at 🔭 scientific computing and visualization 🔭?

AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.

SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
Fangcong Yin (@fangcong_y10593) 's Twitter Profile Photo

Solving complex problems with CoT requires combining different skills.

We can do this by:
🧩Modifying the CoT data format to be “composable” with other skills
🔥Training models on each skill
📌Combining those models

This leads to better 0-shot reasoning on tasks involving skill composition!
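As a toy illustration of the "combine those models" step — assuming, purely for illustration, that combination is simple parameter averaging (the paper's actual recipe may well differ):

```python
# Toy sketch: combine two skill-specific "models" (represented here as
# plain parameter dictionaries) by averaging their weights key by key.
# Hypothetical illustration only -- not the paper's actual method.

def average_weights(model_a: dict, model_b: dict) -> dict:
    """Average two models' parameters, assuming identical architectures."""
    assert model_a.keys() == model_b.keys()
    return {k: 0.5 * (model_a[k] + model_b[k]) for k in model_a}

# Hypothetical one-parameter "models", each trained on one skill:
math_skill = {"w": 2.0}
code_skill = {"w": 4.0}

combined = average_weights(math_skill, code_skill)
print(combined)  # {'w': 3.0}
```

In practice each entry would be a tensor of an LLM checkpoint, but the merging logic is the same shape of operation.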
Asher Zheng (@asher_zheng00) 's Twitter Profile Photo

Language is often strategic, but LLMs tend to play nice. How strategic are they really? Probing into that is key for future safety alignment.🛟

👉Introducing CoBRA🐍, a framework that assesses strategic language.

Work with my amazing advisors Jessy Li and David Beaver!
🧵👇
Chau Minh Pham (@chautmpham) 's Twitter Profile Photo

🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
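One simple way to sanity-check a "90% copied" constraint like this is verbatim n-gram overlap between the output and the source paragraphs — a hypothetical metric for illustration, not necessarily the measurement used in the paper:

```python
# Sketch: fraction of word n-grams in an output that appear verbatim in
# a pool of source paragraphs. Hypothetical copy-rate check, purely
# illustrative of the "90% copied" constraint.

def ngrams(text: str, n: int = 5) -> set:
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def copy_rate(output: str, sources: list, n: int = 5) -> float:
    src_grams = set().union(*(ngrams(s, n) for s in sources))
    out_grams = ngrams(output, n)
    return len(out_grams & src_grams) / max(len(out_grams), 1)

sources = ["the quick brown fox jumps over the lazy dog"]
print(copy_rate("the quick brown fox jumps over the lazy dog", sources))  # 1.0
```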
Chaitanya Malaviya (@cmalaviya11) 's Twitter Profile Photo

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses?

Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓
Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🤔 Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval?
📣 Introducing QRHeads (query-focused retrieval heads) that enhance retrieval

Main contributions:
 🔍 Better head detection: we find a
Fangcong Yin (@fangcong_y10593) 's Twitter Profile Photo

Check out our new work on query-focused retrieval heads of LLMs! It is cool to see how interpretability insights can be used to improve zero-shot reasoning and re-ranking over long context.

Leo Liu (@zeyuliu10) 's Twitter Profile Photo

LLMs trained to memorize new facts can’t use those facts well.🤔

We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡

Our approach, PropMEND, extends MEND with a new objective for propagation.
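The gradient-editing idea can be sketched in a few lines — here the hypernetwork is simplified to a learned diagonal scaling of the raw gradient, whereas MEND-style hypernetworks actually operate on a low-rank decomposition of it; this is an assumption-laden toy, not the paper's implementation:

```python
import numpy as np

# Toy sketch of gradient editing: a hypernetwork g_phi maps the raw
# gradient of a fact-injection loss to an edited gradient before the
# weight update. Here g_phi is just a diagonal scaling (hypothetical;
# real MEND/PropMEND hypernetworks work on low-rank gradient factors).

rng = np.random.default_rng(0)
theta = rng.normal(size=4)            # model parameters
raw_grad = rng.normal(size=4)         # gradient from memorizing a new fact
phi = np.array([0.5, 2.0, 0.0, 1.0])  # hypernetwork params (learned scale)

edited_grad = phi * raw_grad          # hypernetwork "edits" the gradient
theta_new = theta - 0.1 * edited_grad # standard gradient-descent update

print(theta_new.shape)  # (4,)
```

The interesting part is the training objective for `phi`: it would be optimized so the updated model answers *propagated* questions about the new fact, not just verbatim recall.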
Manya Wadhwa (@manyawadhwa1) 's Twitter Profile Photo

Happy to share that EvalAgent has been accepted to #COLM2025 Conference on Language Modeling 🎉🇨🇦 We introduce a framework to identify implicit and diverse evaluation criteria for various open-ended tasks! 📜 arxiv.org/pdf/2504.15219

Leqi Liu (@leqi_liu) 's Twitter Profile Photo

What if you could understand and control an LLM by studying its *smaller* sibling? Our new paper proposes the Linear Representation Transferability Hypothesis: internal representations of different-sized models can be translated via a simple linear (affine) map.
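A minimal version of the hypothesis can be demonstrated with synthetic stand-ins: fit an affine map from "small-model" to "large-model" representations by least squares. The data here is random and the exact relation is planted, so this only illustrates the fitting procedure, not the empirical claim:

```python
import numpy as np

# Sketch: fit an affine map H_small -> H_large by least squares.
# Representations are synthetic stand-ins; in practice they would be
# hidden states of two differently sized models on the same inputs.

rng = np.random.default_rng(0)
n, d_small, d_large = 200, 8, 16

H_small = rng.normal(size=(n, d_small))
# Planted ground-truth affine relation (for this synthetic demo only):
W_true = rng.normal(size=(d_small, d_large))
b_true = rng.normal(size=d_large)
H_large = H_small @ W_true + b_true

# Augment with a bias column, then solve min ||X @ A - H_large||^2.
X = np.hstack([H_small, np.ones((n, 1))])
A, *_ = np.linalg.lstsq(X, H_large, rcond=None)

pred = X @ A
print(np.allclose(pred, H_large, atol=1e-6))  # True
```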

Anisha Gunjal (@anisha_gunjal) 's Twitter Profile Photo

🤔 How do we train LLMs on real-world tasks where it’s hard to define a single verifiable answer?

Our work at Scale AI introduces Rubrics as Rewards (RaR) — a framework for on-policy post-training that uses structured, checklist-style rubrics as interpretable reward signals. 🧵
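A checklist-style rubric reward can be sketched as a weighted fraction of satisfied items — the rubric items below are hypothetical examples invented for illustration, not the paper's actual rubrics or scoring rule:

```python
# Sketch: a rubric is a list of (weight, check_fn) items; the reward is
# the weighted fraction of items the response satisfies. Items here are
# hypothetical stand-ins for illustration.

def rubric_reward(response: str, rubric: list) -> float:
    total = sum(w for w, _ in rubric)
    earned = sum(w for w, check in rubric if check(response))
    return earned / total

rubric = [
    (2.0, lambda r: "dosage" in r.lower()),   # mentions dosage
    (1.0, lambda r: len(r.split()) <= 100),   # stays concise
    (1.0, lambda r: "consult" in r.lower()),  # advises consulting a doctor
]

resp = "Typical dosage is 200 mg; consult your doctor first."
print(rubric_reward(resp, rubric))  # 1.0
```

In an RL loop, this scalar would stand in for a verifier on tasks with no single checkable answer.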
Jessy Li (@jessyjli) 's Twitter Profile Photo

Thursday at #ACL2025: honored to be giving a keynote at the Linguistic Annotation Workshop (LAW) 
Time: 4pm CEST
Location: Room 1.15-16

sigann.github.io/LAW-XIX-2025/i…
Greg Durrett (@gregd_nlp) 's Twitter Profile Photo

📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall!

I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more!

I’m also looking to build connections in the NYC area more broadly. Please
Jessy Li (@jessyjli) 's Twitter Profile Photo

The Echoes in AI paper showed quite the opposite, also with a story continuation setup.
Additionally, we present evidence that both *syntactic* and *discourse* diversity measures show strong homogenization that the lexical and cosine-similarity measures used in this paper do not capture.
Amy Pavel (@amypavel) 's Twitter Profile Photo

📣I've joined @BerkeleyEECS as an Assistant Professor! My lab will join me soon to continue our research in accessibility, HCI, and supporting communication!

I'm so excited to make new connections at UC Berkeley and in the Bay Area more broadly, so please reach out to chat!
Yuhan Liu (@yuhanliu_nlp) 's Twitter Profile Photo

👀Have you asked an LLM to provide a more detailed answer after inspecting its initial output? Users often provide such implicit feedback during interaction.

✨We study implicit user feedback found in LMSYS and WildChat. #EMNLP2025
Orion Weller @ ICLR 2025 (@orionweller) 's Twitter Profile Photo

Instructions/reasoning are now everywhere in retrieval - we want embeddings to do it all! 🚀

But... is it even possible? 🤔

Turns out, it's not possible for single-vector models 😱 theoretically and empirically! To make it obvious, we open-source a simple eval that SoTA models flop on!

🧵
Mina Huh (@mina1004h) 's Twitter Profile Photo

What if a how-to video could become your personal task assistant -- that truly understands the task and adapts to your context? 
We present Vid2Coach, a system that transforms instructional videos into a wearable, camera-based assistant. 🎥✨🕶️ #UIST2025