Yating Wu (@yatingwu96) 's Twitter Profile
Yating Wu

@yatingwu96

ECE Ph.D. student @ UT Austin, advised by @jessyjli and @AlexGDimakis | Ph.D. student at the University of Texas

ID: 4686178212

Website: http://lingchensanwen.github.io | Joined: 01-01-2016 01:20:42

89 Tweets

228 Followers

275 Following

Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

Excited to launch the first model from our startup: Bespoke Labs. Bespoke-Minicheck-7B is a grounded factuality checker: super lightweight and fast. It outperforms all big foundation models, including Claude 3.5 Sonnet, Mistral Large 2, and GPT-4o, and it's only 7B.

Kanishka Misra 🌊 (@kanishkamisra) 's Twitter Profile Photo


🧐🔡🤖 Can LMs/NNs inform CogSci? This question has been (re)visited by many people across decades.

Najoung Kim 🫠 and I contribute to this debate by using NN-based LMs to generate novel experimental hypotheses which can then be tested with humans!
Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo


One of the big problems in AI is that the systems often hallucinate. What does that mean exactly and how do we mitigate this problem, especially for RAG systems?

1. Hallucinations and Factuality

Factuality refers to the quality of being based on generally accepted facts.
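The grounded factuality checking described above (e.g., in Bespoke-Minicheck-7B) boils down to asking, for each claim, "is this supported by the source document?" The toy sketch below illustrates that interface with a simple word-overlap heuristic; it is not the actual model, and all function names and the threshold are invented for illustration.

```python
# Toy illustration of grounded factuality checking: score each claim against
# a source document and flag unsupported ones. A real checker (like
# Bespoke-Minicheck-7B) uses a trained model, not word overlap.

def support_score(claim: str, document: str) -> float:
    """Fraction of the claim's content words that appear in the document."""
    stop = {"the", "a", "an", "is", "are", "was", "of", "in", "on", "and"}
    claim_words = {w.lower().strip(".,") for w in claim.split()} - stop
    doc_words = {w.lower().strip(".,") for w in document.split()} - stop
    if not claim_words:
        return 0.0
    return len(claim_words & doc_words) / len(claim_words)

def is_grounded(claim: str, document: str, threshold: float = 0.8) -> bool:
    """Treat a claim as grounded if enough of its content overlaps the document."""
    return support_score(claim, document) >= threshold

doc = "The Eiffel Tower is located in Paris and was completed in 1889."
print(is_grounded("The Eiffel Tower was completed in 1889.", doc))  # True
print(is_grounded("The Eiffel Tower is located in Berlin.", doc))   # False
```

In a RAG pipeline, the same check would run over each sentence of the generated answer against the retrieved passages, filtering or flagging ungrounded sentences.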
Giannis Daras (@giannis_daras) 's Twitter Profile Photo


Why are there so many different methods for using diffusion models for inverse problems? 🤔

And how do these methods relate to each other?

In this survey, we review more than 35 different methods and we attempt to unify them into common mathematical formulations.
Yating Wu (@yatingwu96) 's Twitter Profile Photo

✨ Exciting news! I’ll be presenting our work at #EMNLP2024 in an oral session on Nov 13 (Wed) from 10:15 to 10:30 AM (Session 6). Come say hi!

Yating Wu (@yatingwu96) 's Twitter Profile Photo


I'm thrilled to announce that our paper "Which questions should I answer? Salience Prediction of Inquisitive Questions" has won an Outstanding Paper Award at EMNLP 2024 🥳🥳. Thank you so much to my amazing co-authors and advisors: Ritika Mangla, Alex Dimakis, Greg Durrett, Jessy Li!
Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo


github.com/mlfoundations/…
I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval etc as well as standard pre-training
Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🔔 I'm recruiting multiple fully funded MSc/PhD students at the University of Alberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!

Hongli Zhan (@honglizhan) 's Twitter Profile Photo


Constitutional AI works great for aligning LLMs, but the principles can be too generic to apply.

Can we guide responses with context-situated principles instead?

Introducing SPRI, a system that produces principles tailored to each query, with minimal to no human effort.

[1/5]
Jan Trienes (@jantrienes) 's Twitter Profile Photo


Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper: an interpretable framework for salience analysis in LLMs. 

First of all, information salience is a fuzzy concept. So how can we even measure it?
Jessy Li (@jessyjli) 's Twitter Profile Photo

🌟Job ad🌟 We (Greg Durrett, Matt Lease and I) are hiring a postdoctoral fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve mysteries of the universe, please apply!

Fangyuan Xu (@brunchavecmoi) 's Twitter Profile Photo


Can we generate long text from a compressed KV cache? We find that existing KV cache compression methods (e.g., SnapKV) degrade rapidly in this setting. We present RefreshKV, an inference method that ♻️ periodically refreshes the smaller KV cache, better preserving performance.
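The idea sketched in the tweet (generate with a small cache, but periodically rebuild it from the full cache so long generations don't degrade) can be illustrated with a toy simulation. This is not the authors' code: the importance score, the top-k selection rule, and all names below are invented stand-ins for the paper's actual mechanism.

```python
# Hypothetical sketch of a "refresh" strategy for a compressed KV cache:
# evict from the small cache between refreshes, but every few steps
# re-select its entries from the FULL cache kept on the side.

def compress(cache, k):
    """Keep the k entries with the highest (proxy) importance score."""
    return sorted(cache, key=lambda e: e["score"], reverse=True)[:k]

def generate(steps, k=4, refresh_every=8):
    full_cache, small_cache = [], []
    for t in range(steps):
        # each new token adds an entry; "score" stands in for attention mass
        entry = {"pos": t, "score": (t * 37) % 11}
        full_cache.append(entry)
        small_cache.append(entry)
        if t > 0 and t % refresh_every == 0:
            # refresh step: re-select the top-k entries from the full cache,
            # recovering tokens a purely local eviction policy would have lost
            small_cache = compress(full_cache, k)
        elif len(small_cache) > k:
            # between refreshes: evict from the small cache only
            small_cache = compress(small_cache, k)
    return full_cache, small_cache

full, small = generate(32)
print(len(full), len(small))  # prints "32 4"
```

The contrast with a one-shot compression method is the refresh branch: without it, an entry evicted early can never return, which is one plausible reading of why fixed compression degrades over long generations.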
Ramya Namuduri (@ramya_namuduri) 's Twitter Profile Photo


Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrases 👀?

✨We introduce QUDsim, to quantify discourse similarities beyond lexical, syntactic, and content overlap.
Liyan Tang (@liyantang4) 's Twitter Profile Photo


Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!

✍🏻Entirely human-written questions by 13 CS researchers
👀Emphasis on visual reasoning – hard to be verbalized via text CoTs
📉Humans reach 93% but 63% from Gemini-2.5-Pro & 38% from Qwen2.5-72B
Sebastian Joseph (@sebajoed) 's Twitter Profile Photo


How good are LLMs at 🔭 scientific computing and visualization 🔭?

AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.

SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
Asher Zheng (@asher_zheng00) 's Twitter Profile Photo


Language is often strategic, but LLMs tend to play nice. How strategic are they really? Probing into that is key for future safety alignment.🛟

👉Introducing CoBRA🐍, a framework that assesses strategic language.

Work with my amazing advisors Jessy Li and David Beaver!
🧵👇