Yating Wu (@yatingwu96) 's Twitter Profile
Yating Wu

@yatingwu96

ECE Ph.D. student @ UT Austin, advised by @jessyjli and @AlexGDimakis | Ph.D. student at the University of Texas

ID: 4686178212

Website: http://lingchensanwen.github.io | Joined: 01-01-2016 01:20:42

89 Tweets

228 Followers

275 Following

Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo

Excited to launch the first model from our startup: Bespoke Labs. Bespoke-Minicheck-7B is a grounded factuality checker: super lightweight and fast. It outperforms all big foundation models, including Claude 3.5 Sonnet, Mistral Large 2, and GPT-4o, and it's only 7B.

Kanishka Misra 🌊 (@kanishkamisra) 's Twitter Profile Photo


🧐🔡🤖 Can LMs/NNs inform CogSci? This question has been (re)visited by many people across decades.

Najoung Kim 🫠 and I contribute to this debate by using NN-based LMs to generate novel experimental hypotheses which can then be tested with humans!
Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo


One of the big problems in AI is that the systems often hallucinate. What does that mean exactly and how do we mitigate this problem, especially for RAG systems?

1. Hallucinations and Factuality

Factuality refers to the quality of being based on generally accepted facts.
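The grounded factuality checking described above (e.g., in Bespoke-Minicheck-7B) boils down to asking, for each claim, "is this supported by the source document?" The toy sketch below illustrates that interface with a simple word-overlap heuristic; it is not the actual model, and all function names and the threshold are invented for illustration.

```python
# Toy illustration of grounded factuality checking: score each claim against
# a source document and flag unsupported ones. A real checker (like
# Bespoke-Minicheck-7B) uses a trained model, not word overlap.

def support_score(claim: str, document: str) -> float:
    """Fraction of the claim's content words that appear in the document."""
    stop = {"the", "a", "an", "is", "are", "was", "of", "in", "on", "and"}
    claim_words = {w.lower().strip(".,") for w in claim.split()} - stop
    doc_words = {w.lower().strip(".,") for w in document.split()} - stop
    if not claim_words:
        return 0.0
    return len(claim_words & doc_words) / len(claim_words)

def is_grounded(claim: str, document: str, threshold: float = 0.8) -> bool:
    """Treat a claim as grounded if enough of its content overlaps the document."""
    return support_score(claim, document) >= threshold

doc = "The Eiffel Tower is located in Paris and was completed in 1889."
print(is_grounded("The Eiffel Tower was completed in 1889.", doc))  # True
print(is_grounded("The Eiffel Tower is located in Berlin.", doc))   # False
```

In a RAG pipeline, the same check would run over each sentence of the generated answer against the retrieved passages, filtering or flagging ungrounded sentences.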
Giannis Daras (@giannis_daras) 's Twitter Profile Photo


Why are there so many different methods for using diffusion models for inverse problems? 🤔

And how do these methods relate to each other?

In this survey, we review more than 35 different methods and we attempt to unify them into common mathematical formulations.
Yating Wu (@yatingwu96) 's Twitter Profile Photo

✨ Exciting news! I’ll be presenting our work at #EMNLP2024 in an oral session on Nov 13 (Wed) from 10:15 to 10:30 AM (Session 6). Come say hi!

Yating Wu (@yatingwu96) 's Twitter Profile Photo


I'm thrilled to announce that our paper "Which questions should I answer? Salience Prediction of Inquisitive Questions" has won an Outstanding Paper Award at EMNLP 2024 🥳🥳. Thank you so much to my amazing co-authors and advisors: Ritika Mangla, Alex Dimakis, Greg Durrett, Jessy Li!
Alex Dimakis (@alexgdimakis) 's Twitter Profile Photo


github.com/mlfoundations/…
I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval etc as well as standard pre-training
Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🔔 I'm recruiting multiple fully funded MSc/PhD students at the University of Alberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!

Hongli Zhan (@honglizhan) 's Twitter Profile Photo


Constitutional AI works great for aligning LLMs, but the principles can be too generic to apply.

Can we guide responses with context-situated principles instead?

Introducing SPRI, a system that produces principles tailored to each query, with minimal to no human effort.

[1/5]
Jan Trienes (@jantrienes) 's Twitter Profile Photo


Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper: an interpretable framework for salience analysis in LLMs. 

First of all, information salience is a fuzzy concept. So how can we even measure it?
Jessy Li (@jessyjli) 's Twitter Profile Photo

🌟Job ad🌟 We (Greg Durrett, Matt Lease and I) are hiring a postdoctoral fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve mysteries of the universe, please apply!

Fangyuan Xu (@brunchavecmoi) 's Twitter Profile Photo


Can we generate long text from a compressed KV cache? We find that existing KV cache compression methods (e.g., SnapKV) degrade rapidly in this setting. We present RefreshKV, an inference method that ♻️ periodically refreshes the smaller KV cache, better preserving performance.
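The idea sketched in the tweet (generate with a small cache, but periodically rebuild it from the full cache so long generations don't degrade) can be illustrated with a toy simulation. This is not the authors' code: the importance score, the top-k selection rule, and all names below are invented stand-ins for the paper's actual mechanism.

```python
# Hypothetical sketch of a "refresh" strategy for a compressed KV cache:
# evict from the small cache between refreshes, but every few steps
# re-select its entries from the FULL cache kept on the side.

def compress(cache, k):
    """Keep the k entries with the highest (proxy) importance score."""
    return sorted(cache, key=lambda e: e["score"], reverse=True)[:k]

def generate(steps, k=4, refresh_every=8):
    full_cache, small_cache = [], []
    for t in range(steps):
        # each new token adds an entry; "score" stands in for attention mass
        entry = {"pos": t, "score": (t * 37) % 11}
        full_cache.append(entry)
        small_cache.append(entry)
        if t > 0 and t % refresh_every == 0:
            # refresh step: re-select the top-k entries from the full cache,
            # recovering tokens a purely local eviction policy would have lost
            small_cache = compress(full_cache, k)
        elif len(small_cache) > k:
            # between refreshes: evict from the small cache only
            small_cache = compress(small_cache, k)
    return full_cache, small_cache

full, small = generate(32)
print(len(full), len(small))  # prints "32 4"
```

The contrast with a one-shot compression method is the refresh branch: without it, an entry evicted early can never return, which is one plausible reading of why fixed compression degrades over long generations.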
Ramya Namuduri (@ramya_namuduri) 's Twitter Profile Photo


Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrases 👀?

✨We introduce QUDsim, to quantify discourse similarities beyond lexical, syntactic, and content overlap.
Liyan Tang (@liyantang4) 's Twitter Profile Photo


Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!

✍🏻Entirely human-written questions by 13 CS researchers
👀Emphasis on visual reasoning – hard to be verbalized via text CoTs
📉Humans reach 93% but 63% from Gemini-2.5-Pro & 38% from Qwen2.5-72B
Sebastian Joseph (@sebajoed) 's Twitter Profile Photo


How good are LLMs at 🔭 scientific computing and visualization 🔭?

AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.

SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
Asher Zheng (@asher_zheng00) 's Twitter Profile Photo


Language is often strategic, but LLMs tend to play nice. How strategic are they really? Probing into that is key for future safety alignment.🛟

👉Introducing CoBRA🐍, a framework that assesses strategic language.

Work with my amazing advisors Jessy Li and David Beaver!
🧵👇