Maarten Sap (he/him) (@maartensap) 's Twitter Profile
Maarten Sap (he/him)

@maartensap

retiring X acct: find me @maartensap.bsky
Working on #NLProc for social good.
Currently at @LTIatCMU, previously at @UWNLP, @MSFTResearch, and @allen_ai. 🏳‍🌈

ID: 3376146821

Link: https://www.maartensap.com
Joined: 14-07-2015 18:15:41

619 Tweets

5.5K Followers

631 Following

Ai2 (@allen_ai)

Our friends at CMU School of Computer Science have just launched a new podcast, Does Compute, which features our visiting research scientist Maarten Sap (he/him) in episode two, "What AI Isn't, Part 2" — youtube.com/watch?v=aOrGbc…

Kaitlyn Zhou ✈️ CSCW, EMNLP! (@kaitlynzhou)

How can we best measure the consequences of LLM overconfidence? ✨New preprint✨ on measuring the risks of human over-reliance on LLM expressions of uncertainty: arxiv.org/pdf/2407.07950 w/Jena Hwang Sean Ren | Sahara AI 🔆 Nouha Dziri Dan Jurafsky Maarten Sap (he/him) Stanford NLP Group Ai2 #NLPproc

Xuhui Zhou (@nlpxuhui)

1/ What if you could see how your AI handles the chaos of the real world? Meet HAICOSYSTEM: the framework to simulate human-AI-environment interactions—all at once. 🌍🤖 Find out if your AI is truly safe under pressure from real-world scenarios! 🔥 🌐: haicosystem.org

Language Technologies Institute | @CarnegieMellon (@ltiatcmu)

HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs by Jocelyn Shen, Joel Mire, Hae Won Park, Cynthia Breazeal, & Maarten Sap (he/him) Session: Computational Social Science and Cultural Analytics 1, Session 03, 14:00-15:30 aclanthology.org/2024.emnlp-mai…

Joel Mire (@joel_mire)

I’m thrilled to be at EMNLP this week presenting our paper, “The Empirical Variability of Narrative Perceptions of Social Media Texts” I’ll be giving an oral presentation during the CSS + Cultural Analytics Session 2 (Nov 14). Paper: aclanthology.org/2024.emnlp-mai… 🧵(1/12)

Jiaxin Ge (@aomaru_21490)

Introducing "AutoPresent: Designing Structured Visuals From Scratch". We employ code generation to create structured, high-quality presentation slides from scratch! 📄 arxiv.org/abs/2501.00912 🤗 huggingface.co/spaces/JiaxinG… 🔗 github.com/para-lost/Auto… Berkeley AI Research Language Technologies Institute | @CarnegieMellon

Sanidhya Vijayvargiya (@sanidhya903)

1/ LLM agents can code—but can they ask clarifying questions? 🤖💬 Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀

Maarten Sap (he/him) (@maartensap)

Exciting to see more work in this space! We actually worked on this in our 2024 AI-LieDar paper (recently accepted to NAACL 2025): arxiv.org/abs/2409.09013

Maarten Sap (he/him) (@maartensap)

RLHF is built on some rather oversimplified assumptions, namely that preferences between pairs of texts are about quality. But this is a very subjective task (not unlike toxicity annotation), so we wanted to know: do biases similar to those in toxicity annotation emerge in reward models?

Maarten Sap (he/him) (@maartensap)

Excited to unveil a new SOTA safeguarding model that beats open-source safeguarding models, is on par with or better than closed-source ones, and supports 17 languages!

Language Technologies Institute | @CarnegieMellon (@ltiatcmu)

Hand gestures are a major mode of human communication, but they don't always translate well across cultures. New research from Akhila Yerukola, Maarten Sap (he/him) and others is aimed at giving AI systems a hand with overcoming cultural biases: lti.cmu.edu/news-and-event…