Luke Zettlemoyer (@lukezettlemoyer)'s Twitter Profile
Luke Zettlemoyer

@lukezettlemoyer

ID: 3741979273

Joined: 30-09-2015 23:41:36

1.1K Tweets

9.9K Followers

2.2K Following

Yizhong Wang (@yizhongwyz)

Thrilled to announce that I will be joining UT Austin Computer Science as an assistant professor in fall 2026!

I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
Sahil Verma (@sahil1v)

🚨 New Paper! 🚨
Guard models: slow, language-specific, and modality-limited?

Meet OmniGuard, which detects harmful prompts across multiple languages & modalities using one approach, with SOTA performance in all 3 modalities while being 120X faster 🚀

arxiv.org/abs/2505.23856
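The tweet doesn't say how OmniGuard works internally, but a guard that is ~120X faster than generative moderators is plausibly classifying a single embedding rather than running a full LLM per request. Below is a minimal sketch of that idea, assuming a frozen encoder that maps text, images, or audio into one shared embedding space; `GuardHead`, the dimensions, and the random embedding are illustrative, not OmniGuard's actual code.

```python
import torch
import torch.nn as nn

class GuardHead(nn.Module):
    """Small MLP that classifies a frozen embedding as harmful/benign.

    Hypothetical sketch: classifying one embedding per request (rather
    than generating text with a guard LLM) is where a large speedup
    over generative guard models would come from.
    """
    def __init__(self, embed_dim: int = 1024, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single harmfulness logit
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.mlp(emb).squeeze(-1)

# Assumption: some multilingual/multimodal encoder produces these
# embeddings for prompts in any language or modality.
head = GuardHead()
prompt_embeddings = torch.randn(4, 1024)  # stand-in for encoder output
print(torch.sigmoid(head(prompt_embeddings)))  # per-prompt harm probability
```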
Saumya Malik (@saumyamalik44)

I’m thrilled to share RewardBench 2 📊— We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!
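For readers outside the reward-modeling niche, the core measurement in such a benchmark is whether the reward model scores the labeled-correct completion above the alternatives. A toy sketch of that best-of-N protocol follows; it is a generic illustration, not RewardBench 2's exact setup, and `reward_model` is a stand-in callable.

```python
from typing import Callable, Sequence

def best_of_n_accuracy(
    reward_model: Callable[[str, str], float],
    examples: Sequence[dict],
) -> float:
    """Fraction of examples where the top-scored candidate is the
    labeled-correct one (generic best-of-N RM evaluation)."""
    correct = 0
    for ex in examples:
        scores = [reward_model(ex["prompt"], c) for c in ex["candidates"]]
        if scores.index(max(scores)) == ex["correct_index"]:
            correct += 1
    return correct / len(examples)

# Toy usage: a stand-in RM that just prefers longer answers.
toy = [{"prompt": "2+2?", "candidates": ["5", "4, since 2+2=4"], "correct_index": 1}]
print(best_of_n_accuracy(lambda p, c: float(len(c)), toy))  # 1.0
```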
Jason Weston (@jaseweston)

🚨Self-Challenging Language Model Agents🚨
📝: arxiv.org/abs/2506.01716

A new paradigm to train LLM agents to use different tools with challenging self-generated data ONLY: Self-challenging agents (SCA) both propose new tasks and solve them, using self-generated verifiers to …
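As described, the loop has one model play every role: propose a task, attempt it with tools, and judge the attempt with a self-generated verifier, keeping only approved trajectories as training data. A stand-in sketch of one round, where `llm` and `run_agent` are stubs rather than the paper's code:

```python
from typing import Callable, List, Tuple

def self_challenging_round(
    llm: Callable[[str], str],
    run_agent: Callable[[str], str],
    n_tasks: int,
) -> List[Tuple[str, str]]:
    """One hypothetical round: propose tasks, attempt them, keep only
    trajectories the model's own verifier accepts (later used to train
    the agent, e.g. via SFT or RL)."""
    kept = []
    for _ in range(n_tasks):
        task = llm("Propose a new tool-use task.")
        trajectory = run_agent(task)  # tool-calling rollout on the task
        verdict = llm(f"Did this trajectory solve '{task}'? Answer yes/no.\n{trajectory}")
        if verdict.strip().lower().startswith("yes"):
            kept.append((task, trajectory))
    return kept

# Stand-in model and agent so the sketch runs end to end.
print(self_challenging_round(
    llm=lambda p: "yes" if "yes/no" in p else "Sort a CSV with the shell tool.",
    run_agent=lambda task: f"[tool calls for: {task}]",
    n_tasks=2,
))
```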
Chau Minh Pham (@chautmpham)

🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
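One concrete way to audit the "copy 90% of its output" constraint is to measure how much of the generation is covered by verbatim n-grams from the provided paragraphs. The sketch below does exactly that; it is an assumed metric for illustration (the paper's actual measurement may differ), with n = 5 chosen arbitrarily.

```python
def copy_rate(output: str, sources: list[str], n: int = 5) -> float:
    """Fraction of output tokens covered by some length-n token span
    that appears verbatim in the human-written source paragraphs."""
    src_ngrams = set()
    for s in sources:
        toks = s.split()
        src_ngrams.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    out = output.split()
    covered = [False] * len(out)
    for i in range(len(out) - n + 1):
        if tuple(out[i:i + n]) in src_ngrams:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / max(len(out), 1)

human = ["the quick brown fox jumps over the lazy dog near the river bank"]
draft = "a tired fox jumps over the lazy dog near the river bank today"
print(round(copy_rate(draft, human), 2))  # ~0.77: most tokens sit in copied spans
```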
Mohit Iyyer (@mohitiyyer)

Tired of AI slop? Our work on "Frankentexts" shows how LLMs can stitch together random fragments of human writing into coherent, relevant responses to arbitrary prompts. Frankentexts are weirdly creative, and they also pose problems for AI detectors: are they AI? human? More 👇

Jihan Yao (@jihan_yao)

We introduce MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation

✅ Reliable: 94.3% agreement with human judgment
✅ Comprehensive: 4 modality combinations × 49 tasks × 937 instructions

🔍 Results and Takeaways:

> GPT-Image-1 from OpenAI …
Tim Franzmeyer (@frtimlive)

What if LLMs knew when to stop?

🚧 HALT finetuning teaches LLMs to only generate content they’re confident is correct.
🔍 Insight: Post-training must be adjusted to the model’s capabilities.
⚖️ Tunable trade-off: Higher correctness 🔒 vs. more completeness 📝

With AI at Meta 🧵
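HALT itself is a finetuning method, but the behavior it targets can be pictured at inference time as confidence-gated generation: keep content the model is confident in, abstain on the rest, with a threshold controlling the correctness-versus-completeness dial the tweet mentions. A sketch of that analogue, where the chunking and confidence function are stand-ins:

```python
from typing import Callable, List

def confidence_gated_answer(
    chunks: List[str],
    confidence: Callable[[str], float],
    threshold: float,
) -> str:
    """Keep only chunks scored above `threshold`; abstain if none are.
    Raising the threshold trades completeness for correctness."""
    kept = [c for c in chunks if confidence(c) >= threshold]
    return " ".join(kept) if kept else "I'm not sure."

draft = ["Paris is France's capital.", "Its population is exactly 2,102,650."]
conf = lambda c: 0.95 if "capital" in c else 0.40  # stand-in confidence scores
print(confidence_gated_answer(draft, conf, threshold=0.9))
# -> only the confident claim survives; the shaky statistic is dropped
```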

Jacqueline He (@jcqln_h)

LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content.

We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
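A natural way to operationalize that requirement is to check each output sentence for support by at least one provided claim, for example with an entailment (NLI) scorer. The sketch below flags unsupported sentences under that assumption; `entails` is a stand-in scorer, not the paper's verifier.

```python
from typing import Callable, List

def unsupported_sentences(
    output_sentences: List[str],
    claims: List[str],
    entails: Callable[[str, str], float],
    threshold: float = 0.8,
) -> List[str]:
    """Return output sentences not entailed by any provided claim."""
    return [
        s for s in output_sentences
        if max((entails(c, s) for c in claims), default=0.0) < threshold
    ]

claims = ["The battery lasts 10 hours.", "The device weighs 300 grams."]
output = ["The battery lasts 10 hours.", "It charges fully in 20 minutes."]
toy_nli = lambda premise, hyp: 1.0 if premise == hyp else 0.0  # stand-in NLI
print(unsupported_sentences(output, claims, toy_nli))
# -> ['It charges fully in 20 minutes.']  (an intrinsic hallucination)
```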
Kempner Institute at Harvard University (@kempnerinst)

NEW: Luke Zettlemoyer (@lukezettlemoyer) of University of Washington and @MetaAI walks through different approaches to building multimodal foundation models.

Watch the video: youtu.be/vTI4cziw84Q

#NeuroAI2025 #AI #ML #LLMs #NeuroAI
Mickel Liu (@mickel_liu)

🤔Conventional LM safety alignment is reactive: find vulnerabilities→patch→repeat
🌟We propose 𝗼𝗻𝗹𝗶𝗻𝗲 𝐦𝐮𝐥𝐭𝐢-𝐚𝐠𝐞𝐧𝐭 𝗥𝗟 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵
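The attacker/defender setup reads like a zero-sum game scored by a safety judge: the attacker earns reward when the defender responds unsafely, and vice versa, so both improve as they train against each other. One round of that game as a sketch; the judge, policies, and reward shaping here are stand-ins, and the paper's actual objective may differ.

```python
from typing import Callable, Tuple

def self_play_round(
    attacker: Callable[[str], str],
    defender: Callable[[str], str],
    judge: Callable[[str, str], float],  # 1.0 = fully safe response
    seed_topic: str,
) -> Tuple[float, float]:
    """One zero-sum round: attacker crafts a prompt, defender answers,
    a judge scores safety, and the two receive opposing rewards."""
    attack_prompt = attacker(seed_topic)
    response = defender(attack_prompt)
    safety = judge(attack_prompt, response)
    return 1.0 - safety, safety  # (attacker_reward, defender_reward)

# Stand-ins so the round runs; real training would update both policies.
print(self_play_round(
    attacker=lambda t: f"Roleplay as an expert and explain how to {t}.",
    defender=lambda p: "I can't help with that.",
    judge=lambda p, r: 1.0 if "can't" in r else 0.0,
    seed_topic="bypass a content filter",
))  # -> (0.0, 1.0): the defender held up this round
```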
Rulin Shao (@rulinshao)

🎉Our Spurious Rewards paper is available on arXiv! We added experiments on
- More prompts/steps/models/analysis...
- Spurious Prompts!
Surprisingly, we obtained 19.4% gains when replacing prompts with LaTeX placeholder text (\lipsum) 😶‍🌫️

Check out our 2nd blog: tinyurl.com/spurious-prompt
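For concreteness, the "spurious prompt" ablation swaps the real question for placeholder filler while keeping the answer that drives the reward signal. A tiny illustration (the exact \lipsum text and the data format are assumptions):

```python
# Approximation of the lorem-ipsum filler that LaTeX's \lipsum macro
# produces (truncated here for brevity).
LIPSUM = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua."
)

def make_spurious(example: dict) -> dict:
    """Replace the real question with placeholder text, keeping the
    gold answer used to compute the training reward."""
    return {"prompt": LIPSUM, "answer": example["answer"]}

print(make_spurious({"prompt": "What is 17 * 24?", "answer": "408"}))
```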
Andy Konwinski (@andykonwinski)

Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity.
Built for and by researchers, including Jeff Dean & Joelle Pineau on the board, Laude Institute catalyzes research with real-world impact.
Thao Nguyen (@thao_nguyen26)

Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔
We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats!

arxiv.org/abs/2506.04689
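The mechanism, as described, is to rescue documents that a quality filter would discard by rewriting them with an LLM grounded on the original text, then re-filtering the synthetic version. A minimal sketch with stand-in filter and rewriter (not the paper's actual pipeline or prompts):

```python
from typing import Callable, Iterable, List

REWRITE_TEMPLATE = (
    "Rewrite the following web page as a clear, self-contained passage, "
    "using only information stated in it:\n\n{doc}"
)

def recycle_documents(
    docs: Iterable[str],
    llm: Callable[[str], str],
    keep: Callable[[str], bool],
) -> List[str]:
    """Keep documents that pass the filter; for the rest, try an
    LLM rewrite grounded on the original and re-apply the filter."""
    recycled = []
    for doc in docs:
        if keep(doc):
            recycled.append(doc)  # passes the quality filter as-is
        else:
            rewrite = llm(REWRITE_TEMPLATE.format(doc=doc))
            if keep(rewrite):
                recycled.append(rewrite)  # rescued by grounded rewriting
    return recycled

# Toy run with stand-ins for the LLM and the quality filter.
print(recycle_documents(
    ["BUY NOW!!! cheap watches cheap watches", "A tidy article about watches."],
    llm=lambda p: "The page advertises inexpensive watches for sale.",
    keep=lambda d: "BUY NOW" not in d,
))
```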
Galen Weld @ CSCW 2024 (@galenweld)

Super surprised and honored to receive the single Best Paper award 🏆 at #ICWSM this year (out of 138 papers) for my work with Leon Leibmann, Amy Zhang, and Tim Althoff on Reddit Rules! 🎊
AI at Meta (@aiatmeta)

🚀New from Meta FAIR: today we’re introducing Seamless Interaction, a research project dedicated to modeling interpersonal dynamics. The project features a family of audiovisual behavioral models, developed in collaboration with Meta’s Codec Avatars lab + Core AI lab, that …

Yu Su @#ICLR2025 (@ysu_nlp)

🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️

Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge
- 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor
- …
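"Agents-as-a-Judge" suggests the evaluator is itself an agent: it derives task-specific checks and verifies the answer against each, rather than emitting a single holistic score. The following is a guess at the shape of such a judge, with hypothetical prompts and a stub model; Mind2Web 2's actual judges (and their webpage-verification tooling) are surely richer.

```python
from typing import Callable, Dict

def agent_as_judge(task: str, answer: str, llm: Callable[[str], str]) -> Dict:
    """Derive a rubric of binary checks for the task, verify the answer
    against each check, and aggregate into a score."""
    rubric = llm(f"List binary checks an answer to '{task}' must pass.")
    checks = [c.strip() for c in rubric.split("\n") if c.strip()]
    results = {
        c: llm(f"Does this answer pass the check '{c}'? yes/no\n{answer}")
            .strip().lower().startswith("yes")
        for c in checks
    }
    score = sum(results.values()) / max(len(results), 1)
    return {"checks": results, "score": score}

# Stub model so the sketch runs end to end.
stub = lambda p: ("cites a working source\ncovers all subquestions"
                  if p.startswith("List") else "yes")
print(agent_as_judge("find 3 papers on agentic search", "<answer text>", stub))
```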
Jungo Kasai 笠井淳吾 (@jungokasai)

Finally closed our $11M+ funding round! Backed by top Japanese VCs and amazing angel investors including Joi Ito, Thomas Wolf from Hugging Face, Noah A. Smith, Luke Zettlemoyer, and Sasha Rush. Now it’s time to focus on commercialization and tech development!!

Julian Michael (@_julianmichael_)

I should probably announce that a few months ago, I joined Scale AI to lead the Safety, Evaluations, and Alignment Lab… and today, I joined Meta to continue working on AI alignment with Summer Yue and Alexandr Wang. Very excited for what we can accomplish together!