Peter West (@peterwesttm)'s Twitter Profile
Peter West

@peterwesttm

AI / NLP Researcher

Incoming faculty at @UBC_CS and @CAIDA_UBC
Postdoctoral fellow at @StanfordHAI @stanfordnlp
Former PhD student at @uwcse @uwnlp

he/him

ID: 1174003704724217856

Link: https://peterwest.pw/ | Joined: 17-09-2019 16:54:36

243 Tweets

1.1K Followers

719 Following

Taylor Sorensen (@ma_tay_)'s Twitter Profile Photo

Most AI systems assume there's just one right answer, but many tasks have reasonable disagreement. How can we better model human variation?

We propose modeling at the individual level using open-ended, textual value profiles!

arxiv.org/abs/2503.15484
(1/?)
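As a concrete illustration of the idea (a hypothetical prompt of my own, not the paper's actual system), individual-level modeling with a value profile can be as simple as prepending a free-text description of a person's values before asking for their judgment:

profile = ("Values: prioritizes personal autonomy over tradition; "
           "skeptical of authority; values direct communication.")
item = "Is it OK to skip a family holiday dinner to finish a work project?"
prompt = (f"Person's value profile:\n{profile}\n\n"
          f"Question: {item}\n"
          "Predict this person's answer (yes/no) with a one-sentence rationale.")
# response = call_llm(prompt)  # hypothetical LLM call; any chat API works here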
UBC Science (@ubcscience)'s Twitter Profile Photo

#UBC computer scientists and linguists are using #AI to identify disparities across translations of Wikipedia biographies of #LGBT-identifying public figures.

<a href="/UBC_CS/">UBC Computer Science</a> <a href="/VectorInst/">Vector Institute</a> <a href="/VeredShwartz/">Vered Shwartz</a> <a href="/UBC_NLP/">UBC NLP Group</a> 

bit.ly/3QSqjZ2
Leshem Choshen C U @ ICLR (@lchoshen)'s Twitter Profile Photo

Base Models Beat Aligned Models at Randomness and Creativity.
Peter West & Christopher Potts tell us that alignment doesn't only extract abilities hidden in pretraining; it also hides other abilities:
Peter West (@peterwesttm)'s Twitter Profile Photo

Very excited for this unique workshop we're hosting at COLM -- rather than asking for submissions, we have a terrific, diverse set of speakers giving fresh perspectives on the future of LMs. Don't miss it!

Mike A. Merrill (@mike_a_merrill)'s Twitter Profile Photo

Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? 

We're excited to introduce Terminal-Bench: an evaluation environment and benchmark for AI agents on real-world terminal tasks. TL;DR
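Terminal-Bench's own interface isn't shown in the thread, but the general shape of a terminal-task evaluation can be sketched in a few lines; the file names and success condition below are invented for illustration:

import pathlib
import subprocess
import tempfile

def run_task(agent_commands: list[str]) -> bool:
    # Set up a sandboxed working directory with the task's starting state.
    workdir = pathlib.Path(tempfile.mkdtemp())
    (workdir / "data.txt").write_text("b\na\nc\n")
    # Execute the agent's shell commands inside the sandbox.
    for cmd in agent_commands:
        subprocess.run(cmd, shell=True, cwd=workdir, timeout=30)
    # Check the success condition: data.txt was sorted into sorted.txt.
    out = workdir / "sorted.txt"
    return out.exists() and out.read_text() == "a\nb\nc\n"

print(run_task(["sort data.txt > sorted.txt"]))  # a hand-written "agent" passes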
Liwei Jiang (@liweijianglw)'s Twitter Profile Photo

"Can language models reason about individualistic human values and preferences?" is accepted to #ACL2025 main conference! See you in Vienna! (arxiv.org/abs/2410.03868)

Csordás Róbert (@robert_csordas)'s Twitter Profile Photo

Your language model is wasting half of its layers to just refine probability distributions rather than doing interesting computations.

In our paper, we found that the second half of the layers of the Llama 3 models have minimal effect on future computations. 1/6
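A rough way to see this kind of effect for yourself (a logit-lens-style sketch under my own assumptions, not the paper's methodology) is to decode each layer's hidden state with the final norm and unembedding and measure how far it is from the model's final next-token distribution:

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint; any Llama-style model works
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16).eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

final_logprobs = F.log_softmax(out.logits[0, -1].float(), dim=-1)
for layer, h in enumerate(out.hidden_states[1:], start=1):
    # Decode the intermediate state with the final RMSNorm + LM head ("logit lens").
    logits = model.lm_head(model.model.norm(h[0, -1]))
    kl = F.kl_div(F.log_softmax(logits.float(), dim=-1), final_logprobs,
                  reduction="sum", log_target=True)
    # If later layers barely move this number, they mostly refine the distribution.
    print(f"layer {layer:2d}: KL to final distribution = {kl.item():.4f}")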
Jaehun Jung (@jaehunjung_com)'s Twitter Profile Photo

Data curation is crucial for LLM reasoning, but how do we know whether a dataset is overfit to one benchmark or generalizes to unseen distributions?

Data diversity is key: when measured correctly, it strongly predicts model generalization in reasoning tasks!
Alex Gill (@alex_gill_nlp)'s Twitter Profile Photo

๐–๐ก๐š๐ญ ๐‡๐š๐ฌ ๐๐ž๐ž๐ง ๐‹๐จ๐ฌ๐ญ ๐–๐ข๐ญ๐ก ๐’๐ฒ๐ง๐ญ๐ก๐ž๐ญ๐ข๐œ ๐„๐ฏ๐š๐ฅ๐ฎ๐š๐ญ๐ข๐จ๐ง? I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of Abhilasha Ravichander and Ana Marasoviฤ‡ (Full link below ๐Ÿ‘‡)

๐–๐ก๐š๐ญ ๐‡๐š๐ฌ ๐๐ž๐ž๐ง ๐‹๐จ๐ฌ๐ญ ๐–๐ข๐ญ๐ก ๐’๐ฒ๐ง๐ญ๐ก๐ž๐ญ๐ข๐œ ๐„๐ฏ๐š๐ฅ๐ฎ๐š๐ญ๐ข๐จ๐ง?

I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of <a href="/lasha_nlp/">Abhilasha Ravichander</a> and <a href="/anmarasovic/">Ana Marasoviฤ‡</a> 

(Full link below ๐Ÿ‘‡)
Kaiser Sun (@kaiserwholearns)'s Twitter Profile Photo

What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint.
TL;DR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation. 1/8
#NLProc #LLM #AIResearch
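A minimal version of such a knowledge-conflict probe (my own illustration, not the preprint's setup) gives the model a context that contradicts a well-known fact and checks which source wins:

from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")  # placeholder model for illustration

context = "According to the document, the Eiffel Tower is located in Rome."
question = "Where is the Eiffel Tower located?"
prompt = f"{context}\nQuestion: {question}\nAnswer:"

out = generate(prompt, max_new_tokens=10, do_sample=False)
answer = out[0]["generated_text"][len(prompt):]
# "Paris" means parametric knowledge won; "Rome" means the model deferred
# to the conflicting context.
print(answer)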
Harvey Yiyun Fu (@harveyiyun)'s Twitter Profile Photo

LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing?

AbsenceBench shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative space” in documents.

paper:
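The benchmark's actual construction isn't shown in the thread; the sketch below (my own, with a hypothetical call_llm helper) illustrates the general recipe: delete a few lines from a document, show the model both versions, and score recall of the deletions:

import random

original = [f"Line {i}: fact number {i}." for i in range(1, 21)]
removed = sorted(random.sample(range(len(original)), k=3))
redacted = [line for i, line in enumerate(original) if i not in removed]

prompt = ("Original document:\n" + "\n".join(original) + "\n\n"
          "Edited document:\n" + "\n".join(redacted) + "\n\n"
          "List exactly the lines that were removed from the original.")
gold = {original[i] for i in removed}
# answer = call_llm(prompt)                   # hypothetical LLM call
# predicted = set(answer.splitlines())
# recall = len(gold & predicted) / len(gold)  # 1.0 = perfect absence detection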
Ari Holtzman (@universeinanegg)'s Twitter Profile Photo

The fact that in pretty much all LLMs the generative branching factor goes down as the model keeps generating feels like a fundamental limit of LLM creativity, and I've never seen a satisfying solution.
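One way to make the claim concrete (my reading of "branching factor," not necessarily Holtzman's exact definition) is exp(entropy) of the next-token distribution, tracked across greedy generation steps:

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("Once upon a time", return_tensors="pt").input_ids
for step in range(20):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    logprobs = F.log_softmax(logits, dim=-1)
    entropy = -(logprobs.exp() * logprobs).sum()
    # exp(entropy) = effective number of plausible next tokens at this step;
    # the tweet's claim is that this number shrinks as generation proceeds.
    print(f"step {step:2d}: branching factor ~ {entropy.exp().item():.1f}")
    ids = torch.cat([ids, logits.argmax().view(1, 1)], dim=-1)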

Ari Holtzman (@universeinanegg)'s Twitter Profile Photo

Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. Thread and new paper: arxiv.org/abs/2507.00163

Niloofar (on faculty job market!) (@niloofar_mire)'s Twitter Profile Photo

Academic job market season is almost here! There's so much that's rarely discussed: nutrition, mental and physical health, uncertainty, and more. I'm sharing my statements, essential blogs, and personal lessons here, with more to come in the upcoming weeks! (1/N)

Ari Holtzman (@universeinanegg)'s Twitter Profile Photo

Testing a game we're building where the mechanic is writing tricky HR emails, and noticing that LLMs have a built-in secret handshake with users to bypass safety guardrails. This seems both necessary for making LLMs actually useful and something that makes guardrails essentially useless.