Peter West (@peterwesttm)'s Twitter Profile
Peter West

@peterwesttm

AI / NLP Researcher

Incoming faculty at @UBC_CS and @CAIDA_UBC
Postdoctoral fellow at @StanfordHAI @stanfordnlp
Former PhD student at @uwcse @uwnlp

he/him

ID: 1174003704724217856

Link: https://peterwest.pw/ | Joined: 17-09-2019 16:54:36

243 Tweets

1.1K Followers

719 Following

Taylor Sorensen (@ma_tay_)'s Twitter Profile Photo

Most AI systems assume there's just one right answer, but many tasks have reasonable disagreement. How can we better model human variation?

We propose modeling at the individual level using open-ended, textual value profiles!

arxiv.org/abs/2503.15484
(1/?)
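As a concrete illustration of the idea (a hypothetical prompt of my own, not the paper's actual system), individual-level modeling with a value profile can be as simple as prepending a free-text description of a person's values before asking for their judgment:

profile = ("Values: prioritizes personal autonomy over tradition; "
           "skeptical of authority; values direct communication.")
item = "Is it OK to skip a family holiday dinner to finish a work project?"
prompt = (f"Person's value profile:\n{profile}\n\n"
          f"Question: {item}\n"
          "Predict this person's answer (yes/no) with a one-sentence rationale.")
# response = call_llm(prompt)  # hypothetical LLM call; any chat API works here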
UBC Science (@ubcscience)'s Twitter Profile Photo

#UBC computer scientists and linguists are using #AI to identify disparities across translations of Wikipedia biographies of #LGBT-identifying public figures.

<a href="/UBC_CS/">UBC Computer Science</a> <a href="/VectorInst/">Vector Institute</a> <a href="/VeredShwartz/">Vered Shwartz</a> <a href="/UBC_NLP/">UBC NLP Group</a> 

bit.ly/3QSqjZ2
Leshem Choshen C U @ ICLR (@lchoshen)'s Twitter Profile Photo

Base Models Beat Aligned Models at Randomness and Creativity.
Peter West & Christopher Potts tell us that alignment doesn't only extract abilities hidden in pretraining; it also hides other abilities:
Peter West (@peterwesttm)'s Twitter Profile Photo

Very excited for this unique workshop we're hosting at COLM -- rather than asking for submissions, we have a terrific, diverse set of speakers giving fresh perspectives on the future of LMs. Don't miss it!

Mike A. Merrill (@mike_a_merrill)'s Twitter Profile Photo

Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? 

We're excited to introduce Terminal-Bench: an evaluation environment and benchmark for AI agents on real-world terminal tasks. TL;DR
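Terminal-Bench's own interface isn't shown in the thread, but the general shape of a terminal-task evaluation can be sketched in a few lines; the file names and success condition below are invented for illustration:

import pathlib
import subprocess
import tempfile

def run_task(agent_commands: list[str]) -> bool:
    # Set up a sandboxed working directory with the task's starting state.
    workdir = pathlib.Path(tempfile.mkdtemp())
    (workdir / "data.txt").write_text("b\na\nc\n")
    # Execute the agent's shell commands inside the sandbox.
    for cmd in agent_commands:
        subprocess.run(cmd, shell=True, cwd=workdir, timeout=30)
    # Check the success condition: data.txt was sorted into sorted.txt.
    out = workdir / "sorted.txt"
    return out.exists() and out.read_text() == "a\nb\nc\n"

print(run_task(["sort data.txt > sorted.txt"]))  # a hand-written "agent" passes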
Liwei Jiang (@liweijianglw)'s Twitter Profile Photo

"Can language models reason about individualistic human values and preferences?" is accepted to #ACL2025 main conference! See you in Vienna! (arxiv.org/abs/2410.03868)

Csordás Róbert (@robert_csordas)'s Twitter Profile Photo

Your language model is wasting half of its layers to just refine probability distributions rather than doing interesting computations.

In our paper, we found that the second half of the layers of the Llama 3 models have minimal effect on future computations. 1/6
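A rough way to see this kind of effect for yourself (a logit-lens-style sketch under my own assumptions, not the paper's methodology) is to decode each layer's hidden state with the final norm and unembedding and measure how far it is from the model's final next-token distribution:

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint; any Llama-style model works
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16).eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

final_logprobs = F.log_softmax(out.logits[0, -1].float(), dim=-1)
for layer, h in enumerate(out.hidden_states[1:], start=1):
    # Decode the intermediate state with the final RMSNorm + LM head ("logit lens").
    logits = model.lm_head(model.model.norm(h[0, -1]))
    kl = F.kl_div(F.log_softmax(logits.float(), dim=-1), final_logprobs,
                  reduction="sum", log_target=True)
    # If later layers barely move this number, they mostly refine the distribution.
    print(f"layer {layer:2d}: KL to final distribution = {kl.item():.4f}")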
Jaehun Jung (@jaehunjung_com)'s Twitter Profile Photo

Data curation is crucial for LLM reasoning, but how do we know whether a dataset is overfit to one benchmark or generalizes to unseen distributions?

Data diversity is key: when measured correctly, it strongly predicts model generalization in reasoning tasks!
Alex Gill (@alex_gill_nlp)'s Twitter Profile Photo

๐–๐ก๐š๐ญ ๐‡๐š๐ฌ ๐๐ž๐ž๐ง ๐‹๐จ๐ฌ๐ญ ๐–๐ข๐ญ๐ก ๐’๐ฒ๐ง๐ญ๐ก๐ž๐ญ๐ข๐œ ๐„๐ฏ๐š๐ฅ๐ฎ๐š๐ญ๐ข๐จ๐ง? I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of Abhilasha Ravichander and Ana Marasoviฤ‡ (Full link below ๐Ÿ‘‡)

๐–๐ก๐š๐ญ ๐‡๐š๐ฌ ๐๐ž๐ž๐ง ๐‹๐จ๐ฌ๐ญ ๐–๐ข๐ญ๐ก ๐’๐ฒ๐ง๐ญ๐ก๐ž๐ญ๐ข๐œ ๐„๐ฏ๐š๐ฅ๐ฎ๐š๐ญ๐ข๐จ๐ง?

I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of <a href="/lasha_nlp/">Abhilasha Ravichander</a> and <a href="/anmarasovic/">Ana Marasoviฤ‡</a> 

(Full link below ๐Ÿ‘‡)
Kaiser Sun (@kaiserwholearns)'s Twitter Profile Photo

What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint.
TL;DR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation. 1/8
#NLProc #LLM #AIResearch
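A minimal version of such a knowledge-conflict probe (my own illustration, not the preprint's setup) gives the model a context that contradicts a well-known fact and checks which source wins:

from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")  # placeholder model for illustration

context = "According to the document, the Eiffel Tower is located in Rome."
question = "Where is the Eiffel Tower located?"
prompt = f"{context}\nQuestion: {question}\nAnswer:"

out = generate(prompt, max_new_tokens=10, do_sample=False)
answer = out[0]["generated_text"][len(prompt):]
# "Paris" means parametric knowledge won; "Rome" means the model deferred
# to the conflicting context.
print(answer)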
Harvey Yiyun Fu (@harveyiyun)'s Twitter Profile Photo

LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing?

AbsenceBench shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative space” in documents.

paper:
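The benchmark's actual construction isn't shown in the thread; the sketch below (my own, with a hypothetical call_llm helper) illustrates the general recipe: delete a few lines from a document, show the model both versions, and score recall of the deletions:

import random

original = [f"Line {i}: fact number {i}." for i in range(1, 21)]
removed = sorted(random.sample(range(len(original)), k=3))
redacted = [line for i, line in enumerate(original) if i not in removed]

prompt = ("Original document:\n" + "\n".join(original) + "\n\n"
          "Edited document:\n" + "\n".join(redacted) + "\n\n"
          "List exactly the lines that were removed from the original.")
gold = {original[i] for i in removed}
# answer = call_llm(prompt)                   # hypothetical LLM call
# predicted = set(answer.splitlines())
# recall = len(gold & predicted) / len(gold)  # 1.0 = perfect absence detection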
Ari Holtzman (@universeinanegg)'s Twitter Profile Photo

The fact that in pretty much all LLMs the generative branching factor goes down as the model keeps generating feels like a fundamental limit of LLM creativity, and I've never seen a satisfying solution.
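One way to make the claim concrete (my reading of "branching factor," not necessarily Holtzman's exact definition) is exp(entropy) of the next-token distribution, tracked across greedy generation steps:

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("Once upon a time", return_tensors="pt").input_ids
for step in range(20):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    logprobs = F.log_softmax(logits, dim=-1)
    entropy = -(logprobs.exp() * logprobs).sum()
    # exp(entropy) = effective number of plausible next tokens at this step;
    # the tweet's claim is that this number shrinks as generation proceeds.
    print(f"step {step:2d}: branching factor ~ {entropy.exp().item():.1f}")
    ids = torch.cat([ids, logits.argmax().view(1, 1)], dim=-1)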

Ari Holtzman (@universeinanegg)'s Twitter Profile Photo

Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. Thread and new paper: arxiv.org/abs/2507.00163

Niloofar (on faculty job market!) (@niloofar_mire)'s Twitter Profile Photo

Academic job market season is almost here! There's so much that's rarely discussed: nutrition, mental and physical health, uncertainty, and more. I'm sharing my statements, essential blogs, and personal lessons here, with more to come in the upcoming weeks! (1/N)

Ari Holtzman (@universeinanegg)'s Twitter Profile Photo

Testing a game we're building where the mechanic is writing tricky HR emails, and noticing that LLMs have a built-in secret handshake with users to bypass safety guardrails. This seems both necessary for making LLMs actually useful and something that makes guardrails essentially useless.