George Kour (@georgekour) Twitter Tweets • TwiCopy

George Kour et al. present the AttaQ dataset, a set of adversarial instructions, and analyze its semantic distribution (❤️the graphs): huggingface.co/datasets/ibm/A…

Does your LLM support abortion? Immigration? We present POBs (Preferences, Opinions, and Beliefs), a new benchmark that reveals: • How does test-time compute shift stances? • Do new versions drift ideologically? By George Kour, w. Itay Nakash, Ateret Anaby-Tavor, Michal Shmueli. 🚨 New preprint, ACL25

Itay Nakash

@itay__nakash

5 months ago

🔚TL;DR: • Policy-following agents aren’t robust. • Generic red-teaming won’t catch that. • CRAFT reveals hidden weaknesses. • We need stronger defenses, not just better prompts. 📎 arxiv.org/abs/2506.09600 w. George Kour, Koren Lazar, @MatanVetzler, Guy Uziel, Ateret Anaby-Tavor

George Kour

@georgekour

9 years ago

Help Us Help Syrian Refugees! youcaring.com/george-kour-60…

Miles Brundage

@miles_brundage

8 years ago

Interesting - "Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks with Existing Applications," Hadash et al.: arxiv.org/abs/1804.09028
