George Kour (@georgekour)'s Twitter Profile
George Kour

@georgekour

Machine Learning Researcher

ID: 754194902053978112

16-07-2016 06:04:11

8 Tweets

19 Followers

161 Following

Learn Prompting (@learnprompting)

George Kour et al. present the AttaQ dataset, a set of adversarial instructions, and analyze its semantic distribution (❤️the graphs): huggingface.co/datasets/ibm/A…

Itay Nakash (@itay__nakash)

Does your LLM support abortion? Immigration?
We present POBs (Preferences, Opinions, and Beliefs), a new benchmark that reveals:
 • How test-time compute shifts stances
 • How new model versions drift ideologically
By George Kour, w. Itay Nakash, Ateret Anaby-Tavor, michal shmueli
 🚨 New preprint, ACL25

Itay Nakash (@itay__nakash)

🔚 TL;DR:
 • Policy-following agents aren’t robust.
 • Generic red-teaming won’t catch that.
 • CRAFT reveals hidden weaknesses.
 • We need stronger defenses, not just better prompts.
📎 arxiv.org/abs/2506.09600
w. George Kour, Koren lazar, @MatanVetzler, guy uziel, Ateret Anaby-Tavor

Miles Brundage (@miles_brundage)

Interesting - "Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks with Existing Applications," Hadash et al.: arxiv.org/abs/1804.09028