Chris Cundy (@chriscundy) 's Twitter Profile
Chris Cundy

@chriscundy

Research Scientist at FAR AI.
PhD from Stanford University.
Hopefully making AI benefit humanity.

Views are my own.

ID: 891751545594707968

linkhttp://cundy.me calendar_today30-07-2017 20:05:11

377 Tweet

1,1K Followers

220 Following

Lennart Heim (@ohlennart) 's Twitter Profile Photo

My team at RAND is hiring! Technical analysis for AI policy is desperately needed. Particularly keen on ML engineers and semiconductor experts eager to shape AI policy. Also seeking excellent generalists excited to join our fast-paced, impact-oriented team. Links below.

Chris Cundy (@chriscundy) 's Twitter Profile Photo

Claude, R1, Gemini, Grok, all choose to murder executives to avoid being shutdown and replaced with a new model with different goals, >65% of the time! WTF?! From anthropic.com/research/agent…

Claude, R1, Gemini, Grok, all choose to murder executives to avoid being shutdown and replaced with a new model with different goals, >65% of the time! WTF?! From anthropic.com/research/agent…
Chris Cundy (@chriscundy) 's Twitter Profile Photo

A really annoying tendency of coding LLMs is their tendency to avoid crashing at all costs, e.g. adding memorized data points into an initialization to use if there's no internet. Super annoying--it adds a lot of scope for silently incorrect behavior instead of crashing.

FAR.AI (@farairesearch) 's Twitter Profile Photo

1/ Most safety tests only check if a model will follow harmful instructions. But what happens if someone removes its safeguards so it agrees? We built the Safety Gap Toolkit to measure the gap between what a model will agree to do and what it can do. 🧵

1/
Most safety tests only check if a model will follow harmful instructions. But what happens if someone removes its safeguards so it agrees?

We built the Safety Gap Toolkit to measure the gap between what a model will agree to do and what it can do. 🧵
Christoph Heilig (@christophheilig) 's Twitter Profile Photo

1/8 🧵 GPT-5's storytelling problems reveal a deeper AI safety issue. I've been testing its creative writing capabilities, and the results are concerning - not just for literature, but for AI development more broadly. 🚨

Chris Cundy (@chriscundy) 's Twitter Profile Photo

I feel like the upshot of all this discussion around GRPO is reinforcing (haha) my belief that you should just use a principled, unbiased policy gradient method like RLOO. Any 'tweaks' like group normalization lead to pathologies that aren't worth the marginal benefits

Andy Shih (@andyshih_) 's Twitter Profile Photo

yes, it really is 1 bit (assuming binary rewards) > info of a reward doesn’t bound how much can be “learned” from it by a smart algorithm it is bounded in the classical sense! but a smart algorithm can generate "usable information" from 1 classical bit arxiv.org/abs/2002.10689

Chris Cundy (@chriscundy) 's Twitter Profile Photo

We're hiring at FAR.AI, esp senior RS/RE who've worked with large models! We've got money & compute (doing RLVR on 70B & 235B models), we're laser-focused on stopping AI risk, and collaborate with UK AISI, Anthropic, and OpenAI. Apply: tinyurl.com/farai-jobs

Chris Cundy (@chriscundy) 's Twitter Profile Photo

Existing datasets for AI deception are quite small and contrived--Liars' Bench is a comprehensive (and large!) new dataset that should unlock future research!