Chris Cundy (@chriscundy) Twitter Tweets • TwiCopy

Chris Cundy

@chriscundy

+ Follow

Research Scientist at FAR AI.
PhD from Stanford University.
Hopefully making AI benefit humanity.

Views are my own.

ID: 891751545594707968

linkhttp://cundy.me calendar_today30-07-2017 20:05:11

377 Tweet

1,1K Followers

220 Following

Lennart Heim

@ohlennart

6 months ago

My team at RAND is hiring! Technical analysis for AI policy is desperately needed. Particularly keen on ML engineers and semiconductor experts eager to shape AI policy. Also seeking excellent generalists excited to join our fast-paced, impact-oriented team. Links below.

thumb_up_off_alt238

chat_bubble_outline6

repeat35

shareShare

Chris Cundy

@chriscundy

5 months ago

Claude, R1, Gemini, Grok, all choose to murder executives to avoid being shutdown and replaced with a new model with different goals, >65% of the time! WTF?! From anthropic.com/research/agent…

thumb_up_off_alt4

chat_bubble_outline1

repeat0

shareShare

Chris Cundy

@chriscundy

4 months ago

A really annoying tendency of coding LLMs is their tendency to avoid crashing at all costs, e.g. adding memorized data points into an initialization to use if there's no internet. Super annoying--it adds a lot of scope for silently incorrect behavior instead of crashing.

thumb_up_off_alt6

chat_bubble_outline2

repeat0

shareShare

shreya rajpal

@shreyar

4 months ago

It's not just a new model--it's an entirely new opportunity for karma farming

thumb_up_off_alt13

chat_bubble_outline3

repeat1

shareShare

FAR.AI

@farairesearch

4 months ago

1/ Most safety tests only check if a model will follow harmful instructions. But what happens if someone removes its safeguards so it agrees? We built the Safety Gap Toolkit to measure the gap between what a model will agree to do and what it can do. 🧵

thumb_up_off_alt13

chat_bubble_outline1

repeat4

shareShare

Christoph Heilig

@christophheilig

3 months ago

1/8 🧵 GPT-5's storytelling problems reveal a deeper AI safety issue. I've been testing its creative writing capabilities, and the results are concerning - not just for literature, but for AI development more broadly. 🚨

thumb_up_off_alt383

chat_bubble_outline26

repeat51

shareShare

Chris Cundy

@chriscundy

2 months ago

I feel like the upshot of all this discussion around GRPO is reinforcing (haha) my belief that you should just use a principled, unbiased policy gradient method like RLOO. Any 'tweaks' like group normalization lead to pathologies that aren't worth the marginal benefits

thumb_up_off_alt6

chat_bubble_outline2

repeat0

shareShare

Andy Shih

@andyshih_

a month ago

yes, it really is 1 bit (assuming binary rewards) > info of a reward doesn’t bound how much can be “learned” from it by a smart algorithm it is bounded in the classical sense! but a smart algorithm can generate "usable information" from 1 classical bit arxiv.org/abs/2002.10689

thumb_up_off_alt42

chat_bubble_outline1

repeat2

shareShare

Chris Cundy

@chriscundy

a month ago

We're hiring at FAR.AI, esp senior RS/RE who've worked with large models! We've got money & compute (doing RLVR on 70B & 235B models), we're laser-focused on stopping AI risk, and collaborate with UK AISI, Anthropic, and OpenAI. Apply: tinyurl.com/farai-jobs

thumb_up_off_alt4

chat_bubble_outline0

repeat0

shareShare

Chris Cundy

@chriscundy

7 days ago

Existing datasets for AI deception are quite small and contrived--Liars' Bench is a comprehensive (and large!) new dataset that should unlock future research!

thumb_up_off_alt2

chat_bubble_outline1

repeat1

shareShare