Jan Hendrik Kirchner (@janhkirchner) Twitter Tweets • TwiCopy

Jan Hendrik Kirchner

@janhkirchner


formerly comp neuroscience @ mpi brain research frankfurt ➡️ small verifier

ID: 972038953586057216

Link: http://universalprior.substack.com

Joined: 09-03-2018 09:18:40

447 Tweets

1,1K Followers

527 Following

T. Greer

@scholars_stage

a year ago

It will be hard to keep the commanding heights in (western) liberal hands while simultaneously handing the technology over to the governments of the developing world. These goals are probably not compatible—it will take a lot of work to make them so.

63 likes, 2 replies, 2 reposts

Jan Hendrik Kirchner

@janhkirchner

a year ago

would be great if we had a way to robustly make AI do (only) what we want it to do

1 like, 0 replies, 0 reposts

Anthropic

@anthropicai

a year ago

Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.

10,10K likes, 484 replies, 1,1K reposts

Alex Mallen

@alextmallen

a year ago

New paper! How should we make trade-offs between the quantity and quality of labels used for eliciting knowledge from capable AI systems?

45 likes, 1 reply, 8 reposts

Jan Hendrik Kirchner

@janhkirchner

a year ago

Someone recommended The Goal from Goldratt to me and I have to say, there's nothing in there that Factorio hasn't taught me already

2 likes, 1 reply, 0 reposts

Jan Hendrik Kirchner

@janhkirchner

10 months ago

i've played that game in my dreams before

2 likes, 0 replies, 0 reposts

Samuel Albanie 🇬🇧

@samuelalbanie

8 months ago

Video summary for "Prover-Verifier Games improve legibility of LLM outputs" youtu.be/EMDa4urzz-M 1/2

12 likes, 1 reply, 5 reposts

Anthropic

@anthropicai

6 months ago

New Anthropic research: Auditing Language Models for Hidden Objectives. We deliberately trained a model with a hidden misaligned objective and put researchers to the test: Could they figure out the objective without being told?

1,1K likes, 112 replies, 258 reposts

Samuel Marks

@saprmarks

6 months ago

New paper with Johannes Treutlein, Evan Hubinger, and many other coauthors! We train a model with a hidden misaligned objective and use it to run an auditing game: Can other teams of researchers uncover the model’s objective? x.com/AnthropicAI/st…

124 likes, 6 replies, 15 reposts

Jan Hendrik Kirchner

@janhkirchner

5 months ago

i enjoy being a hypocrite as much as the next person, but if there’s one thing i can’t stand it’s a hypocrite

3 likes, 0 replies, 0 reposts

IICCSSS

@iiccsss2024

4 months ago

Registration for IICCSSS 2025 in Darmstadt is now open! 🥳 Sign up for a week of exciting talks, hands-on projects and inspiring discussions! iiccsss.org/registration/ As always, IICCSSS is free, and open to all students who are excited about computational cognitive science 💡🧠

2 likes, 0 replies, 1 repost

Anthropic

@anthropicai

4 months ago

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.