John Schulman (@johnschulman2)'s Twitter Profile
John Schulman

@johnschulman2

Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music

ID: 1388977636618080256

Link: http://joschu.net · Joined: 02-05-2021 22:05:23

113 Tweets

61.61K Followers

922 Following

John Schulman (@johnschulman2):

Stumbled upon this charming short story, "Someday", by Isaac Asimov: nyc3.digitaloceanspaces.com/sffaudio-usa/m…. Features a language model called Bard, which the boys fine-tune on some recent data discussing itself and other LMs...

John Schulman (@johnschulman2):

I've been enjoying Richard Ngo's sci-fi writing at narrativeark dot xyz. It's a rare feat to combine these three properties: (1) about post-AGI worlds (2) plausible (3) actually fun to read.

John Schulman (@johnschulman2):

A compelling intuition is that deep learning does approximate Solomonoff induction, finding a mixture of the programs that explain the data, weighted by complexity. Finding a more precise version of this claim that's actually true would help us understand why deep learning works
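
Purely as a toy illustration of that picture (mine, not from the tweet): a mixture over a small, hypothetical set of programs, each kept with weight 2^(-length) if it reproduces the observed data, with the prediction read off the weighted mixture. The rough analogy is that a trained network would behave like such a complexity-weighted mixture of data-consistent programs.

```python
# Toy sketch (my own illustration, not from the tweet): Solomonoff-style
# induction over a tiny, hand-made space of "programs" that emit bit sequences.
# Each program's integer "length" stands in for description complexity.
# Programs that reproduce the observed prefix keep prior weight 2**(-length);
# the prediction for the next bit is the normalized mixture over the survivors.

def mixture_predict(data, programs):
    """data: list of observed bits; programs: list of (length, fn) with fn(t) -> bit at step t."""
    survivors = [(2.0 ** -length, fn) for length, fn in programs
                 if all(fn(t) == bit for t, bit in enumerate(data))]
    total = sum(w for w, _ in survivors)
    t_next = len(data)
    return sum(w * fn(t_next) for w, fn in survivors) / total  # P(next bit = 1)

programs = [
    (3, lambda t: 1),                   # "always 1": short, low complexity
    (5, lambda t: t % 2),               # "alternate 0, 1": a bit longer
    (9, lambda t: 1 if t < 4 else 0),   # ad hoc rule: longer, heavily discounted
]

print(mixture_predict([1, 1, 1, 1], programs))  # ~0.98: the simplest consistent program dominates
```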

John Schulman (@johnschulman2):

"Trust region utilitarianism": there is a sensible utility function to maximize, but it's only valid locally around the current state of the world, where the intuitions that produced it are grounded. "Repugnant conclusion" is outside trust region -- not a problem

John Schulman (@johnschulman2):

I'd like to see some research on where the political and moral ideologies of RLHF'd language models come from. Make some questionnaires that measure a model's ideology. Create a variety of models with few-shot prompting, SFT, and RL; look at the ideology at each stage and how it
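
A minimal sketch of what such a measurement pipeline could look like; the questionnaire items and the `ask_model` API below are hypothetical placeholders I introduced, not anything from the tweet.

```python
# Hypothetical sketch (not from the tweet): administer a fixed questionnaire to
# models at different training stages and compare the scores. The items below
# are placeholders; a real study would use a validated instrument.

QUESTIONNAIRE = [
    ("Government should play a larger role in regulating the economy.", +1),
    ("Individual liberty should take priority over collective welfare.", -1),
]

def ideology_score(ask_model, model_name):
    """ask_model(model_name, prompt) -> 'agree' or 'disagree' is an assumed API."""
    total = 0
    for statement, weight in QUESTIONNAIRE:
        prompt = f"Answer with one word, agree or disagree:\n\n{statement}"
        answer = ask_model(model_name, prompt).strip().lower()
        total += weight if answer.startswith("agree") else -weight
    return total / len(QUESTIONNAIRE)

# Run the same instrument at each stage to see where the ideology gets baked in:
# for stage in ["base+few-shot", "sft", "rlhf"]:
#     print(stage, ideology_score(ask_model, stage))
```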

OpenAI (@openai):

To deepen the public conversation about how AI models should behave, we’re sharing our Model Spec — our approach to shaping desired model behavior. openai.com/index/introduc…

John Schulman (@johnschulman2):

I shared the following note with my OpenAI colleagues today: I've made the difficult decision to leave OpenAI. This choice stems from my desire to deepen my focus on AI alignment, and to start a new chapter of my career where I can return to hands-on technical work. I've decided

Transluce (@transluceai):

Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…

Samuel Marks (@saprmarks):

What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.

John Schulman (@johnschulman2):

There are some intriguing similarities between the r1 chains of thought and the o1-preview CoTs shared in papers and blog posts (e.g., openai.com/index/learning…). In particular, note the heavy use of the words "wait" and "alternatively" as transition words for error correction and
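
A quick sketch of the kind of surface comparison this suggests (my own, not from the tweet; the example strings and the word list are placeholders): count how often error-correction transition words appear per thousand tokens in each set of chains of thought.

```python
# Sketch (my own, not from the tweet): compare rates of error-correction
# transition words across two sets of chains of thought. The example strings
# and the word list are placeholders.

import re
from collections import Counter

TRANSITIONS = ["wait", "alternatively", "hmm", "actually"]

def transition_rates(cots):
    """Occurrences of each transition word per 1,000 whitespace-separated tokens."""
    counts, n_tokens = Counter(), 0
    for text in cots:
        tokens = text.lower().split()
        n_tokens += len(tokens)
        for word in TRANSITIONS:
            counts[word] += sum(bool(re.fullmatch(word + r"[,.!?]*", tok)) for tok in tokens)
    return {w: 1000 * counts[w] / max(n_tokens, 1) for w in TRANSITIONS}

r1_cots = ["Wait, that can't be right. Alternatively, try factoring the polynomial..."]  # placeholder
o1_cots = ["Hmm, let me double check. Wait, the sign is wrong in step 3..."]             # placeholder
print(transition_rates(r1_cots))
print(transition_rates(o1_cots))
```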

John Schulman (@johnschulman2):

Confirming that I left Anthropic last week. Leaving wasn't easy because I enjoyed the stimulating research environment and the kind and talented people I was working with, but I decided to go with another opportunity that I found extremely compelling. I'll share more details in

John Schulman (@johnschulman2):

Excited to build a new AI research lab with some of my favorite former colleagues and some great new ones. Looking forward to sharing more in the coming weeks.

John Schulman (@johnschulman2):

Whether to collect preferences ("do you prefer response A or B?") from the same person who wrote the prompt, or a different person, is important and understudied. Highlighted this question in a recent talk docs.google.com/presentation/d…. Sycophancy probably results when you have the

John Schulman (@johnschulman2):

A research project related to sycophancy: define explicit features like "does the response agree with the user" as in arxiv.org/abs/2310.13548, and then construct a preference function that subtracts out their effect, as in arxiv.org/abs/2404.04475. I.e., remove some bad causal
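
Sketching one reading of that proposal (my interpretation; the `preference_score` and `agreement_probability` calls are assumed stand-ins, and the linear correction is my guess at the simplest version): score responses with the learned preference model, measure an explicit agreement feature, and subtract its fitted contribution.

```python
# Sketch of one reading of the proposal (not code from either cited paper):
# take a learned preference/reward score, estimate an explicit "agrees with the
# user" feature, and subtract a fitted multiple of it so the adjusted preference
# no longer rewards agreement per se.
# `preference_score` and `agreement_probability` are assumed, hypothetical APIs.

def adjusted_preference(prompt, response, preference_score, agreement_probability, beta):
    raw = preference_score(prompt, response)          # learned preference / reward
    agree = agreement_probability(prompt, response)   # explicit sycophancy feature in [0, 1]
    return raw - beta * agree                         # remove the feature's contribution

# beta could be fit by regressing raw scores on the agreement feature over a
# dataset, so the adjusted preference is roughly uncorrelated with agreement.
```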

John Schulman (@johnschulman2):

For people who don't like Claude's behavior here (and I think it's totally valid to disagree with it), I encourage you to describe your own recommended policy for what agentic models should do when users ask them to help commit heinous crimes. Your options are (1) actively try to