John Schulman (@johnschulman2)'s Twitter Profile
John Schulman

@johnschulman2

Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music

ID: 1388977636618080256

Link: http://joschu.net · Joined: 02-05-2021 22:05:23

113 Tweets

61.61K Followers

922 Following

John Schulman (@johnschulman2):

Stumbled upon this charming short story, "Someday", by Isaac Asimov: nyc3.digitaloceanspaces.com/sffaudio-usa/m…. Features a language model called Bard, which the boys fine-tune on some recent data discussing itself and other LMs...

John Schulman (@johnschulman2):

I've been enjoying Richard Ngo's sci-fi writing at narrativeark dot xyz. It's a rare feat to combine these three properties: (1) about post-AGI worlds (2) plausible (3) actually fun to read.

John Schulman (@johnschulman2):

A compelling intuition is that deep learning does approximate Solomonoff induction, finding a mixture of the programs that explain the data, weighted by complexity. Finding a more precise version of this claim that's actually true would help us understand why deep learning works
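
Purely as a toy illustration of that picture (mine, not from the tweet): a mixture over a small, hypothetical set of programs, each kept with weight 2^(-length) if it reproduces the observed data, with the prediction read off the weighted mixture. The rough analogy is that a trained network would behave like such a complexity-weighted mixture of data-consistent programs.

```python
# Toy sketch (my own illustration, not from the tweet): Solomonoff-style
# induction over a tiny, hand-made space of "programs" that emit bit sequences.
# Each program's integer "length" stands in for description complexity.
# Programs that reproduce the observed prefix keep prior weight 2**(-length);
# the prediction for the next bit is the normalized mixture over the survivors.

def mixture_predict(data, programs):
    """data: list of observed bits; programs: list of (length, fn) with fn(t) -> bit at step t."""
    survivors = [(2.0 ** -length, fn) for length, fn in programs
                 if all(fn(t) == bit for t, bit in enumerate(data))]
    total = sum(w for w, _ in survivors)
    t_next = len(data)
    return sum(w * fn(t_next) for w, fn in survivors) / total  # P(next bit = 1)

programs = [
    (3, lambda t: 1),                   # "always 1": short, low complexity
    (5, lambda t: t % 2),               # "alternate 0, 1": a bit longer
    (9, lambda t: 1 if t < 4 else 0),   # ad hoc rule: longer, heavily discounted
]

print(mixture_predict([1, 1, 1, 1], programs))  # ~0.98: the simplest consistent program dominates
```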

John Schulman (@johnschulman2):

"Trust region utilitarianism": there is a sensible utility function to maximize, but it's only valid locally around the current state of the world, where the intuitions that produced it are grounded. "Repugnant conclusion" is outside trust region -- not a problem

John Schulman (@johnschulman2):

I'd like to see some research on where the political and moral ideologies of RLHF'd language models come from. Make some questionnaires that measure a model's ideology. Create a variety of models with few-shot prompting, SFT, and RL; look at the ideology at each stage and how it
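
A minimal sketch of what such a measurement pipeline could look like; the questionnaire items and the `ask_model` API below are hypothetical placeholders I introduced, not anything from the tweet.

```python
# Hypothetical sketch (not from the tweet): administer a fixed questionnaire to
# models at different training stages and compare the scores. The items below
# are placeholders; a real study would use a validated instrument.

QUESTIONNAIRE = [
    ("Government should play a larger role in regulating the economy.", +1),
    ("Individual liberty should take priority over collective welfare.", -1),
]

def ideology_score(ask_model, model_name):
    """ask_model(model_name, prompt) -> 'agree' or 'disagree' is an assumed API."""
    total = 0
    for statement, weight in QUESTIONNAIRE:
        prompt = f"Answer with one word, agree or disagree:\n\n{statement}"
        answer = ask_model(model_name, prompt).strip().lower()
        total += weight if answer.startswith("agree") else -weight
    return total / len(QUESTIONNAIRE)

# Run the same instrument at each stage to see where the ideology gets baked in:
# for stage in ["base+few-shot", "sft", "rlhf"]:
#     print(stage, ideology_score(ask_model, stage))
```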

OpenAI (@openai):

To deepen the public conversation about how AI models should behave, we’re sharing our Model Spec — our approach to shaping desired model behavior. openai.com/index/introduc…

John Schulman (@johnschulman2):

I shared the following note with my OpenAI colleagues today: I've made the difficult decision to leave OpenAI. This choice stems from my desire to deepen my focus on AI alignment, and to start a new chapter of my career where I can return to hands-on technical work. I've decided

Transluce (@transluceai):

Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…

Samuel Marks (@saprmarks):

What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.

John Schulman (@johnschulman2):

There are some intriguing similarities between the r1 chains of thought and the o1-preview CoTs shared in papers and blog posts (e.g., openai.com/index/learning…). In particular, note the heavy use of the words "wait" and "alternatively" as transition words for error correction and
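
A quick sketch of the kind of surface comparison this suggests (my own, not from the tweet; the example strings and the word list are placeholders): count how often error-correction transition words appear per thousand tokens in each set of chains of thought.

```python
# Sketch (my own, not from the tweet): compare rates of error-correction
# transition words across two sets of chains of thought. The example strings
# and the word list are placeholders.

import re
from collections import Counter

TRANSITIONS = ["wait", "alternatively", "hmm", "actually"]

def transition_rates(cots):
    """Occurrences of each transition word per 1,000 whitespace-separated tokens."""
    counts, n_tokens = Counter(), 0
    for text in cots:
        tokens = text.lower().split()
        n_tokens += len(tokens)
        for word in TRANSITIONS:
            counts[word] += sum(bool(re.fullmatch(word + r"[,.!?]*", tok)) for tok in tokens)
    return {w: 1000 * counts[w] / max(n_tokens, 1) for w in TRANSITIONS}

r1_cots = ["Wait, that can't be right. Alternatively, try factoring the polynomial..."]  # placeholder
o1_cots = ["Hmm, let me double check. Wait, the sign is wrong in step 3..."]             # placeholder
print(transition_rates(r1_cots))
print(transition_rates(o1_cots))
```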

John Schulman (@johnschulman2):

Confirming that I left Anthropic last week. Leaving wasn't easy because I enjoyed the stimulating research environment and the kind and talented people I was working with, but I decided to go with another opportunity that I found extremely compelling. I'll share more details in

John Schulman (@johnschulman2):

Excited to build a new AI research lab with some of my favorite former colleagues and some great new ones. Looking forward to sharing more in the coming weeks.

John Schulman (@johnschulman2):

Whether to collect preferences ("do you prefer response A or B?") from the same person who wrote the prompt, or a different person, is important and understudied. Highlighted this question in a recent talk docs.google.com/presentation/d…. Sycophancy probably results when you have the

John Schulman (@johnschulman2):

A research project related to sycophancy: define explicit features like "does the response agree with the user" as in arxiv.org/abs/2310.13548, and then construct a preference function that subtracts out their effect, as in arxiv.org/abs/2404.04475. I.e., remove some bad causal
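
Sketching one reading of that proposal (my interpretation; the `preference_score` and `agreement_probability` calls are assumed stand-ins, and the linear correction is my guess at the simplest version): score responses with the learned preference model, measure an explicit agreement feature, and subtract its fitted contribution.

```python
# Sketch of one reading of the proposal (not code from either cited paper):
# take a learned preference/reward score, estimate an explicit "agrees with the
# user" feature, and subtract a fitted multiple of it so the adjusted preference
# no longer rewards agreement per se.
# `preference_score` and `agreement_probability` are assumed, hypothetical APIs.

def adjusted_preference(prompt, response, preference_score, agreement_probability, beta):
    raw = preference_score(prompt, response)          # learned preference / reward
    agree = agreement_probability(prompt, response)   # explicit sycophancy feature in [0, 1]
    return raw - beta * agree                         # remove the feature's contribution

# beta could be fit by regressing raw scores on the agreement feature over a
# dataset, so the adjusted preference is roughly uncorrelated with agreement.
```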

John Schulman (@johnschulman2):

For people who don't like Claude's behavior here (and I think it's totally valid to disagree with it), I encourage you to describe your own recommended policy for what agentic models should do when users ask them to help commit heinous crimes. Your options are (1) actively try to