Ian McKenzie (@irobotmckenzie)'s Twitter Profile
Ian McKenzie

@irobotmckenzie

ID: 1486021972362338304

Joined: 25-01-2022 17:06:04

7 Tweets

274 Followers

56 Following

Ian McKenzie (@irobotmckenzie):

Looking forward to seeing what people find! I think we could uncover some interesting and important properties of large language models.

Ethan Perez (@ethanjperez):

Some people have asked why we’d expect larger language models to do worse on tasks (inverse scaling). We train LMs to imitate internet text, an objective that is often misaligned with human preferences; if the data has issues, LMs will mimic those issues (especially larger ones). Examples: 🧵
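
To make the objective concrete, here is a minimal sketch (illustrative, not from the thread) of the pure-imitation training signal described above: the model is scored only on predicting the next token of internet text, with nothing in the loss about truthfulness or human preferences.

    # Minimal sketch of the imitation (next-token prediction) objective.
    # The toy "model" and sizes are stand-ins for a real transformer LM.
    import torch
    import torch.nn.functional as F

    vocab_size, seq_len, d_model = 100, 8, 32

    embed = torch.nn.Embedding(vocab_size, d_model)
    head = torch.nn.Linear(d_model, vocab_size)

    tokens = torch.randint(0, vocab_size, (1, seq_len))  # a snippet of "internet text"
    logits = head(embed(tokens[:, :-1]))                 # predict each next token

    # Cross-entropy against the actual next tokens: pure imitation, so any
    # biases in the data are reproduced, with no term for human preferences.
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
    print(loss.item())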

Ethan Perez (@ethanjperez):

Inverse Scaling Prize Update: We got 43 submissions in Round 1 and will award prizes to 4 tasks! These tasks were insightful, diverse, & show approximate inverse scaling on models from Anthropic, OpenAI, @MetaAI, and @DeepMind. Full details at irmckenzie.co.uk/round1, 🧵 on winners:

Ethan Perez (@ethanjperez):

We’re awarding prizes to 7/48 submissions to the Inverse Scaling Prize Round 2! Tasks show inverse scaling on Anthropic, OpenAI, AI at Meta, and @DeepMind models, often even after training with human feedback. Details at irmckenzie.co.uk/round2 and 🧵 on winners:

Ethan Perez (@ethanjperez):

New paper on the Inverse Scaling Prize! We detail 11 winning tasks & identify 4 causes of inverse scaling. We discuss scaling trends with PaLM/GPT-4, including when scaling trends reverse for better & worse, showing that scaling trends can be misleading: arxiv.org/abs/2306.09479 🧵
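
As a rough illustration of what a reversing scaling trend means, the hypothetical helper below classifies a task's trend from the accuracies of increasingly large models; the function name and categories are my own, not taken from the paper.

    # Hypothetical helper: classify a scaling trend from per-model accuracies
    # ordered from smallest to largest model. Purely illustrative.
    def scaling_trend(accuracies: list[float]) -> str:
        diffs = [b - a for a, b in zip(accuracies, accuracies[1:])]
        if all(d <= 0 for d in diffs):
            return "inverse scaling"      # accuracy falls as models grow
        if all(d >= 0 for d in diffs):
            return "standard scaling"     # accuracy rises as models grow
        return "non-monotonic"            # e.g. U-shaped: the trend reverses

    print(scaling_trend([0.71, 0.63, 0.55, 0.48]))  # -> inverse scaling
    print(scaling_trend([0.48, 0.41, 0.52, 0.66]))  # -> non-monotonic (U-shaped)

A trend like the second example is why extrapolating from small models can mislead: the curve looks like inverse scaling until the largest models reverse it.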

Ian McKenzie (@irobotmckenzie):

Excited for our adversarial robustness work to be out! Classifier-based defenses will likely only become more important as time goes on.
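
For intuition, here is a minimal sketch of what a classifier-based defense can look like; `harm_classifier`, `generate`, and the threshold are hypothetical stand-ins, not the method or API from the paper.

    # Sketch of a classifier-based defense: a separate classifier screens
    # the prompt and the model's response, blocking anything it flags.
    def harm_classifier(text: str) -> float:
        # Stand-in scorer; a real defense would use a trained classifier.
        return 1.0 if "attack" in text.lower() else 0.0

    def generate(prompt: str) -> str:
        return f"[model response to: {prompt}]"  # stand-in for the LM

    def guarded_generate(prompt: str, threshold: float = 0.5) -> str:
        if harm_classifier(prompt) >= threshold:
            return "Request refused by input classifier."
        response = generate(prompt)
        if harm_classifier(response) >= threshold:
            return "Response withheld by output classifier."
        return response

    print(guarded_generate("Explain inverse scaling."))
    print(guarded_generate("Describe an attack."))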