ML Safety (@ml_safety)'s Twitter Profile
ML Safety

@ml_safety

Course: course.mlsafety.org
Newsletter: newsletter.mlsafety.org
Papers as they come out: twitter.com/topofmlsafety.
More: mlsafety.org

ID: 1418806686500884481

Link: http://www.mlsafety.org
Joined: 24-07-2021 05:37:41

23 Tweets

1.1K Followers

2 Following

ML Safety (@ml_safety):

For a continuous stream of safety-relevant research papers, we're continually posting on Reddit and on this Twitter account: reddit.com/r/mlsafety/ x.com/topofmlsafety

ML Safety (@ml_safety):

In the fourth ML Safety newsletter, we cover many new interpretability papers, virtual logit matching, and how rationalization can help robustness. newsletter.mlsafety.org/p/ml-safety-ne…

Dan Hendrycks (@danhendrycks):

We’ll be organizing a NeurIPS workshop on Machine Learning Safety! We'll have $50K in best-paper awards. To encourage proactiveness about tail risks, we'll also have $50K in awards for papers that discuss their impact on long-term, long-tail risks. neurips2022.mlsafety.org

ML Safety (@ml_safety):

In this special newsletter, we cover safety competitions and prizes: ML Safety Workshop ($100K), Trojan Detection ($50K), Forecasting ($625K), Uncertainty Estimation ($100K), Inverse Scaling ($250K), AI Worldview Writing Prize ($1.5M). Details: newsletter.mlsafety.org/p/ml-safety-ne…

ML Safety (@ml_safety):

Can ML models spot an ethical dilemma? As ML systems make more real-world decisions, it will become increasingly important that they have calibrated ethical awareness. Announcing a $100,000 competition for research on detecting moral ambiguity. moraluncertainty.mlsafety.org

ML Safety (@ml_safety):

In the sixth ML Safety newsletter, we cover a survey of transparency research, a substantial improvement to certified robustness, new examples of 'goal misgeneralization,' and what the ML community thinks about safety issues. newsletter.mlsafety.org/p/ml-safety-ne…

ML Safety (@ml_safety):

“If you cannot measure it, you cannot improve it.” ML Safety research lacks benchmarks. We are offering up to $500,000 in prizes for ML Safety benchmark ideas (or papers). Main site: benchmarking.mlsafety.org Example ideas: benchmarking.mlsafety.org/ideas

ML Safety (@ml_safety):

In the 7th ML Safety newsletter, we discuss AI lie detectors, research on transparency and grokking, adversarial defenses for text models, and the new ML Safety course. newsletter.mlsafety.org/p/ml-safety-ne…

ML Safety (@ml_safety):

In the 8th edition of the ML Safety Newsletter, we cover interpretability, using law to inform AI alignment, and scaling laws for proxy gaming. newsletter.mlsafety.org/p/ml-safety-ne…

ML Safety (@ml_safety):

In the 9th edition of the ML Safety newsletter, we cover verifying large training runs, security risks from LLM access to APIs, why natural selection may favor AIs over humans, and more! newsletter.mlsafety.org/p/ml-safety-ne…

Dan Hendrycks (@danhendrycks):

Following the statement on AI extinction risks, many have called for further discussion of the challenges posed by AI and ideas on how to mitigate risk. Our new paper provides a detailed overview of catastrophic AI risks. Read it here: arxiv.org/abs/2306.12001 (🧵 below)

ML Safety (@ml_safety):

We’re having a social on ML Safety at ICML this Wednesday (7/26) with food and snacks! The social will run from 5:45 pm to 7:30 pm Hawaii time in room 323 of the Hawaii Convention Center. Register here (so we can estimate how much food to buy)! docs.google.com/forms/d/e/1FAI…

ML Safety (@ml_safety):

Tomorrow at 1 pm PST, Kenneth Li will present at the Center for AI Safety’s Reading and Learning event. Kenneth has recently published on identifying world models in LLM activations and on improving truthfulness in LLM outputs. Details here: centerforaisafety.github.io/reading/

ML Safety (@ml_safety):

We’re having a social on ML Safety at ICLR this Thursday (5/9) with drinks and snacks! The social will be from 5:30-7:30 pm CET in room Schubert 4 at the Messe Wien Exhibition and Congress Center. Register here (so we can estimate how much food to buy)! forms.gle/zWhi6BXbdBTYhE…

ML Safety (@ml_safety):

Join us for a panel and social on ML Safety at ICML tomorrow (07/23) at 5:30 CET in Lehar 1-4! We have a great set of panelists lined up to discuss progress in ML Safety research, including Bo Li, David Krueger, and Sanmi Koyejo.