Alex Cloud (@cloud_kx)'s Twitter Profile
Alex Cloud

@cloud_kx

ID: 1176952955590905864

Joined 25-09-2019 20:15:45

26 Tweets

103 Followers

59 Following

Alex Turner (@turn_trout)

1) AIs are trained as black boxes, making it hard to understand or control their behavior. This is bad for safety! But what is an alternative? Our idea: train structure into a neural network by configuring which components update on different tasks. We call it "gradient routing."

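
The routing idea in the tweet above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a hypothetical four-weight linear model and a hypothetical routing table (`MASKS`) that multiplies each example's parameter gradient by a 0/1 mask, so task "A" examples can only update weights 0-1 and task "B" examples can only update weights 2-3. The published method may route gradients differently (for example, by masking gradients inside the network rather than directly on parameters).

```python
# Toy sketch of gradient routing (hypothetical simplification): each training
# example carries a 0/1 mask over parameters, and only unmasked parameters
# receive its gradient update. Model: linear regression y_hat = w . x.
from typing import Dict, List, Tuple

# Hypothetical routing table: task "A" may update weights 0-1, task "B" weights 2-3.
MASKS: Dict[str, List[float]] = {"A": [1.0, 1.0, 0.0, 0.0], "B": [0.0, 0.0, 1.0, 1.0]}


def predict(w: List[float], x: List[float]) -> float:
    """Linear model output w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def routed_sgd_step(w: List[float], x: List[float], y: float,
                    mask: List[float], lr: float) -> List[float]:
    """One SGD step on 0.5 * (w . x - y)^2, with the gradient multiplied by `mask`."""
    err = predict(w, x) - y
    grad = [err * xi for xi in x]  # d/dw_i of 0.5 * err^2 is err * x_i
    return [wi - lr * m * gi for wi, m, gi in zip(w, mask, grad)]


def train(data: List[Tuple[List[float], float, str]],
          epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Run routed SGD over (x, y, task) triples, starting from all-zero weights."""
    w = [0.0] * 4
    for _ in range(epochs):
        for x, y, task in data:
            w = routed_sgd_step(w, x, y, MASKS[task], lr)
    return w


if __name__ == "__main__":
    # Task A inputs touch all four weights, yet only weights 0-1 may change.
    task_a = [([1.0, 0.0, 1.0, 0.0], 1.0, "A"), ([0.0, 1.0, 0.0, 1.0], 2.0, "A")]
    w = train(task_a)
    print([round(v, 3) for v in w])  # prints [1.0, 2.0, 0.0, 0.0]
```

Even though the task A inputs also feed weights 2-3, those weights never move, so whatever task A teaches is confined to the components routed to it.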
Alex Turner (@turn_trout)

Thought real machine unlearning was impossible? We show that distilling a conventionally "unlearned" model creates a model resistant to relearning attacks. Distillation makes unlearning real.

Owain Evans (@owainevans_uk)

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵