
Daniela Gottesman
@dhgottesman
ID: 1473492874352234497
22-12-2021 03:17:46
13 Tweets
27 Followers
18 Following

Do you have a "tell" when you are about to lie? We find that LLMs have "tells" in their internal representations that allow estimating how knowledgeable a model is about an entity 𝘣𝘦𝘧𝘰𝘳𝘦 it generates even a single token. Paper: arxiv.org/abs/2406.12673… 🧵



In Hebrew, we have an idiom, "one in the mouth, one in the heart," which means that there is a gap between what someone says and what they think. Daniela Gottesman's recent work showed with a simple probe (KEEN) that this behavior often happens in LLMs -->
