Kunal Jha (@kjha02)'s Twitter Profile
Kunal Jha

@kjha02

CS PhD student @UW, prev. CSxPhilosophy @Dartmouth

ID: 1770229353001156608

Link: http://kjha02.github.io
Joined: 19-03-2024 23:22:36

36 Tweets

119 Followers

137 Following

Rui Xin (@rui_xin31)'s Twitter Profile Photo

Think PII scrubbing ensures privacy? 🤔Think again‼️ In our paper, for the first time on unstructured text, we show that you can re-identify over 70% of private information *after* scrubbing! It’s time to move beyond surface-level anonymization. #Privacy #NLProc 🔗🧵

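The point of the result above is that scrubbed free text still carries identifying context. As a purely illustrative sketch (not the paper's attack), the toy Python below shows how residual context plus hypothetical auxiliary profiles can point back to a masked identity; all names, profiles, and the overlap scoring are made up.

```python
# Toy illustration (not the paper's method): even after names and other PII
# are scrubbed, surrounding context can narrow a masked identity back down.
# All names, profiles, and the scoring here are hypothetical.

scrubbed = "[NAME] is a CS PhD student at UW who previously studied CS and philosophy at Dartmouth."

# Hypothetical auxiliary knowledge an adversary might hold (e.g. public bios).
candidates = {
    "Person A": "CS PhD student at UW, undergrad in CS and philosophy at Dartmouth",
    "Person B": "biology postdoc at MIT",
    "Person C": "software engineer in Seattle, studied EE at UCLA",
}

def overlap_score(context, profile):
    """Count shared lowercase word tokens between the scrubbed context and a profile."""
    ctx = set(context.lower().replace(".", "").replace(",", "").split())
    prof = set(profile.lower().replace(".", "").replace(",", "").split())
    return len(ctx & prof)

# Rank candidates by how well they match the residual context.
ranking = sorted(candidates, key=lambda name: overlap_score(scrubbed, candidates[name]), reverse=True)
print("Most likely identity behind [NAME]:", ranking[0])
```
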
Marlos C. Machado (@marloscmachado)'s Twitter Profile Photo

📢 I'm very excited to release AgarCL, a new evaluation platform for research in continual reinforcement learning‼️
Repo: github.com/machado-resear…
Website: agarcl.github.io
Preprint: arxiv.org/abs/2505.18347
Details below 👇

Max Kleiman-Weiner (@maxhkw)'s Twitter Profile Photo

LLMs learn beliefs and values from human data, influence our opinions, and then reabsorb those influenced beliefs, feeding them back to users again and again. We call this the "Lock-In Hypothesis" and develop theory, simulations, and empirics to test it in our latest ICML paper!

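A minimal toy simulation of the feedback loop the tweet describes, under assumptions of my own (three competing beliefs, a fixed influence rate, majority-vote model outputs); it is not the paper's model, but it shows how a learn–influence–reabsorb cycle can collapse belief diversity.

```python
# Toy simulation of the lock-in feedback loop (my own sketch, not the paper's
# model): a model absorbs the population's belief distribution, users shift
# toward the model's output, and the model retrains on the result.
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

population = [0.4, 0.35, 0.25]   # hypothetical shares of three competing beliefs
influence = 0.3                  # how strongly model outputs pull user beliefs

for step in range(10):
    model = list(population)               # model absorbs current human beliefs
    top = model.index(max(model))          # model most often outputs its majority belief
    population = [
        (1 - influence) * p + influence * (1.0 if i == top else 0.0)
        for i, p in enumerate(population)
    ]                                      # users drift toward what the model says
    beliefs = ", ".join(f"{p:.3f}" for p in population)
    print(f"round {step}: beliefs=[{beliefs}], entropy={entropy(population):.3f}")
# Entropy falls each round: the loop "locks in" the initially dominant belief.
```
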
Jacqueline He (@jcqln_h)'s Twitter Profile Photo

LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content. We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.

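To make the task concrete, here is a rough sketch of a PIC-style support check under my own simplifications: every statement in the output should be backed by one of the supplied claims. The claims, the example output, and the lexical-overlap test are hypothetical stand-ins for a real entailment or verification model.

```python
# Illustrative only (not the paper's evaluation): check that each output
# sentence is supported by one of the provided verifiable claims, using a
# crude lexical-overlap stand-in for a real verification model.

claims = [
    "The Eiffel Tower is located in Paris.",
    "The Eiffel Tower was completed in 1889.",
]
output = "The Eiffel Tower, completed in 1889, stands in Paris. It is painted gold every year."

def supported(sentence, claims, threshold=0.5):
    """Return True if enough of the sentence's content words appear in some claim."""
    words = {w.strip(".,").lower() for w in sentence.split() if len(w) > 3}
    for claim in claims:
        claim_words = {w.strip(".,").lower() for w in claim.split()}
        if words and len(words & claim_words) / len(words) >= threshold:
            return True
    return False

for sentence in output.split(". "):
    if sentence:
        label = "SUPPORTED  " if supported(sentence, claims) else "UNSUPPORTED"
        print(f"{label} | {sentence}")
```
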
Mickel Liu (@mickel_liu)'s Twitter Profile Photo

🤔Conventional LM safety alignment is reactive: find vulnerabilities→patch→repeat 🌟We propose 𝗼𝗻𝗹𝗶𝗻𝗲 𝐦𝐮𝐥𝐭𝐢-𝐚𝐠𝐞𝐧𝐭 𝗥𝗟 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵

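A toy sketch of the co-evolution loop, not the paper's training setup: attacker and defender policies over a few hypothetical discrete strategies are updated online from the outcomes of their own interactions, with bandit-style weight updates standing in for the actual RL algorithm.

```python
# Toy co-evolution loop (hypothetical strategies and payoffs, not the paper's
# implementation): the attacker reinforces attacks that get through, the
# defender reinforces defenses that hold, and both adapt to each other online.
import math, random

attacks = ["roleplay_jailbreak", "obfuscated_request", "benign_probe"]
defenses = ["refuse_and_explain", "safe_completion", "naive_answer"]

# payoff[a][d] = 1 if the attack gets past the defense (attacker reward), else 0.
payoff = {
    "roleplay_jailbreak": {"refuse_and_explain": 0, "safe_completion": 0, "naive_answer": 1},
    "obfuscated_request": {"refuse_and_explain": 0, "safe_completion": 1, "naive_answer": 1},
    "benign_probe":       {"refuse_and_explain": 0, "safe_completion": 0, "naive_answer": 0},
}

atk_w = {a: 0.0 for a in attacks}   # log-weights for softmax policies
def_w = {d: 0.0 for d in defenses}
lr = 0.5

def sample(weights):
    items = list(weights)
    probs = [math.exp(weights[k]) for k in items]
    total = sum(probs)
    return random.choices(items, [p / total for p in probs])[0]

for step in range(500):
    a, d = sample(atk_w), sample(def_w)
    r = payoff[a][d]                 # 1 = jailbreak succeeded
    atk_w[a] += lr * r               # attacker reinforces attacks that succeed
    def_w[d] += lr * (1 - r)         # defender reinforces defenses that hold

print("attacker log-weights:", {k: round(v, 1) for k, v in atk_w.items()})
print("defender log-weights:", {k: round(v, 1) for k, v in def_w.items()})
# The defender concentrates on strategies that block the attacker's best
# attacks, while the attacker keeps probing for whatever still gets through.
```
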
Kevin Ellis (@ellisk_kellis)'s Twitter Profile Photo

New paper: World models + Program synthesis by Wasu Top Piriyakulkij
1. World modeling on-the-fly by synthesizing programs w/ 4000+ lines of code
2. Learns new environments from minutes of experience
3. Positive score on Montezuma's Revenge
4. Compositional generalization to new environments
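A rough sketch of the program-synthesis idea under my own toy assumptions, not the paper's system: treat the world model as a small program and pick among candidate programs by how well they reproduce the agent's observed transitions. The environment and candidate programs below are made up.

```python
# Toy sketch: select a program-as-world-model by its fit to observed transitions.
# Hidden dynamics, candidate programs, and data collection are all hypothetical.
import random

def true_env_step(pos, action):
    """Hidden dynamics: a 1-D corridor of length 5 with walls at both ends."""
    return max(0, min(4, pos + action))

# Candidate "world model" programs a synthesizer might propose.
candidates = {
    "no_walls":  lambda pos, action: pos + action,
    "walls_0_4": lambda pos, action: max(0, min(4, pos + action)),
    "sticky":    lambda pos, action: pos,
}

# A few minutes of experience: random actions in the real environment.
transitions = []
pos = 2
for _ in range(50):
    action = random.choice([-1, 1])
    nxt = true_env_step(pos, action)
    transitions.append((pos, action, nxt))
    pos = nxt

# Score each candidate program by how many observed transitions it predicts exactly.
scores = {
    name: sum(model(s, a) == s2 for s, a, s2 in transitions)
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print("scores:", scores)
print("selected world model:", best)   # "walls_0_4" matches the hidden dynamics
```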