
Sachin Kumar
@shocheen
Assistant Professor at @OhioStateCSE. Hiring Ph.D. students (Fall '25).
Previous: @allen_ai, @UWNLP, @LTICMU. He/Him 🏳️‍🌈
ID: 267680298
http://shocheen.com 17-03-2011 10:39:58
424 Tweets
1.1K Followers
690 Following



Want to know what training data has been memorized by models like GPT-4? We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models, without requiring access to model weights, training data, or token probabilities. 🧵1/5


🚨 NEW WORKSHOP ALERT 🚨 We're thrilled to announce the first-ever Tokenization Workshop (TokShop) at #ICML2025 ICML Conference! Submissions are open for work on tokenization across all areas of machine learning. Submission deadline: May 30, 2025. tokenization-workshop.github.io


In the upcoming weeks, we will announce an exciting line-up of invited talks and panelists. Follow our account Tokenization Workshop (TokShop) @ICML2025 to stay tuned. Join us at TokShop at #ICML2025!




Delighted there will finally be a workshop devoted to tokenization - a critical topic for LLMs and beyond! Join us for the inaugural edition of TokShop at #ICML2025 ICML Conference in Vancouver this summer!


While I'm on X to share my paper, I also have a life update: I'll be joining School of Information - UT Austin as an assistant professor starting Fall 2026! Excited for this next chapter, and to keep working on teaching computers to better understand language and humans (+ now teaching humans too).



Very excited for a new #ICML2025 position paper accepted as an oral, w/ Bodhisattwa Majumder & Tuhin Chakrabarty! What are the longitudinal harms of AI development? We use economic theories to highlight AI's intertemporal impacts on livelihoods & its role in deepening labor-market inequality.


