
Sachin Kumar
@shocheen
Assistant Professor at @OhioStateCSE. Hiring Ph.D. students (Fall '25).
Previous: @allen_ai, @UWNLP, @LTICMU. He/Him 🏳️‍🌈
ID: 267680298
http://shocheen.com
17-03-2011 10:39:58
424 Tweets
1.1K Followers
690 Following



Want to know what training data has been memorized by models like GPT-4? We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models, without requiring access to: 🙅 Model weights 🙅 Training data 🙅 Token probabilities 🧵 1/5



🚨 NEW WORKSHOP ALERT 🚨 We're thrilled to announce the first-ever Tokenization Workshop (TokShop) at the ICML Conference (#ICML2025)! Submissions are open for work on tokenization across all areas of machine learning. Submission deadline: May 30, 2025. Website: tokenization-workshop.github.io


In the coming weeks, we will announce an exciting lineup of invited talks and panelists. Follow our account, Tokenization Workshop (TokShop) @ICML2025, to stay tuned. Join us at TokShop at #ICML2025!




Delighted there will finally be a workshop devoted to tokenization, a critical topic for LLMs and beyond! Join us for the inaugural edition of TokShop at the ICML Conference (#ICML2025) in Vancouver this summer!


While I'm on X to share my paper, I also have a life update: I'll be joining the School of Information at UT Austin as an assistant professor starting Fall 2026! Excited for this next chapter, and to keep working on teaching computers to better understand language and humans (and now teaching humans too!).



Very excited about our new #ICML2025 position paper, accepted as an oral, with Bodhisattwa Majumder & Tuhin Chakrabarty! What are the longitudinal harms of AI development? We use economic theories to highlight AI's intertemporal impacts on livelihoods and its role in deepening labor-market inequality.


