
Kristian Georgiev
@kris_georgiev1
Research Scientist @OpenAI | on leave from PhD at @MIT
ID: 1252659231452393472
http://kristian-georgiev.github.io 21-04-2020 18:03:36
66 Tweets
428 Followers
569 Following

In ML, we train on biased (huge) datasets ➡️ models encode spurious correlations and fail on minority groups. Can we scalably remove "bad" data? w/ Saachi Jain, Kimia Hamidieh, Kristian Georgiev, Andrew Ilyas, and Marzyeh, we propose D3M, a method for exactly this: gradientscience.org/d3m/
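The tweet doesn't spell out the mechanics, but a first-order sketch of attribution-guided data removal in this spirit might look like the following: score each training example by how its loss gradient aligns with the loss gradient of the group you want to protect, then drop the most harmful examples. Everything here (the toy logistic model, the gradient-alignment scoring rule, the cutoff `k`) is an illustrative assumption, not the paper's actual estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a logistic model with weights w, a training set (X, y), and a
# small validation set drawn from the minority group we want to protect.
n, d, k = 200, 5, 10
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > 0).astype(float)
w = rng.normal(size=d) * 0.1

X_group = rng.normal(size=(20, d))
y_group = (X_group @ rng.normal(size=d) > 0).astype(float)

def per_example_grads(X, y, w):
    """Per-example gradients of the logistic loss wrt w, shape (n, d)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return (p - y)[:, None] * X

# An SGD step on example i moves w by -lr * g_i, changing the group loss by
# roughly -lr * (g_i . g_group). Negative alignment means training on i
# HURTS the group, so the most negative scores mark the most harmful points.
g_train = per_example_grads(X, y, w)
g_group = per_example_grads(X_group, y_group, w).mean(axis=0)
scores = g_train @ g_group

drop = np.argsort(scores)[:k]            # k most harmful training examples
keep = np.setdiff1d(np.arange(n), drop)  # curated training set
print(len(keep))  # 190
```

In practice, a single frozen-weights gradient dot-product is a crude influence proxy; attribution methods average over training dynamics or many retrained models to get more reliable scores.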




At #ICML2024? Our tutorial "Data Attribution at Scale" will be tomorrow at 9:30 AM CEST in Hall A1! I won't be able to make it (but will arrive later that day); my awesome students Andrew Ilyas, Sam Park, and Logan Engstrom will carry the torch :)

Attending #ICML2024? Check out our work on decomposing predictions and editing model behavior via targeted interventions to model internals! Poster: #2513, Hall C 4-9, 1:30p (Tue) Paper: arxiv.org/abs/2404.11534 w/ Harshay Shah Andrew Ilyas



Thanks to all who attended our tutorial "Data Attribution at Scale" at ICML (w/ Sam Park Logan Engstrom Kristian Georgiev Aleksander Madry)! We're really excited to see the response to this emerging topic. Slides, notes, ICML video: ml-data-tutorial.org Public recording soon!


The ATTRIB workshop is back @ NeurIPS 2024! We welcome papers connecting model behavior to data, algorithms, parameters, scale, or anything else. Submit by Sep 18! More info: attrib-workshop.cc Co-organizers: Tolga Bolukbasi Logan Engstrom Sadhika Malladi Elisa Nguyen Sam Park



Machine unlearning ("removing" training data from a trained ML model) is a hard, important problem. Datamodel Matching (DMM): a new unlearning paradigm with strong empirical performance! w/ Kristian Georgiev Roy Rinberg Sam Park Shivam Garg Aleksander Madry Seth Neel (1/4)
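The tweet describes the paradigm only at a high level; as a heavily hedged toy sketch, a two-stage "predict, then match" scheme could look like: (1) use a datamodel to predict the outputs a model retrained without the forget set would produce, (2) fine-tune the current model to match those predictions. The linear model, the randomly initialized datamodel weights `theta`, and the probe set below are all invented for illustration and are not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "model": f(x) = x @ w, with already-trained weights w_full.
d, n_train, n_probe = 8, 50, 4
w_full = rng.normal(size=d)
X_probe = rng.normal(size=(n_probe, d))  # points where we match outputs

# Hypothetical linear datamodel: maps a training-set inclusion mask to the
# model's output on each probe point (weights invented for illustration).
theta = rng.normal(size=(n_probe, n_train)) * 0.05
bias = X_probe @ w_full - theta.sum(axis=1)  # full mask reproduces f

# Stage 1: predict outputs of a model retrained WITHOUT the forget set.
forget = np.arange(5)
mask = np.ones(n_train)
mask[forget] = 0.0
target = theta @ mask + bias

# Stage 2: fine-tune w so the model's probe outputs match the prediction.
w = w_full.copy()
init_err = np.abs(X_probe @ w - target).max()
for _ in range(1000):
    err = X_probe @ w - target
    w -= 0.2 * X_probe.T @ err / n_probe

final_err = np.abs(X_probe @ w - target).max()
print(final_err < init_err)  # True
```

The point of the sketch is the decomposition: attribution handles "what would the retrained model do?", and ordinary optimization handles "make the current model do that."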



Excited to see this new paper published in Transactions on Machine Learning Research! I study the problem of simultaneously estimating many private regressions that share the same set of covariates X but have ℓ different outcomes Y. For example, X might be a person's genomic data, and the Y's might correspond to


Andrej Karpathy: The hottest new programming language is vibes




Want state-of-the-art data curation, data poisoning & more? Just do gradient descent! w/ Andrew Ilyas, Ben Chen, Axel Feldmann, Billy Moses, Aleksander Madry: we show how to optimize final model loss wrt any continuous variable. Key idea: Metagradients (grads through model training)
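The "grads through model training" idea can be demonstrated end to end on a toy problem: unroll a few SGD steps and differentiate the final loss with respect to a continuous training variable, here the learning rate. This is a minimal forward-mode sketch on a 1-D quadratic, chosen so the metagradient can be checked against a closed form; it is not the paper's implementation:

```python
# Metagradient of the final loss wrt the learning rate eta, computed by
# differentiating through T unrolled SGD steps on f(w) = 0.5 * w**2
# (so grad f = w). Forward mode: carry dw/deta alongside w.
def train_and_metagrad(eta, w0=2.0, T=10):
    w, dw = w0, 0.0                 # dw tracks dw/deta
    for _ in range(T):
        g = w                        # grad of f at current w
        dg = dw                      # d(grad)/deta, since grad f = w
        # Differentiate the update w <- w - eta * g wrt eta:
        w, dw = w - eta * g, dw - g - eta * dg
    loss = 0.5 * w ** 2
    dloss = w * dw                   # chain rule through the final loss
    return loss, dloss

loss, metagrad = train_and_metagrad(eta=0.1)

# Closed form: w_T = (1 - eta)**T * w0, so
# dL/deta = -T * (1 - eta)**(2*T - 1) * w0**2.
analytic = -10 * (0.9 ** 19) * 4.0
print(abs(metagrad - analytic) < 1e-9)  # True
```

The same unrolling works (via reverse-mode autodiff) for any continuous knob of the training process, e.g. per-example data weights, which is what makes gradient-based data curation and poisoning possible in principle; making it scale is the hard part the work addresses.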


