Gintare Karolina Dziugaite (@gkdziugaite) 's Twitter Profile
Gintare Karolina Dziugaite

@gkdziugaite

Sr Research Scientist at Google DeepMind, Toronto. Member, Mila. Adjunct, McGill CS. PhD Machine Learning & MASt Applied Math (Cambridge), BSc Math (Warwick).

ID: 954436574468624384

Website: https://gkdz.org · Joined: 19-01-2018 19:33:06

83 Tweets

3.3K Followers

114 Following

Yu Yang (@yuyang_i) 's Twitter Profile Photo

🎉 Two of my papers have been accepted this week at #ICLR2024 & #AISTATS! Big thanks and congrats to co-authors Xuxi Chen & Eric Gan, mentors Atlas Wang & Gintare Karolina Dziugaite, and especially my advisor Baharan Mirzasoleiman! 🙏 More details on both papers after the ICML deadline!

Jacob Austin (@jacobaustin132) 's Twitter Profile Photo

We've finally put out a detailed IEEE/ACM paper on Google's multi-year effort to ease the burden of code review with ML. Google engineers now resolve 7.5% of all code review comments with an ML-suggested edit. But the path to that number has been a fun ML and UX journey!

Pablo Samuel Castro (@pcastr) 's Twitter Profile Photo

📢Mixtures of Experts unlock parameter scaling for deep RL!

Adding MoEs, and in particular Soft MoEs, to value-based deep RL agents results in more parameter-scalable models.

Performance keeps increasing as we increase number of experts (green line below)!
1/9
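A minimal sketch of the architectural idea, assuming a PyTorch-style value network (my illustration, not the authors' code): the penultimate dense layer is replaced by a Soft-MoE layer in which encoder tokens are softly dispatched to expert slots, and the combined expert outputs feed the Q-value head. All module names and sizes below are made up for illustration.

```python
# Illustrative sketch only (not the paper's implementation): a Soft-MoE layer
# dropped into a DQN-style value network. Shapes and names are assumptions.
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    def __init__(self, dim, num_experts=4, slots_per_expert=1):
        super().__init__()
        self.slots_per_expert = slots_per_expert
        self.num_slots = num_experts * slots_per_expert
        # One learned dispatch/combine query per slot.
        self.slot_embed = nn.Parameter(torch.randn(self.num_slots, dim) * 0.02)
        # Each expert is a small MLP applied to its own slots.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, tokens):               # tokens: (batch, n_tokens, dim)
        logits = torch.einsum("bnd,sd->bns", tokens, self.slot_embed)
        dispatch = logits.softmax(dim=1)     # how much each token sends to each slot
        combine = logits.softmax(dim=2)      # how much each token takes from each slot
        slots = torch.einsum("bnd,bns->bsd", tokens, dispatch)
        outs = []
        for i, expert in enumerate(self.experts):
            sl = slots[:, i * self.slots_per_expert:(i + 1) * self.slots_per_expert]
            outs.append(expert(sl))
        slots_out = torch.cat(outs, dim=1)   # (batch, num_slots, dim)
        return torch.einsum("bsd,bns->bnd", slots_out, combine)

class MoEQNetwork(nn.Module):
    """Toy value network: encoder features -> tokens -> Soft-MoE -> Q-values."""
    def __init__(self, n_tokens=16, dim=64, n_actions=6):
        super().__init__()
        self.n_tokens, self.dim = n_tokens, dim
        self.encoder = nn.Linear(128, n_tokens * dim)   # stand-in for a conv torso
        self.moe = SoftMoE(dim)
        self.q_head = nn.Linear(n_tokens * dim, n_actions)

    def forward(self, obs):                  # obs: (batch, 128)
        tokens = self.encoder(obs).view(-1, self.n_tokens, self.dim)
        tokens = self.moe(tokens)
        return self.q_head(tokens.flatten(1))

q_values = MoEQNetwork()(torch.randn(2, 128))   # -> shape (2, 6)
```

Scaling the number of experts in `SoftMoE` is then the knob being varied in the tweet's "green line".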
Gintare Karolina Dziugaite (@gkdziugaite) 's Twitter Profile Photo

In deep nets, we observe good generalization together with memorization. In this new work, we show that, in stochastic convex optimization, memorization of most of the training data is a necessary feature of optimal learning.

EEML (@eemlcommunity) 's Twitter Profile Photo

DEADLINE March 29: prepare and submit your application for EEML 2024, Novi Sad, Serbia eeml.eu 🇷🇸. Topics: Basics of ML, Multimodal learning, NLP, Advanced DL architectures, Generative models, AI for Science. Check our stellar speakers! Scholarships available! 🎉

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Google presents Mixture-of-Depths

Dynamically allocating compute in transformer-based language models

Same performance w/ a fraction of the FLOPs per forward pass

arxiv.org/abs/2404.02258
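As I read the abstract, the routing idea can be sketched roughly as follows (a hedged illustration, not Google's implementation): a small router scores each token, only the top-k tokens in a block go through attention and the MLP, and the remaining tokens bypass the block on the residual stream, so per-block compute is capped. The module names, sizes, and the sigmoid gating below are my assumptions.

```python
# Hedged illustration of a Mixture-of-Depths-style block (not the paper's code):
# a router picks the top-k tokens; only those pay for attention/MLP, the rest
# ride the residual stream unchanged.
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    def __init__(self, dim=64, n_heads=4, capacity=0.25):
        super().__init__()
        self.router = nn.Linear(dim, 1)                 # scalar score per token
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.capacity = capacity                        # fraction of tokens processed

    def forward(self, x):                               # x: (batch, seq, dim)
        b, n, d = x.shape
        k = max(1, int(self.capacity * n))
        scores = self.router(x).squeeze(-1)             # (batch, seq)
        top = scores.topk(k, dim=1).indices             # indices of processed tokens
        idx = top.unsqueeze(-1).expand(-1, -1, d)
        selected = x.gather(1, idx)                     # (batch, k, dim)

        h = self.norm1(selected)
        h = selected + self.attn(h, h, h, need_weights=False)[0]
        h = h + self.mlp(self.norm2(h))
        # Gate the update by the router score so the router receives gradient.
        gate = torch.sigmoid(scores.gather(1, top)).unsqueeze(-1)
        update = gate * (h - selected)

        out = x.clone()
        return out.scatter_add(1, idx, update)          # unselected tokens pass through

y = MoDBlock()(torch.randn(2, 32, 64))                  # same shape; ~25% of tokens processed
```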
Gintare Karolina Dziugaite (@gkdziugaite) 's Twitter Profile Photo

How are LLM capabilities affected by pruning? Check out our ICLR 2025 paper showing that ICL is preserved until high levels of sparsity, in contrast to fact recall, which quickly deteriorates. Our analysis reveals which part of the network is more prunable for a given capability.
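The tweet doesn't specify the pruning procedure; purely as an illustration of taking a network to a given sparsity level, here is a minimal unstructured magnitude-pruning sketch (a standard baseline, not necessarily the paper's method).

```python
# Illustrative only: unstructured magnitude pruning of a model's Linear layers
# to a target sparsity. The paper's actual pruning recipe may differ.
import torch
import torch.nn as nn

def magnitude_prune_(model: nn.Module, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights so roughly `sparsity` of them are zero."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            k = int(sparsity * w.numel())
            if k == 0:
                continue
            threshold = w.abs().flatten().kthvalue(k).values
            w.mul_((w.abs() > threshold).float())

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
magnitude_prune_(model, sparsity=0.9)   # a "high level of sparsity": 90% of weights zeroed
print((model[0].weight == 0).float().mean().item())
```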

Tian Jin @ ICLR (@tjingrant) 's Twitter Profile Photo

See u tmrw ICLR 2025 Sess 1 #133! When we down-scale LLMs (e.g. pruning), what happens to their capabilities? We studied complementary skills of memory recall & in-context learning and consistently found that memory recall deteriorates much quicker than ICL when down-scaling.

Gintare Karolina Dziugaite (@gkdziugaite) 's Twitter Profile Photo

We've seen memorization in neural nets despite good generalization. But can we generalize without memorizing? Come hear about our #ICML2024 best paper award work showing that, in stochastic convex optimization, optimal learners memorize! Talk today at 11:15am, poster at 11:30am.

Ghada Sokar (@g_sokar) 's Twitter Profile Photo

Excited to present our spotlight paper on MoEs in RL today at #ICML2024! Johan S. Obando 👍🏽, Pablo Samuel Castro, Jesse Farebrother, and I are looking forward to chatting with you! Poster #1207, Hall C 4-9, 1:30-3:00 pm

Gintare Karolina Dziugaite (@gkdziugaite) 's Twitter Profile Photo

#ECMLPKDD 2024 is happening in my beautiful hometown Vilnius this week! There? Come see my keynote on memorization and generalization this evening @ 6pm.

Gintare Karolina Dziugaite (@gkdziugaite) 's Twitter Profile Photo

Excited to share our new mechanistic unlearning method that targets the mechanisms behind factual recall for more robust and effective knowledge removal, making relearning more difficult 💪

Arthur Conmy (@arthurconmy) 's Twitter Profile Photo

We are hiring Applied Interpretability researchers on the GDM Mech Interp Team!🧵 If interpretability is ever going to be useful, we need it to be applied at the frontier. Come work with Neel Nanda, the Google DeepMind AGI Safety team, and me: apply by 28th February as a

Tian Jin @ ICLR (@tjingrant) 's Twitter Profile Photo

Introducing Learned Asynchronous Decoding w/ friends from MIT/Google! LLM responses often have chunks of tokens that are semantically independent. We train LLMs to identify and decode them in parallel, speeding up inference by 1.46x geomean (AlpacaEval) w/ only 1.3% quality loss.
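A toy sketch of the idea at inference time, with a stub generator standing in for a real LLM (my illustration, not the MIT/Google system): once independent chunks have been identified, they can be expanded concurrently and stitched back together in order.

```python
# Toy illustration only (stub model, hypothetical chunk list) of decoding
# semantically independent chunks in parallel instead of strictly left-to-right.
from concurrent.futures import ThreadPoolExecutor

def generate_chunk(prompt: str, topic: str) -> str:
    """Stand-in for an LLM call that expands one independent chunk."""
    return f"[{topic}: expanded from '{prompt[:24]}...']"

def decode_async(prompt: str, chunk_topics: list[str]) -> str:
    # In the real method the model itself learns to mark where independent
    # chunks begin; here the chunk list is given to keep the sketch runnable.
    with ThreadPoolExecutor(max_workers=len(chunk_topics)) as pool:
        futures = [pool.submit(generate_chunk, prompt, t) for t in chunk_topics]
        chunks = [f.result() for f in futures]          # decoded concurrently
    return "\n".join(chunks)                            # stitched back in order

print(decode_async("List the pros and cons of remote work.",
                   ["pros", "cons", "summary"]))
```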

Yani Ioannou @ ICLR 2025 ✈️ (@yanii) 's Twitter Profile Photo

I will be travelling to Singapore 🇸🇬 this week for the ICLR 2025 Workshop on Sparsity in LLMs (SLLM) that I'm co-organizing! We have an exciting lineup of invited speakers and panelists including Dan Alistarh, Gintare Karolina Dziugaite, Pavlo Molchanov, Vithu Thangarasa, Yuandong Tian and Amir Yazdan.

Gintare Karolina Dziugaite (@gkdziugaite) 's Twitter Profile Photo

Managed to get through the ICLR registration line? Come to our poster to learn about applying domain adversarial training for unlearning. Starting now @ Hall 3 #506

Sparsity in LLMs Workshop at ICLR 2025 (@sparsellms) 's Twitter Profile Photo

Sparse LLM workshop will run on Sunday with two poster sessions, a mentoring session, 4 spotlight talks, 4 invited talks and a panel session. We'll host an amazing lineup of researchers: Dan Alistarh Vithu Thangarasa Yuandong Tian Amir Yazdan Gintare Karolina Dziugaite Olivia Hsu Pavlo Molchanov Yang Yu

Mohammed Adnan (@adnan_ahmad1306) 's Twitter Profile Photo

1/10 🧵 🔍Can weight symmetry provide insights into sparse training and the Lottery Ticket Hypothesis? 🧐We dive deep into this question in our latest paper, "Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry", accepted at #ICML2025
