Michał Bortkiewicz @ICLR (@m_bortkiewicz) 's Twitter Profile
Michał Bortkiewicz @ICLR

@m_bortkiewicz

PhD at Warsaw University of Technology
Working on RL and Continual Learning

ID: 1357769711963021313

https://michalbortkiewicz.github.io/
Joined: 05-02-2021 19:15:21

81 Tweets

134 Followers

390 Following

Piotr Sankowski (@piotrsankowski) 's Twitter Profile Photo

Mixture of Experts (MoE) is one of the solutions used by DeepSeek. In MoE LLMs, only part of the model parameters is activated for each token, which considerably reduces training cost. However, the drawback of MoE is that the model becomes bigger, thus

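For intuition, here is a minimal sketch of the top-k routing idea the tweet describes, in plain numpy; the gating weights, expert shapes, and k=2 are illustrative assumptions, not DeepSeek's actual configuration:

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=2):
    """Sparse MoE layer: each token is routed to its top-k experts,
    so only k of len(experts) expert networks run per token."""
    scores = x @ gate_W                                  # (tokens, n_experts)
    top_k = np.argsort(scores, axis=-1)[:, -k:]          # best k expert ids
    sel = np.take_along_axis(scores, top_k, axis=-1)     # their logits
    weights = np.exp(sel) / np.exp(sel).sum(-1, keepdims=True)  # softmax over k
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                          # mix the k expert outputs
        for j, e in enumerate(top_k[t]):
            out[t] += weights[t, j] * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [lambda v, W=rng.normal(size=(d, d)) / np.sqrt(d): v @ W
           for _ in range(n_experts)]
x = rng.normal(size=(4, d))                              # 4 tokens
y = moe_forward(x, rng.normal(size=(d, n_experts)), experts)
print(y.shape)  # (4, 16): full-width output, but only 2 of 8 experts ran per token
```

All experts still have to be stored, which is the "model becomes bigger" drawback the tweet points at: compute per token is sparse, parameter count is not.
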
Bartosz Cywiński (@bartoszcyw) 's Twitter Profile Photo

🔥 New ICLR 2025 Paper! It would be cool to control the content of text generated by diffusion models with less than 1% of parameters, right? And how about doing it across diverse architectures and within various applications? 🚀 🫡 Together with Lukasz Staniszewski, we show how: 🧵 1/

Kevin Wang (@kevin_wang3290) 's Twitter Profile Photo

1/ While most RL methods use shallow MLPs (~2–5 layers), we show that scaling up to 1000 layers for contrastive RL (CRL) can significantly boost performance, from doubling it to 50x gains on a diverse suite of robotic tasks. Webpage+Paper+Code: wang-kevin3290.github.io/scaling-crl/

Michał Bortkiewicz @ICLR (@m_bortkiewicz) 's Twitter Profile Photo

🚨Scaling RL Most RL methods’ performance saturates at ~5 layers. In this work led by Kevin Wang, we crack the right configuration for scaling Contrastive RL and go beyond 1000-layer NNs! Deep NNs unlock emergent behaviors and other cool properties. Check out Kevin’s thread!

Ben Eysenbach (@ben_eysenbach) 's Twitter Profile Photo

tldr: increase the depth of your RL networks by several orders of magnitude. Our new paper shows that very very deep networks are surprisingly useful for RL, if you use resnets, layer norm, and self-supervised RL! Paper, code, videos: wang-kevin3290.github.io/scaling-crl/
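
A minimal sketch of the pre-norm residual recipe the tweet names (resnets plus layer norm); width, depth, and init scale here are illustrative assumptions, not the paper's exact CRL architecture:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def deep_resnet(x, blocks):
    """Stack of pre-norm residual blocks: x <- x + relu(LN(x) @ W1) @ W2.
    The identity skip path keeps signal and gradients usable at extreme depth."""
    for W1, W2 in blocks:
        h = np.maximum(layer_norm(x) @ W1, 0.0)
        x = x + h @ W2
    return x

rng = np.random.default_rng(0)
d, depth = 64, 500                       # 500 blocks, 2 linear layers each
blocks = [(rng.normal(size=(d, d)) * 0.02,
           rng.normal(size=(d, d)) * 0.02) for _ in range(depth)]
x = rng.normal(size=(8, d))              # batch of 8 state embeddings
out = deep_resnet(x, blocks)
print(out.shape, np.isfinite(out).all())  # activations stay finite at ~1000 layers
```

Without the skip connections and normalization, the same plain stack would blow up or collapse long before this depth.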

Piotr Sankowski (@piotrsankowski) 's Twitter Profile Photo

Instytut Ideas can start operating: yesterday its entry in the KRS (the Polish National Court Register) was completed. However, alongside the good news there is also far more distressing news about the illness of one of the team leaders, who was soon to start work. Łukasz Kuciński is fighting a glioma and is looking for support at:

Piotr Miłoś (@piotrrmilos) 's Twitter Profile Photo

My good friend has an ongoing fight with cancer. A great father and husband for his family. An excellent co-author for me and many other ML folks. Please support and share! (link in the comment!)

Tomasz Trzcinski (@tomasztrzcinsk1) 's Twitter Profile Photo

Łukasz Kuciński is not only an excellent researcher, but a truly great person. Kind, thoughtful and wise. And all that should be enough to support him in his fight against cancer. But he is also a father and a husband with a loving family worth fighting for. Support him 🙏

Michał Bortkiewicz @ICLR (@m_bortkiewicz) 's Twitter Profile Photo

Excited to present JaxGCRL at ICLR 2025 (spotlight): 📍Hall 3 + Hall 2B, Poster #422 🗓️Friday, April 25 🕒3:00 PM – 5:00 PM I'm also happy to grab a coffee and chat about anything related to RL, robotics, or continual learning!

Alex Lewandowski (@axlewandowski) 's Twitter Profile Photo

I will be presenting two posters at ICLR that outline an optimization perspective on loss of plasticity. Come check them out on Thursday and Friday @ 10am. Also, feel free to reach out to chat about continual, meta and/or reinforcement learning.

Andrew Zhao (@andrewz45732491) 's Twitter Profile Photo

❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/

Bartosz Cywiński (@bartoszcyw) 's Twitter Profile Photo

New paper: Deceptive LLMs may keep secrets from their operators. Can we elicit this latent knowledge? Maybe! Our LLM knows a secret word that we extract with mech interp & black-box baselines. We open-source our model; how much better can you do? w/ Emil Ryd, Senthooran Rajamanoharan, Neel Nanda

Michal Nauman (@mic_nau) 's Twitter Profile Photo

We wondered if off-policy RL could transfer to real robots on par with on-policy PPO. Turns out it works surprisingly well! We also find that, like on-policy methods, off-policy methods can leverage massively parallel simulation for even better performance 🤖

Kevin Frans (@kvfrans) 's Twitter Profile Photo

Stare at policy improvement and diffusion guidance, and you may notice a suspicious similarity... We lay out an equivalence between the two, formalizing a simple technique (CFGRL) to improve performance across-the-board when training diffusion policies. arxiv.org/abs/2505.23458

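For intuition, classifier-free guidance extrapolates from an unconditional prediction toward a conditional one, and the tweet's CFGRL result connects that extrapolation to policy improvement. A hedged sketch of the standard guidance rule (function and variable names are generic placeholders, not the paper's API):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w=2.0):
    """Classifier-free guidance at one denoising step:
    w = 0 ignores the condition, w = 1 is plain conditional sampling,
    and w > 1 pushes the sample further toward the condition, which is
    the policy-improvement-like step."""
    return eps_uncond + w * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_u = rng.normal(size=(4,))   # denoiser output without the task condition
eps_c = rng.normal(size=(4,))   # denoiser output with the task condition
print(cfg_combine(eps_u, eps_c, w=2.0))
```
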
Raj Ghugare (@ghugareraj) 's Twitter Profile Photo

Normalizing Flows (NFs) check all the boxes for RL: exact likelihoods (imitation learning), efficient sampling (real-time control), and variational inference (Q-learning)! Yet they are overlooked in favor of more expensive and less flexible contemporaries like diffusion models. Are NFs

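A minimal sketch of why flows give the properties listed above, using a single affine (scale-and-shift) flow: exact log-likelihood comes from the change-of-variables formula, and sampling is one cheap forward pass. The toy parameters are illustrative assumptions, not any paper's model:

```python
import numpy as np

# Affine flow z -> x = z * exp(s) + t, with base distribution z ~ N(0, I).
s = np.array([0.5, -0.3])
t = np.array([1.0, 2.0])

def sample(rng, n):
    """Efficient sampling: one forward pass through the flow."""
    z = rng.normal(size=(n, 2))
    return z * np.exp(s) + t

def log_prob(x):
    """Exact likelihood via change of variables:
    log p(x) = log N(z; 0, I) - sum(s), where z = (x - t) * exp(-s)."""
    z = (x - t) * np.exp(-s)
    log_base = -0.5 * (z**2 + np.log(2 * np.pi)).sum(-1)
    return log_base - s.sum()

rng = np.random.default_rng(0)
x = sample(rng, 3)
print(log_prob(x))   # exact log-densities, no ELBO or score approximation
```
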
Jon Richens (@jonathanrichens) 's Twitter Profile Photo

Are world models necessary to achieve human-level agents, or is there a model-free short-cut? Our new #ICML2025 paper tackles this question from first principles and finds a surprising answer: agents _are_ world models… 🧵

Chongyi Zheng (@chongyiz1) 's Twitter Profile Photo

1/ How should RL agents prepare to solve new tasks? While prior methods often learn a model that predicts the immediate next observation, we build a model that predicts many steps into the future, conditioning on different user intentions: chongyi-zheng.github.io/infom.

Jan Dubiński (@jan_dubinski_) 's Twitter Profile Photo

🚨We’re thrilled to present our paper “CDI: Copyrighted Data Identification in #DiffusionModels” at #CVPR2025 in Nashville! 🎸❗️ "Was this diffusion model trained on my dataset?" Learn how to find out: 📍 Poster #276 🗓️ Saturday, June 14 🕒 3:00 – 5:00 PM PDT
