Michał Bortkiewicz @ICLR (@m_bortkiewicz) 's Twitter Profile
Michał Bortkiewicz @ICLR

@m_bortkiewicz

PhD at Warsaw University of Technology
Working on RL and Continual Learning

ID: 1357769711963021313

https://michalbortkiewicz.github.io/
Joined: 05-02-2021 19:15:21

81 Tweets

134 Followers

390 Following

Piotr Sankowski (@piotrsankowski) 's Twitter Profile Photo

Mixture of Experts (MoE) is one of the solutions used by DeepSeek. In MoE LLMs, only part of the model parameters is activated for each token, which considerably reduces training cost. However, the drawback of MoE is that the model becomes bigger, thus

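For intuition, here is a minimal sketch of the top-k routing idea the tweet describes, in plain numpy; the gating weights, expert shapes, and k=2 are illustrative assumptions, not DeepSeek's actual configuration:

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=2):
    """Sparse MoE layer: each token is routed to its top-k experts,
    so only k of len(experts) expert networks run per token."""
    scores = x @ gate_W                                  # (tokens, n_experts)
    top_k = np.argsort(scores, axis=-1)[:, -k:]          # best k expert ids
    sel = np.take_along_axis(scores, top_k, axis=-1)     # their logits
    weights = np.exp(sel) / np.exp(sel).sum(-1, keepdims=True)  # softmax over k
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                          # mix the k expert outputs
        for j, e in enumerate(top_k[t]):
            out[t] += weights[t, j] * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [lambda v, W=rng.normal(size=(d, d)) / np.sqrt(d): v @ W
           for _ in range(n_experts)]
x = rng.normal(size=(4, d))                              # 4 tokens
y = moe_forward(x, rng.normal(size=(d, n_experts)), experts)
print(y.shape)  # (4, 16): full-width output, but only 2 of 8 experts ran per token
```

All experts still have to be stored, which is the "model becomes bigger" drawback the tweet points at: compute per token is sparse, parameter count is not.
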
Bartosz Cywiński (@bartoszcyw) 's Twitter Profile Photo

🔥 New ICLR 2025 Paper! It would be cool to control the content of text generated by diffusion models with less than 1% of parameters, right? And how about doing it across diverse architectures and within various applications? 🚀 🫡 Together with Lukasz Staniszewski, we show how: 🧵 1/

Kevin Wang (@kevin_wang3290) 's Twitter Profile Photo

1/ While most RL methods use shallow MLPs (~2–5 layers), we show that scaling up to 1000 layers for contrastive RL (CRL) can significantly boost performance, from doubling it to 50x gains on a diverse suite of robotic tasks. Webpage+Paper+Code: wang-kevin3290.github.io/scaling-crl/

Michał Bortkiewicz @ICLR (@m_bortkiewicz) 's Twitter Profile Photo

🚨Scaling RL Most RL methods’ performance saturates at ~5 layers. In this work led by Kevin Wang, we crack the right configuration for scaling Contrastive RL and go beyond 1000-layer NNs! Deep NNs unlock emergent behaviors and other cool properties. Check out Kevin’s thread!

Ben Eysenbach (@ben_eysenbach) 's Twitter Profile Photo

tldr: increase the depth of your RL networks by several orders of magnitude. Our new paper shows that very very deep networks are surprisingly useful for RL, if you use resnets, layer norm, and self-supervised RL! Paper, code, videos: wang-kevin3290.github.io/scaling-crl/
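
A minimal sketch of the pre-norm residual recipe the tweet names (resnets plus layer norm); width, depth, and init scale here are illustrative assumptions, not the paper's exact CRL architecture:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def deep_resnet(x, blocks):
    """Stack of pre-norm residual blocks: x <- x + relu(LN(x) @ W1) @ W2.
    The identity skip path keeps signal and gradients usable at extreme depth."""
    for W1, W2 in blocks:
        h = np.maximum(layer_norm(x) @ W1, 0.0)
        x = x + h @ W2
    return x

rng = np.random.default_rng(0)
d, depth = 64, 500                       # 500 blocks, 2 linear layers each
blocks = [(rng.normal(size=(d, d)) * 0.02,
           rng.normal(size=(d, d)) * 0.02) for _ in range(depth)]
x = rng.normal(size=(8, d))              # batch of 8 state embeddings
out = deep_resnet(x, blocks)
print(out.shape, np.isfinite(out).all())  # activations stay finite at ~1000 layers
```

Without the skip connections and normalization, the same plain stack would blow up or collapse long before this depth.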

Piotr Sankowski (@piotrsankowski) 's Twitter Profile Photo

Instytut Ideas can start operating: yesterday its entry in the KRS (the Polish National Court Register) was completed. However, alongside the good news there is also far more distressing news about the illness of one of the team leaders, who was soon to start work. Łukasz Kuciński is fighting a glioma and is looking for support at:

Piotr Miłoś (@piotrrmilos) 's Twitter Profile Photo

My good friend has an ongoing fight with cancer. A great father and husband for his family. An excellent co-author for me and many other ML folks. Please support and share! (link in the comment!)

Tomasz Trzcinski (@tomasztrzcinsk1) 's Twitter Profile Photo

Łukasz Kuciński is not only an excellent researcher, but a truly great person. Kind, thoughtful and wise. And all that should be enough to support him in his fight against cancer. But he is also a father and a husband with a loving family worth fighting for. Support him 🙏

Michał Bortkiewicz @ICLR (@m_bortkiewicz) 's Twitter Profile Photo

Excited to present JaxGCRL at ICLR 2025 (spotlight): 📍Hall 3 + Hall 2B, Poster #422 🗓️Friday, April 25 🕒3:00 PM – 5:00 PM I'm also happy to grab a coffee and chat about anything related to RL, robotics, or continual learning!

Alex Lewandowski (@axlewandowski) 's Twitter Profile Photo

I will be presenting two posters at ICLR that outline an optimization perspective on loss of plasticity. Come check them out on Thursday and Friday @ 10am. Also, feel free to reach out to chat about continual, meta and/or reinforcement learning.

Andrew Zhao (@andrewz45732491) 's Twitter Profile Photo

❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/

Bartosz Cywiński (@bartoszcyw) 's Twitter Profile Photo

New paper: Deceptive LLMs may keep secrets from their operators. Can we elicit this latent knowledge? Maybe! Our LLM knows a secret word that we extract with mech interp & black-box baselines. We open-source our model; how much better can you do? w/ Emil Ryd, Senthooran Rajamanoharan, Neel Nanda

Michal Nauman (@mic_nau) 's Twitter Profile Photo

We wondered if off-policy RL could transfer to real robots on par with on-policy PPO. Turns out it works surprisingly well! We also find that, like on-policy methods, off-policy methods can leverage massively parallel simulation for even better performance 🤖

Kevin Frans (@kvfrans) 's Twitter Profile Photo

Stare at policy improvement and diffusion guidance, and you may notice a suspicious similarity... We lay out an equivalence between the two, formalizing a simple technique (CFGRL) to improve performance across-the-board when training diffusion policies. arxiv.org/abs/2505.23458

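For intuition, classifier-free guidance extrapolates from an unconditional prediction toward a conditional one, and the tweet's CFGRL result connects that extrapolation to policy improvement. A hedged sketch of the standard guidance rule (function and variable names are generic placeholders, not the paper's API):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w=2.0):
    """Classifier-free guidance at one denoising step:
    w = 0 ignores the condition, w = 1 is plain conditional sampling,
    and w > 1 pushes the sample further toward the condition, which is
    the policy-improvement-like step."""
    return eps_uncond + w * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_u = rng.normal(size=(4,))   # denoiser output without the task condition
eps_c = rng.normal(size=(4,))   # denoiser output with the task condition
print(cfg_combine(eps_u, eps_c, w=2.0))
```
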
Raj Ghugare (@ghugareraj) 's Twitter Profile Photo

Normalizing Flows (NFs) check all the boxes for RL: exact likelihoods (imitation learning), efficient sampling (real-time control), and variational inference (Q-learning)! Yet they are overlooked in favor of more expensive and less flexible contemporaries like diffusion models. Are NFs

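A minimal sketch of why flows give the properties listed above, using a single affine (scale-and-shift) flow: exact log-likelihood comes from the change-of-variables formula, and sampling is one cheap forward pass. The toy parameters are illustrative assumptions, not any paper's model:

```python
import numpy as np

# Affine flow z -> x = z * exp(s) + t, with base distribution z ~ N(0, I).
s = np.array([0.5, -0.3])
t = np.array([1.0, 2.0])

def sample(rng, n):
    """Efficient sampling: one forward pass through the flow."""
    z = rng.normal(size=(n, 2))
    return z * np.exp(s) + t

def log_prob(x):
    """Exact likelihood via change of variables:
    log p(x) = log N(z; 0, I) - sum(s), where z = (x - t) * exp(-s)."""
    z = (x - t) * np.exp(-s)
    log_base = -0.5 * (z**2 + np.log(2 * np.pi)).sum(-1)
    return log_base - s.sum()

rng = np.random.default_rng(0)
x = sample(rng, 3)
print(log_prob(x))   # exact log-densities, no ELBO or score approximation
```
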
Jon Richens (@jonathanrichens) 's Twitter Profile Photo

Are world models necessary to achieve human-level agents, or is there a model-free short-cut? Our new #ICML2025 paper tackles this question from first principles and finds a surprising answer: agents _are_ world models… 🧵

Chongyi Zheng (@chongyiz1) 's Twitter Profile Photo

1/ How should RL agents prepare to solve new tasks? While prior methods often learn a model that predicts the immediate next observation, we build a model that predicts many steps into the future, conditioning on different user intentions: chongyi-zheng.github.io/infom.

Jan Dubiński (@jan_dubinski_) 's Twitter Profile Photo

🚨We’re thrilled to present our paper “CDI: Copyrighted Data Identification in #DiffusionModels” at #CVPR2025 in Nashville! 🎸❗️ "Was this diffusion model trained on my dataset?" Learn how to find out: 📍 Poster #276 🗓️ Saturday, June 14 🕒 3:00 – 5:00 PM PDT
