
Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺
@blazejmanczak
AI Research @ Dynamo AI (prev Qualcomm AI) 🧐 🤖 also into science of peak human performance and endurance sports
ID: 414160604
https://bmanczak.github.io/about/ 16-11-2011 18:15:59
147 Tweets
68 Followers
186 Following





CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay. Natasha Butt, Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺, Auke Wiggers, Corrado Rainone, David Zhang, Michaël Defferrard, Taco Cohen. tl;dr: sample a program, try it, add to the replay pool. New SOTA on ARC. arxiv.org/abs/2402.04858…


Can we design sample-efficient off-policy RL algorithms for LLMs to master multi-turn tasks? In our new work, we introduce ArCHer, a hierarchical actor-critic algorithm that improves massively in sample efficiency over PPO for multi-turn tasks: yifeizhou02.github.io/archer.io/ Thread 👇

Excited to share that our paper “CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay” was accepted into ICML! Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺 Auke Wiggers Corrado Rainone David Zhang Michaël Defferrard Taco Cohen 1/5


Congratulations to the #AI Research team for having the paper "CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay" accepted at #ICML2024. Discover the future of LLMs: arxiv.org/abs/2402.04858 Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺 Auke Wiggers David Zhang Michaël Defferrard Taco Cohen Natasha Butt

Come see our poster #715 on CodeIt today at #ICML2024 13.30-15.00 Halle C. We approach ARC by self-improving LLMs with prioritized hindsight replay. Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺 Auke Wiggers Corrado Rainone David Zhang Michaël Defferrard Taco Cohen









Hey Wojtek • I'm on Bluesky, let's get out of here @pawelorzech, where near Chopin Airport can you get a good, quick lunch? #ktojetenje


