Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺 (@blazejmanczak) Twitter Tweets • TwiCopy

Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺

@blazejmanczak


AI Research @ Dynamo AI (prev Qualcomm AI) 🧐 🤖 also into science of peak human performance and endurance sports

ID: 414160604

Link: https://bmanczak.github.io/about/

Joined: 16-11-2011 18:15:59

147 Tweets

68 Followers

186 Following

Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺

@blazejmanczak

7 years ago

#MastercardGrazWOŚP

0 likes, 0 replies, 0 retweets

Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺

@blazejmanczak

6 years ago

Hey @pawelorzech, what magic does it take to get After Dark to show up in Overcast?

0 likes, 0 replies, 0 retweets

Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺

@blazejmanczak

6 years ago

Like every year: #MastercardGrazWOŚP!

0 likes, 0 replies, 0 retweets

CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay

paper page: huggingface.co/papers/2402.04…

Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly

168 likes, 1 reply, 28 retweets

Dmytro Mishkin 🇺🇦

@ducha_aiki

2 years ago

CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Natasha Butt, Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺, Auke Wiggers, Corrado Rainone, David Zhang, Michaël Defferrard, Taco Cohen
tl;dr: sample a program, try it, add to the replay pool.
New sota on ARC
arxiv.org/abs/2402.04858…

35 likes, 1 reply, 11 retweets

Sergey Levine

@svlevine

2 years ago

Can we design sample-efficient off-policy RL algorithms for LLMs to master multi-turn tasks? In our new work, we introduce ArCHer, a hierarchical actor-critic algorithm that improves massively in sample efficiency over PPO for multi-turn tasks: yifeizhou02.github.io/archer.io/ Thread 👇

234 likes, 2 replies, 40 retweets

Natasha Butt

@natashaeve4

a year ago

Excited to share that our paper “CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay” was accepted into ICML! Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺 Auke Wiggers Corrado Rainone David Zhang Michaël Defferrard Taco Cohen 1/5

89 likes, 1 reply, 15 retweets

Qualcomm Research & Technologies

@qcomresearch

a year ago

Congratulations to the #AI Research team for having the paper "CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay" accepted at #ICML2024. Discover the future of LLMs: arxiv.org/abs/2402.04858 Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺 Auke Wiggers David Zhang Michaël Defferrard Taco Cohen Natasha Butt

9 likes, 1 reply, 3 retweets

Natasha Butt

@natashaeve4

a year ago

Come see our poster #715 on CodeIt today at #ICML2024 13.30-15.00 Halle C. We approach ARC by self-improving LLMs with prioritized hindsight replay. Blaze(j) Manczak 🇵🇱🇱🇺🇪🇺 Auke Wiggers Corrado Rainone David Zhang Michaël Defferrard Taco Cohen