Michael Matthews @ ICLR 2025 (@mitrma) Twitter Tweets • TwiCopy

Michael Beukman

@mcbeukman

9 months ago

I'll also be at NeurIPS, keen to chat about UED, SFL, Kinetix, or anything in open-ended RL :)

thumb_up_off_alt28

chat_bubble_outline2

repeat4

shareShare

Couldn't agree more. "UK Research and Innovation funding in the UK fell under the previous government from 6,835 in 2018-19 to 4,900 in 2022-23". To give a concrete example (with my @UCLCS professor hat on): 4 out of 7 UCL DARK PhD students were funded by the Centre for Doctoral

thumb_up_off_alt157

chat_bubble_outline7

repeat35

shareShare

Martin Klissarov

@martinklissarov

7 months ago

Can AI agents adapt zero-shot, to complex multi-step language instructions in open-ended environments? We present MaestroMotif, a method for AI-assisted skill design that produces highly capable and steerable hierarchical agents. To the best of our knowledge, it is the first

thumb_up_off_alt203

chat_bubble_outline6

repeat53

shareShare

Michael Matthews @ ICLR 2025

@mitrma

7 months ago

Congratulations to antoine dedieu Joe Ortiz Kevin Patrick Murphy and the team for setting a new SOTA on Craftax-1M and Craftax-Classic-1M! 🎉

thumb_up_off_alt17

chat_bubble_outline2

repeat0

shareShare

Michael Matthews @ ICLR 2025

@mitrma

7 months ago

Kinetix has been accepted at ICLR as an Oral! See you in Singapore 🇸🇬

thumb_up_off_alt49

chat_bubble_outline2

repeat3

shareShare

Machine Learning Street Talk

@mlstreettalk

7 months ago

Jakob Foerster Jakob Foerster at University of Oxford arguing that the AI community needs to avoid being goodharted by benchmarks.

thumb_up_off_alt118

chat_bubble_outline4

repeat16

shareShare

Mikayel Samvelyan

@_samvelyan

7 months ago

⚔️ MiniHack Updates! ⚔️ 1️⃣ MiniHack 1.0.0 is here! Following popular demand, it now supports the new Gymnasium API and is built on NLE 1.1.0. Huge thanks to @Stephen_Oman (maintainer of The NetHack Learning Environment ) for his outstanding contribution! 🙌

thumb_up_off_alt66

chat_bubble_outline3

repeat13

shareShare

alphaXiv

@askalphaxiv

7 months ago

1997: Deep Blue defeats Kasparov at chess 2016: AlphaGo masters the game of Go 2025: Stanford researchers crack Among Us Trending on alphaXiv 📈 Remarkable new work trains LLMs to master strategic social deduction through multi-agent RL, doubling win rates over standard RL.

thumb_up_off_alt1,1K

chat_bubble_outline15

repeat250

shareShare

Leor Cohen

@liorcohen5s

7 months ago

Introducing M³: A 𝗠odular 𝗪orld 𝗠odel over streams of tokens for sample-efficient RL 🌍🤖 M³ achieves state-of-the-art performance for planning-free world models on Atari-100K 🕹️, DMC 🦾, and Craftax-1M! 🚀 🧵1/8

thumb_up_off_alt32

chat_bubble_outline2

repeat6

shareShare

Michael Matthews @ ICLR 2025

@mitrma

7 months ago

Kinetix was featured on Computerphile!

thumb_up_off_alt24

chat_bubble_outline2

repeat2

shareShare

Andrei Lupu

@_andreilupu

6 months ago

Did you know that \textcolor{white} text is still visible to LLMs? Anyway, don't use LLMs to write your reviews. Your co-authors will thank you.

$Did you know that \textcolor{white} text is still visible to LLMs? Anyway, don't use LLMs to write your reviews. Your co-authors will thank you.$

thumb_up_off_alt147

chat_bubble_outline4

repeat6

shareShare

Michael Beukman

@mcbeukman

5 months ago

I'll be attending ICLR next week to present Kinetix with Michael Matthews. Would love to chat about anything UED / Open-Ended RL / QD related, or interesting research in general :)

thumb_up_off_alt34

chat_bubble_outline0

repeat1

shareShare

Michael Matthews @ ICLR 2025

@mitrma

5 months ago

I'll be in Singapore next week to present Kinetix as an Oral along with Michael Beukman. Reach out if you'd like to chat! 🇸🇬

thumb_up_off_alt26

chat_bubble_outline0

repeat2

shareShare

Matthew Jackson

@jacksonmattt

5 months ago

🌹 Today we're releasing Unifloral, our new library for Offline Reinforcement Learning! We make research easy: ⚛️ Single-file 🤏 Minimal ⚡️ End-to-end Jax Best of all, we unify prior methods into one algorithm - a single hyperparameter space for research! ⤵️

thumb_up_off_alt138

chat_bubble_outline5

repeat35

shareShare

Michael Matthews @ ICLR 2025

@mitrma

5 months ago

We are presenting Kinetix today! Oral - 11:30am Peridot Room 5F Poster - 3pm Hall 3+2B 377

thumb_up_off_alt22

chat_bubble_outline1

repeat3

shareShare

Jakob Foerster

@j_foerst

3 months ago

Hello World: My team at FAIR / AI at Meta (AI Research Agent) is looking to hire contractors across software engineering and ML. If you are interested and based in the UK, please fill in the following short EoI form: docs.google.com/forms/d/e/1FAI…

thumb_up_off_alt111

chat_bubble_outline3

repeat23

shareShare

Seohong Park

@seohong_park

3 months ago

Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: arxiv.org/abs/2506.04168 Thread ↓

thumb_up_off_alt880

chat_bubble_outline9

repeat137

shareShare

Mikael Henaff

@henaffmikael

3 months ago

A couple bits of news: 1. Happy to share my first (human) NetHack ascension-next step is RL agents :) 2. I wrote a post discussing some The NetHack Learning Environment challenges & how they map to open problems in RL & agentic AI. Still the best RL benchmark imo. mikaelhenaff.substack.com/p/first-nethac…

A couple bits of news:

1. Happy to share my first (human) NetHack ascension-next step is RL agents :)

2. I wrote a post discussing some <a href="/NetHack_LE/">The NetHack Learning Environment</a> challenges & how they map to open problems in RL & agentic AI. Still the best RL benchmark imo.

mikaelhenaff.substack.com/p/first-nethac…

thumb_up_off_alt49

chat_bubble_outline3

repeat10

shareShare

Samuel Garcin

@samuelgarcin

3 months ago

You work on RL from pixels, and you're tired to wait 10 hours for a DMC run to finish? Or up to 100 hours, if you add video distractors? Well, we got you covered : PixelBrax can run your continuous control experiments from pixels in < 1 hr! Come chat with Trevor McInroe and I at

thumb_up_off_alt16

chat_bubble_outline1

repeat3

shareShare

Martin Klissarov

@martinklissarov

2 months ago

As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism-yet many questions remain open. Here's our overview of the field🧵

thumb_up_off_alt151

chat_bubble_outline4

repeat35

shareShare

Michael Matthews @ ICLR 2025

Michael Beukman

Tim Rocktäschel

Martin Klissarov

Michael Matthews @ ICLR 2025

Michael Matthews @ ICLR 2025

Machine Learning Street Talk

Mikayel Samvelyan

alphaXiv

Leor Cohen

Michael Matthews @ ICLR 2025

Andrei Lupu

Michael Beukman

Michael Matthews @ ICLR 2025

Matthew Jackson

Michael Matthews @ ICLR 2025

Jakob Foerster

Seohong Park

Mikael Henaff

Samuel Garcin

Martin Klissarov