Michael Matthews @ ICLR 2025 (@mitrma) 's Twitter Profile
Michael Matthews @ ICLR 2025

@mitrma

PhD student @FLAIR_Ox working on RL in open-ended environments

ID: 429236521

linkhttps://www.mtmatthews.com/ calendar_today05-12-2011 18:40:04

128 Tweet

778 Followers

318 Following

Tim Rocktäschel (@_rockt) 's Twitter Profile Photo

Couldn't agree more. "UK Research and Innovation funding in the UK fell under the previous government from 6,835 in 2018-19 to 4,900 in 2022-23". To give a concrete example (with my @UCLCS professor hat on): 4 out of 7 UCL DARK PhD students were funded by the Centre for Doctoral

Martin Klissarov (@martinklissarov) 's Twitter Profile Photo

Can AI agents adapt zero-shot, to complex multi-step language instructions in open-ended environments? We present MaestroMotif, a method for AI-assisted skill design that produces highly capable and steerable hierarchical agents. To the best of our knowledge, it is the first

Mikayel Samvelyan (@_samvelyan) 's Twitter Profile Photo

⚔️ MiniHack Updates! ⚔️ 1️⃣ MiniHack 1.0.0 is here! Following popular demand, it now supports the new Gymnasium API and is built on NLE 1.1.0. Huge thanks to @Stephen_Oman (maintainer of The NetHack Learning Environment ) for his outstanding contribution! 🙌

alphaXiv (@askalphaxiv) 's Twitter Profile Photo

1997: Deep Blue defeats Kasparov at chess 2016: AlphaGo masters the game of Go 2025: Stanford researchers crack Among Us Trending on alphaXiv 📈 Remarkable new work trains LLMs to master strategic social deduction through multi-agent RL, doubling win rates over standard RL.

Leor Cohen (@liorcohen5s) 's Twitter Profile Photo

Introducing M³: A 𝗠odular 𝗪orld 𝗠odel over streams of tokens for sample-efficient RL 🌍🤖 M³ achieves state-of-the-art performance for planning-free world models on Atari-100K 🕹️, DMC 🦾, and Craftax-1M! 🚀 🧵1/8

Andrei Lupu (@_andreilupu) 's Twitter Profile Photo

Did you know that \textcolor{white} text is still visible to LLMs? Anyway, don't use LLMs to write your reviews. Your co-authors will thank you.

Did you know that \textcolor{white} text is still visible to LLMs?

Anyway, don't use LLMs to write your reviews. Your co-authors will thank you.
Michael Beukman (@mcbeukman) 's Twitter Profile Photo

I'll be attending ICLR next week to present Kinetix with Michael Matthews. Would love to chat about anything UED / Open-Ended RL / QD related, or interesting research in general :)

Matthew Jackson (@jacksonmattt) 's Twitter Profile Photo

🌹 Today we're releasing Unifloral, our new library for Offline Reinforcement Learning! We make research easy: ⚛️ Single-file 🤏 Minimal ⚡️ End-to-end Jax Best of all, we unify prior methods into one algorithm - a single hyperparameter space for research! ⤵️

Jakob Foerster (@j_foerst) 's Twitter Profile Photo

Hello World: My team at FAIR / AI at Meta (AI Research Agent) is looking to hire contractors across software engineering and ML. If you are interested and based in the UK, please fill in the following short EoI form: docs.google.com/forms/d/e/1FAI…

Seohong Park (@seohong_park) 's Twitter Profile Photo

Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: arxiv.org/abs/2506.04168 Thread ↓

Mikael Henaff (@henaffmikael) 's Twitter Profile Photo

A couple bits of news: 1. Happy to share my first (human) NetHack ascension-next step is RL agents :) 2. I wrote a post discussing some The NetHack Learning Environment challenges & how they map to open problems in RL & agentic AI. Still the best RL benchmark imo. mikaelhenaff.substack.com/p/first-nethac…

A couple bits of news:

1. Happy to share my first (human) NetHack ascension-next step is RL agents :) 

2. I wrote a post discussing some <a href="/NetHack_LE/">The NetHack Learning Environment</a>  challenges &amp; how they map to open problems in RL &amp; agentic AI. Still the best RL benchmark imo.  

mikaelhenaff.substack.com/p/first-nethac…
Samuel Garcin (@samuelgarcin) 's Twitter Profile Photo

You work on RL from pixels, and you're tired to wait 10 hours for a DMC run to finish? Or up to 100 hours, if you add video distractors? Well, we got you covered : PixelBrax can run your continuous control experiments from pixels in < 1 hr! Come chat with Trevor McInroe and I at

Martin Klissarov (@martinklissarov) 's Twitter Profile Photo

As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism-yet many questions remain open. Here's our overview of the field🧵