Erfan Miahi (@erfan_mhi)'s Twitter Profile
Erfan Miahi

@erfan_mhi

Ex-researcher at @rlai_lab; collaborated with people from @googledeepmind; doing mostly RL reasoning!

Doing #parkour & #reading/#writing books in my spare time

ID: 843815099462828032

Link: https://www.linkedin.com/in/erfan-miahi-8637a1130/ | Joined: 20-03-2017 13:22:51

1.1K Tweets

497 Followers

986 Following

Jeff Dean (@jeffdean)'s Twitter Profile Photo

Demis Hassabis, James Manyika, and I wrote up a (lengthy and illustrated!) overview of the AI research work and advances across Google in 2024. It's a summary of the work of many across Google, covering Gemini advances, Gemma, NotebookLM, generative image and video models like…

Erfan Miahi (@erfan_mhi)'s Twitter Profile Photo

I don't understand why people say RLHF is a contextual bandit problem. Sure, the exploration is limited and the RL problem is badly formulated. But you still have to solve the temporal credit assignment problem (updating all tokens), which is not part of the contextual-bandit setting.
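
A toy PyTorch sketch of that point (shapes and numbers are made up for illustration): a single outcome reward at the end of a response still assigns credit to every generated token, which a one-shot bandit update has no notion of.

```python
# Toy sketch (hypothetical shapes): one scalar reward for the whole response
# still pushes gradient into EVERY token's log-prob -- credit assignment
# across the sequence, not a single bandit arm.
import torch

vocab, seq_len = 50, 8
logits = torch.randn(seq_len, vocab, requires_grad=True)  # stand-in for the policy's outputs
tokens = torch.randint(0, vocab, (seq_len,))              # the sampled response

# log-probability of each sampled token under the policy
log_probs = torch.log_softmax(logits, dim=-1)[torch.arange(seq_len), tokens]

reward = 1.0                          # one outcome reward for the full response
loss = -reward * log_probs.sum()      # REINFORCE-style surrogate objective
loss.backward()

print(logits.grad.abs().sum(dim=-1))  # nonzero gradient at all 8 token positions
```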

Erfan Miahi (@erfan_mhi)'s Twitter Profile Photo

Reinforcement learning shook world politics once before with AlphaGo, especially in China, and now, ~9 years later, DeepSeek R1 is doing it again on a much larger scale. RL IS THE FUTURE, as I always believed. It's the ultimate chess move in the game of intelligence. #DeepSeek #AI #rl

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Shows that:
- RL generalizes in rule-based envs, esp. when trained with an outcome-based reward
- SFT tends to memorize the training data and struggles to generalize OOD
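
A minimal sketch of the two training signals being contrasted (the function names and toy env are hypothetical, just to make the distinction concrete): SFT scores token-by-token agreement with a reference string, while the outcome-based RL reward only checks whether the final answer satisfies the environment's rule.

```python
# Hypothetical toy contrast between the two post-training signals.

def sft_loss(token_log_probs, reference_tokens):
    # SFT: cross-entropy against the demonstration tokens -- the model is
    # rewarded for reproducing the training string itself
    return -sum(lp[t] for lp, t in zip(token_log_probs, reference_tokens))

def outcome_reward(generated_answer, rule):
    # RL in a rule-based env: reward depends only on whether the final
    # answer passes the rule, not on matching any reference text
    return 1.0 if rule(generated_answer) else 0.0

# e.g. a rule-based arithmetic env: any phrasing that yields 42 gets reward 1
print(outcome_reward("42", lambda ans: int(ans) == 6 * 7))  # -> 1.0
```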

Qwen (@alibaba_qwen)'s Twitter Profile Photo

Today, we release QwQ-32B, our new reasoning model with only 32 billion parameters that rivals cutting-edge reasoning models, e.g., DeepSeek-R1.

Blog: qwenlm.github.io/blog/qwq-32b
HF: huggingface.co/Qwen/QwQ-32B
ModelScope: modelscope.cn/models/Qwen/Qw…
Demo: huggingface.co/spaces/Qwen/Qw…
Qwen Chat:
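
For reference, a minimal way to try the checkpoint via the HF link above, using the standard transformers causal-LM API (a generic loading sketch, not the official recipe; it assumes hardware that can hold 32B weights, and the prompt and generation settings are illustrative).

```python
# Minimal sketch: load the released checkpoint with the standard
# transformers causal-LM API and run one chat-formatted generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # the HF repo linked above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```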

Richard Sutton (@richardssutton)'s Twitter Profile Photo

David Silver really hits it out of the park in this podcast. The paper "Welcome to the Era of Experience" is here: goo.gle/3EiRKIH.

λux (@novasarc01)'s Twitter Profile Photo

the mit 6.S184 lectures on flow matching and diffusion are really helpful for anyone who wants to get started with flow matching and the in-depth intuition behind it
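
For a taste of what those lectures build up to, here is a minimal PyTorch sketch of the linear-path conditional flow matching objective (the tiny network and shapes are made up): sample a time t, interpolate between noise and data, and regress a velocity network onto the straight path's constant velocity x1 - x0.

```python
# Minimal sketch of the linear-path conditional flow matching loss.
import torch

def cfm_loss(velocity_net, x1):
    x0 = torch.randn_like(x1)       # noise sample
    t = torch.rand(x1.shape[0], 1)  # per-example time in [0, 1]
    xt = (1 - t) * x0 + t * x1      # point on the straight noise->data path
    target = x1 - x0                # constant velocity of that path
    return ((velocity_net(xt, t) - target) ** 2).mean()

# usage with a hypothetical tiny velocity net over 2-d data:
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
loss = cfm_loss(lambda x, t: net(torch.cat([x, t], dim=-1)), torch.randn(16, 2))
print(loss)
```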

Erfan Miahi (@erfan_mhi)'s Twitter Profile Photo

It’s crazy how the demand for training coding models with RL has exploded in just a few months. People from finance to IT are literally throwing 💰 at me! Everybody wants their own specialized coding model now. Wild.