Soufiane Hayou (@hayou_soufiane)'s Twitter Profile
Soufiane Hayou

@hayou_soufiane

Researcher @SimonsInstitute, UC Berkeley. PhD @oxfordstats and MSc&EngD @Polytechnique. I like to scale up things!

ID: 1277309540313300995

Link: http://www.soufianehayou.com · Joined: 28-06-2020 18:35:13

289 Tweets

791 Followers

275 Following

Soufiane Hayou (@hayou_soufiane)'s Twitter Profile Photo

🔥 Great to see LoRA+ getting so much attention! Pro tip: Since LoRA+ gains are (mostly) "orthogonal" to other methods, combining it with different LoRA variants could give even better results 🚀 Also: LoRA+ is now part of HuggingFace (PEFT). #AI #LLMs #LoRA #finetuning
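
The PEFT integration mentioned above is exposed through an optimizer helper. A minimal sketch, assuming the create_loraplus_optimizer API from recent PEFT releases; the base model, rank, and learning rates below are illustrative placeholders, not values from the tweet:

```python
# A minimal sketch of LoRA+ via PEFT's optimizer helper (illustrative values).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_loraplus_optimizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16))

# LoRA+ trains the B matrices with a larger learning rate than the A matrices;
# loraplus_lr_ratio sets lr_B / lr_A (the paper suggests ratios around 2^4).
optimizer = create_loraplus_optimizer(
    model=model,
    optimizer_cls=torch.optim.AdamW,
    lr=5e-5,               # learning rate for the A matrices
    loraplus_lr_ratio=16,  # B matrices get 16x this learning rate
)
```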

NeurIPS Conference (@neuripsconf)'s Twitter Profile Photo

Due to a high demand for registrations, NeurIPS will be moving towards a randomized lottery system, effective immediately. Authors of accepted conference and workshop papers are still guaranteed registration, but this may change as we release spots to the lottery, so we urge…

Soufiane Hayou (@hayou_soufiane)'s Twitter Profile Photo

Saying 'LLMs are just next token predictors' is like saying 'A polynomial function is just a sum of monomials (x^k)'. Scale is key - with the right scale, a polynomial function can approximate incredibly complex behaviors. 🧮🤖
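
A tiny numerical sketch of the analogy (the target function and degrees are arbitrary choices): a plain sum of monomials tracks an increasingly complex function as the degree, i.e. the scale, grows.

```python
# Illustrative only: a polynomial is "just" a sum of monomials x^k,
# yet with enough terms (scale) it approximates complex behavior.
import numpy as np

x = np.linspace(-1, 1, 400)
target = np.sin(5 * x) * np.exp(x)  # an arbitrary "complex" target

for degree in (2, 8, 16):
    coeffs = np.polyfit(x, target, degree)                # least-squares fit
    err = np.max(np.abs(np.polyval(coeffs, x) - target))  # worst-case error
    print(f"degree {degree:>2}: max error {err:.2e}")
# The max error drops by orders of magnitude as the degree increases.
```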

Soufiane Hayou (@hayou_soufiane)'s Twitter Profile Photo

Said 'wassup' to Gemini 2.0 Flash Thinking and I think I gave it social anxiety... it wrote a whole research paper on how to respond casually 😭

Soufiane Hayou (@hayou_soufiane)'s Twitter Profile Photo

People compare AI to past breakthroughs 🔄 (the industrial revolution, the internet, etc.), but there's a crucial difference: in previous advancements, humans remained the most intelligent beings. This time, we're creating something that could surpass us 🤖. It's a singularity! ⚡️

Soufiane Hayou (@hayou_soufiane)'s Twitter Profile Photo

100%! For instance, if you have a good understanding of the concentration of random variables, you should be able to infer (without much engineering) how to scale the init and learning rate with width (µP, Mean-Field, etc.), or with depth (Stable ResNet, Depth-µP, etc.)
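
A deliberately crude sketch of that inference, assuming a plain MLP trained with Adam; the constants and the single model-wide learning rate are simplifications (full µP treats input and output layers differently):

```python
# Width-aware init and learning-rate scaling in the spirit of µP (illustrative).
import torch
import torch.nn as nn

def make_mlp(width: int, base_width: int = 256, base_lr: float = 1e-3):
    mlp = nn.Sequential(
        nn.Linear(128, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 10),
    )
    # Init variance ~ 1/fan_in: by concentration, each preactivation (a sum
    # of fan_in roughly independent terms) stays O(1) as width grows.
    for m in mlp:
        if isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, std=m.in_features ** -0.5)
            nn.init.zeros_(m.bias)
    # With Adam-style updates, shrinking the learning rate like 1/width keeps
    # the feature updates O(1) regardless of width.
    lr = base_lr * base_width / width
    return mlp, torch.optim.Adam(mlp.parameters(), lr=lr)

# Hyperparameters tuned at base_width then transfer (approximately) to any width.
model, opt = make_mlp(width=1024)
```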

Soufiane Hayou (@hayou_soufiane)'s Twitter Profile Photo

I had a similar experience. I have a feeling that these systems will probably do most of what current PhD students can do. Strong PhD students will benefit from this by effectively using and directing these systems. Interesting times.

Soufiane Hayou (@hayou_soufiane)'s Twitter Profile Photo

Scale is the only permanent feature of state-of-the-art AI models. All other characteristics are subject to change and innovation.

Soufiane Hayou (@hayou_soufiane)'s Twitter Profile Photo

It seems that most gains from RL come from the pretrained model itself. The format-reward stuff (GRPO, etc.) just extracts those capabilities. A good reward signal helps, but it's not the main ingredient.

Thomas Wolf (@thom_wolf)'s Twitter Profile Photo

The conference is getting crazy over it. Today we're unveiling our 1st robot: Hugging Face 🤝 Pollen Robotics. A low-cost $250 open-source robot, designed as an open-source platform for fun human-computer interactions, powered by HF Spaces, models, and the community > discord.com/invite/jsvMRQx…

Soufiane Hayou (@hayou_soufiane)'s Twitter Profile Photo

The current debate on reasoning in LLMs:
Group A: "We see it, we feel it, therefore it exists."
Group B: "We don't see it, we don't feel it, therefore it doesn't exist."
Group C (<0.01%): "What is reasoning?"