Mehrnaz Mofakhami (@mhrnz_m)'s Twitter Profile
Mehrnaz Mofakhami

@mhrnz_m

MSc student @Mila_Quebec, WiML @NeurIPSConf'24 Mentorship Chair, Previous Visiting Researcher @ServiceNowRSRCH

ID: 1135914484277686272

Joined: 04-06-2019 14:21:32

122 Tweets

727 Followers

617 Following

Aarash Feizi (@aarashfeizi)'s Twitter Profile Photo

🚨  Excited to introduce PairBench! 🚨

💡 TL;DR: VLM-judges can fail at data comparison! 

✅ PairBench helps you pick the right one by testing alignment, symmetry, smoothness & controllability—ensuring reliable auto-evaluation.

📄Paper: arxiv.org/abs/2502.15210

🧵  Thread: 👇
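
For a concrete feel of one of those checks, here is a minimal sketch of a symmetry test; `judge_score` is a hypothetical stand-in for whatever VLM judge is being evaluated, and the toy scorer and captions are made up for illustration.

```python
import itertools

def judge_score(item_a, item_b):
    """Hypothetical stand-in for a VLM judge: returns a similarity score
    in [0, 1] for a pair of items (here just text captions)."""
    wa, wb = set(item_a.split()), set(item_b.split())
    # Deliberately asymmetric toy scorer so the check has something to catch.
    return len(wa & wb) / max(len(wa), 1)

def symmetry_gap(items, score):
    """Worst-case |score(a, b) - score(b, a)| over all pairs; a reliable
    judge should score a pair the same regardless of argument order."""
    return max(
        abs(score(a, b) - score(b, a))
        for a, b in itertools.combinations(items, 2)
    )

captions = ["a red car on a street", "a red car", "a cat asleep on a sofa"]
print(f"worst-case symmetry gap: {symmetry_gap(captions, judge_score):.3f}")
```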
Reyhane Askari (@reyhaneaskari)'s Twitter Profile Photo

🚀 New Paper Alert! 

Can we generate informative synthetic data that truly helps a downstream learner?

Introducing Deliberate Practice for Synthetic Data (DP)—a dynamic framework that focuses on where the model struggles most to generate useful synthetic training examples. 

🔥
Reza Bayat (@reza_byt)'s Twitter Profile Photo

New Paper Alert!📄

"It’s better to be sparse than to be dense" ✨

We explore how to steer LLMs (like Gemma-2 2B & 9B) by modifying their activations in sparse spaces, enabling more precise, interpretable control & improved monosemanticity with scaling.

Let’s break it down! 🧵
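
As a rough illustration of the general recipe for steering in a sparse feature space (a toy sketch with random weights, not the paper's method; a real setup would use a trained sparse autoencoder over the model's activations, e.g. Gemma-2's residual stream):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sparse = 64, 512               # toy sizes, not Gemma-2's

# Stand-in sparse autoencoder weights; a real SAE would be trained.
W_enc = rng.normal(scale=0.1, size=(d_sparse, d_model))
W_dec = rng.normal(scale=0.1, size=(d_model, d_sparse))
b_enc = np.zeros(d_sparse)

def encode(x):
    return np.maximum(W_enc @ x + b_enc, 0.0)   # sparse features via ReLU

def decode(z):
    return W_dec @ z

def steer(x, feature_idx, strength):
    """Edit one sparse feature and map the change back to activation space."""
    z = encode(x)
    z_steered = z.copy()
    z_steered[feature_idx] += strength
    return x + decode(z_steered) - decode(z)

x = rng.normal(size=d_model)               # stand-in residual-stream activation
x_new = steer(x, feature_idx=42, strength=5.0)
print(np.linalg.norm(x_new - x))           # how far the activation moved
```

The point of working in the sparse space is that the edited coordinate is (ideally) a near-monosemantic feature, so the intervention is easier to interpret than a dense direction.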
Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

Introducing nanoAhaMoment: Karpathy-style, single file RL for LLM library (<700 lines)

- super hackable
- no TRL / Verl, no abstraction💆‍♂️
- Single GPU, full param tuning, 3B LLM
- Efficient (R1-zero countdown < 10h)

comes with a from-scratch, fully spelled out YT video [1/n]
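
For intuition about what such a single-file RL loop boils down to, here is a generic REINFORCE-style toy with a group-relative baseline; it is not nanoAhaMoment's code or API, and the "policy" is a categorical bandit stand-in rather than a 3B LLM producing sampled completions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, target = 8, 3            # toy task: reward 1 if the sampled "answer" hits target
logits = np.zeros(vocab)        # the policy's only parameters
lr, group_size = 0.5, 16

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(200):
    probs = softmax(logits)
    # "Rollout": sample a group of answers from the current policy.
    samples = rng.choice(vocab, size=group_size, p=probs)
    # Verifiable reward, analogous to checking a countdown solution.
    rewards = (samples == target).astype(float)
    advantages = rewards - rewards.mean()          # group-relative baseline
    # REINFORCE: push up the log-prob of high-advantage samples.
    grad = np.zeros(vocab)
    for s, adv in zip(samples, advantages):
        grad += adv * (np.eye(vocab)[s] - probs)   # d log p(s) / d logits
    logits += lr * grad / group_size

print("final policy:", softmax(logits).round(2))
```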
António Góis (@antgois)'s Twitter Profile Photo

Happy to announce "Performative Prediction on Games and Mechanism Design" was accepted at AISTATS 2025 and got a spotlight at HAIC (ICLR 2025 workshop), with Mehrnaz Mofakhami, Fernando P. Santos, Gauthier Gidel and Simon Lacoste-Julien (Mila and UvA).

arxiv.org/abs/2408.05146

Details below 1/9🧵

🇺🇦 Dzmitry Bahdanau (@dbahdanau)'s Twitter Profile Photo

ICLR 2025: many many many thanks to Kyunghyun Cho and Yoshua Bengio for enabling the wildest ever start of my research career. 2014 was a very special time to do deep learning; a commit that changes 50 lines of code could give you a ToT award 10 years later 😲

Reyhane Askari (@reyhaneaskari)'s Twitter Profile Photo

Excited to be at #ICLR2025 next week! I'm currently on the job market for Research Scientist positions, especially in generative modeling, synthetic data, diffusion models, or responsible AI. Feel free to reach out if you have any openings!

Ryan D'Orazio (@ryandorazio)'s Twitter Profile Photo

This week I'll be at #ICLR25. If you like fundamental optimization results, I'll be presenting our work on surrogate losses for non-convex-concave min-max problems and learning value functions in deep RL (VIs more generally). Poster #377, Thursday April 24, 10am-12:30pm.

Divyat Mahajan (@divyat09)'s Twitter Profile Photo

Happy to share that Compositional Risk Minimization has been accepted at #ICML2025

📌Extensive theoretical analysis along with a practical approach for extrapolating classifiers to novel compositions!

📜 arxiv.org/abs/2410.06303
Katie Everett (@_katieeverett)'s Twitter Profile Photo

1. We often observe power laws between loss and compute: loss = a * flops ^ b + c

2. Models are rapidly becoming more efficient, i.e. use less compute to reach the same loss

But: which innovations actually change the exponent in the power law (b) vs change only the constant (a)?
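
To make the a-vs-b distinction concrete, here is a small sketch with made-up coefficients: a smaller a rescales the curve, while a more negative b changes how fast the reducible loss falls as compute grows.

```python
import numpy as np

def loss(flops, a, b, c):
    # Power law from the tweet: loss = a * flops^b + c, with b < 0.
    return a * flops ** b + c

flops = np.logspace(18, 22, 5)                        # illustrative compute budgets
baseline = loss(flops, a=2e3, b=-0.20, c=1.7)
better_a = loss(flops, a=1e3, b=-0.20, c=1.7)         # constant-only improvement
better_b = loss(flops, a=2e3, b=-0.25, c=1.7)         # exponent improvement

for f, l0, la, lb in zip(flops, baseline, better_a, better_b):
    print(f"flops={f:.0e}  baseline={l0:.3f}  better_a={la:.3f}  better_b={lb:.3f}")
# Changing a rescales the reducible loss (the a * flops^b term) by the same
# factor at every budget; changing b makes it decay faster, so its advantage,
# as a fraction of the baseline's reducible loss, keeps growing with compute.
```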

Damien Ferbach (@damien_ferbach)'s Twitter Profile Photo

It's very difficult to improve the *exponent* in scaling laws for loss vs compute, especially by changing the optimizer!
Our new paper shows that scaling momentum correctly can *provably* improve the scaling exponent on a theoretical model. Empirically, it works on LSTMs too!
Joey Bose (@bose_joey)'s Twitter Profile Photo

🎉Personal update: I'm thrilled to announce that I'm joining Imperial College London as an Assistant Professor of Computing starting January 2026. My future lab and I will continue to work on building better Generative Models 🤖, the hardest

Reza Bayat (@reza_byt)'s Twitter Profile Photo

📄 New Paper Alert! ✨

🚀Mixture of Recursions (MoR): Smaller models • Higher accuracy • Greater throughput

Across 135M–1.7B params, MoR carves a new Pareto frontier: equal training FLOPs yet lower perplexity, higher few-shot accuracy, and more than 2x throughput.
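
As rough intuition for the recursion side of the idea (a toy sketch, not the paper's architecture or code): one shared block is applied repeatedly, and a made-up router decides how many recursion steps each token takes.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len, max_depth = 16, 6, 3

# One shared "block"; in a real model this would be a full transformer layer
# whose weights are reused at every recursion step.
W = rng.normal(scale=0.1, size=(d, d))
router_w = rng.normal(scale=0.1, size=d)      # toy router, made up for this sketch

def shared_block(h):
    return h + np.tanh(h @ W)                 # residual update with shared weights

x = rng.normal(size=(seq_len, d))
scores = 1.0 / (1.0 + np.exp(-(x @ router_w)))                # router scores in (0, 1)
depths = np.minimum((scores * max_depth).astype(int) + 1, max_depth)

h = x.copy()
for step in range(max_depth):
    active = depths > step                    # tokens that still recurse at this step
    h[active] = shared_block(h[active])       # same parameters reused each time

print("per-token recursion depths:", depths)
```

Sharing one block across depths is what keeps the parameter count small; letting easy tokens exit after fewer recursions is where the throughput gain would come from.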
Nikita Saxena (she/her) (@nikitasaxena02)'s Twitter Profile Photo

Heading to Conference on Language Modeling in Montreal? So is WiML! 🎉 We are organizing our first ever event at #CoLM2025 and we want you to choose the format! What excites you the most? Have a different idea? Let us know in the replies! 👇 RT to spread the word! ⏩

Saba (@saba_a96)'s Twitter Profile Photo

We built a new 𝗮𝘂𝘁𝗼𝗿𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝘃𝗲 + 𝗥𝗟 image editing model using a strong verifier — and it beats SOTA diffusion baselines using 5× less data.
🔥 𝗘𝗔𝗥𝗟: a simple, scalable RL pipeline for high-quality, controllable edits.
🧵1/