Irina Rish (@irinarish)'s Twitter Profile
Irina Rish

@irinarish

prof UdeM/Mila; Canada Excellence Research Chair; AAI Lab head irina-lab.ai; INCITE project PI tinyurl.com/yc3jzudt; CSO nolano.ai

ID: 393867458

Link: https://www.irina-rish.com · Joined: 19-10-2011 06:22:21

6.6K Tweets

9.9K Followers

998 Following

Shayne Longpre (@shayneredford)'s Twitter Profile Photo

✨New Report✨ Our data ecosystem audit across text, speech, and video (✏️,📢,📽️) finds: 📈 Rising reliance on web, synthetic, and YouTube data. 🛑 80%+ datasets carry hidden restrictions. 🌍 Relative representation in languages and creators has not improved for 10+ yrs.

Tejas Vaidhya (@imtejas13)'s Twitter Profile Photo

🎉 Thrilled to share that our paper "Surprising effectiveness of pretraining ternary language models at scale" earned a spotlight at #ICLR2025! We dive into Ternary Language Models (TriLMs), systematically studying their training feasibility and scaling laws against FloatLMs.

Irina Rish (@irinarish)'s Twitter Profile Photo

Worried about memory bottlenecks and slow inference? Need small but mighty LLMs on edge devices? Train ternary LLMs - you'll get the best trade-off between performance and memory (plus a significant inference speed-up) compared to full-precision models and post-training quantization.
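
For a concrete picture of what "ternary" means here, a minimal NumPy sketch of absmean ternary quantization (weights snapped to {-1, 0, +1} with one per-tensor scale), in the style of BitNet b1.58. Note this rounds a trained tensor post hoc for illustration only; TriLMs are pretrained with the ternary constraint in the loop, and the exact recipe in the Spectra paper may differ.

import numpy as np

def ternarize(w):
    """Quantize a weight tensor to {-1, 0, +1} plus one per-tensor scale."""
    scale = np.mean(np.abs(w)) + 1e-8           # absmean per-tensor scale
    q = np.clip(np.round(w / scale), -1, 1)     # snap to {-1, 0, +1}
    return q.astype(np.int8), scale

def ternary_matmul(x, q, scale):
    # With weights in {-1, 0, +1}, the matmul reduces to adds/subtracts;
    # that is where the memory and inference savings come from.
    return (x @ q.astype(x.dtype)) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, s = ternarize(w)
x = np.random.randn(4, 512).astype(np.float32)
print("mean abs error:", np.abs(x @ w - ternary_matmul(x, q, s)).mean())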

Benjamin Thérien (@benjamintherien)'s Twitter Profile Photo

How do MoE transformers, like DeepSeek, behave under distribution shifts? Do their routers collapse? Can they still match full re-training performance? Excited to present “Continual Pre-training of MoEs: How robust is your router?”!🧵arxiv.org/abs/2503.05029 1/N

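As background for the router question, here is a toy top-k softmax router in NumPy showing how expert load can be measured before and after a distribution shift; a "collapsed" router concentrates nearly all load on a few experts. All names and numbers are illustrative, not the paper's setup.

import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2
W_router = rng.normal(scale=0.02, size=(d_model, n_experts))  # router weights

def expert_load(x):
    """Route each token to its top-k experts and report per-expert load.
    A healthy router spreads load roughly evenly across experts."""
    logits = x @ W_router                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # top-k expert ids
    return np.bincount(top.ravel(), minlength=n_experts) / top.size

x_old = rng.normal(size=(1024, d_model))             # pretraining-like tokens
x_new = rng.normal(loc=0.5, size=(1024, d_model))    # shifted distribution
print("load before shift:", expert_load(x_old).round(3))
print("load after shift: ", expert_load(x_new).round(3))
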
Dawn Song (@dawnsongtweets)'s Twitter Profile Photo

🚀 Really excited to launch #AgentX competition hosted by UC Berkeley RDI and UC Berkeley alongside our LLM Agents MOOC series (a global community of 22k+ learners & growing fast). Whether you're building the next disruptive AI startup or pushing the research frontier, AgentX is your…

Gopeshh Subbaraj (@gopeshh1)'s Twitter Profile Photo

1/ Most RL methods assume a turn-based setup: agent acts, environment responds. But in the real world, the environment doesn't wait. In real-time RL, slow inference means missed actions or delayed ones. This leads to two key challenges: • Inaction Regret • Delay Regret
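
A toy illustration of the two regrets (all numbers made up): when inference latency exceeds the environment's tick, the agent either emits no fresh action (inaction) or acts on a stale observation (delay).

import math

env_dt = 0.01    # environment ticks every 10 ms, ready or not
infer_t = 0.035  # policy takes 35 ms to compute one action

lag = math.ceil(infer_t / env_dt)  # ticks elapsed per decision (here 4)
for tick in range(1, 13):
    if tick % lag == 0:
        # A new action finally arrives, but it was computed from the
        # observation seen `lag` ticks earlier -> delay regret.
        print(f"tick {tick:2d}: act on obs from tick {tick - lag}")
    else:
        # No fresh action is available, so the agent no-ops or repeats
        # its last action -> inaction regret.
        print(f"tick {tick:2d}: no-op (policy still computing)")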

Sonia (@soniajoseph_)'s Twitter Profile Photo

I’m really excited about Diffusion Steering Lens, an intuitive and elegant new “logit lens” technique for decoding the attention and MLP blocks of vision transformers! Vision is much more expressive than language, so some new mech interp rules apply:
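
For readers new to the idea, a minimal PyTorch sketch of the classic logit lens that this generalizes: decode an intermediate residual-stream activation with the model's final norm and output head. Diffusion Steering Lens itself decodes individual attention/MLP block outputs with the diffusion decoder; the modules below are stand-ins, not the paper's code.

import torch
import torch.nn as nn

# Stand-in modules: in a real model these are the trained final norm
# and output head of the vision transformer.
d_model, n_classes = 64, 10
final_norm = nn.LayerNorm(d_model)
head = nn.Linear(d_model, n_classes)

def logit_lens(residual):
    """Decode an intermediate activation with the final read-out:
    skip the remaining blocks and see what the model 'believes'
    at this depth."""
    return head(final_norm(residual))

h_block5 = torch.randn(1, 197, d_model)   # e.g. ViT tokens after block 5
print(logit_lens(h_block5).shape)         # -> (1, 197, n_classes)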

Arnav Jain (@arnavkj95)'s Twitter Profile Photo

📢 Come say hi at our SFM poster at #ICLR2025, Poster Session 5 – #572! We’re presenting a method for Inverse Reinforcement Learning via Successor Feature Matching — a non-adversarial approach that works without action labels. Excited to share and chat!

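A rough sketch of the successor-feature idea behind this, under the assumption of a fixed feature map and Monte Carlo rollouts (the paper's method learns features and optimizes online; names here are illustrative). Successor features depend only on visited states, which is why no action labels are needed.

import numpy as np

gamma = 0.99

def phi(states):
    # Hypothetical state feature map; in practice a learned network.
    return states

def successor_features(states):
    """Monte Carlo estimate of psi = E[sum_t gamma^t * phi(s_t)].
    Only states enter the computation -- no actions."""
    discounts = gamma ** np.arange(len(states))
    return (discounts[:, None] * phi(states)).sum(axis=0)

expert_states = np.random.randn(100, 8)  # stand-in for expert trajectory
agent_states = np.random.randn(100, 8)   # stand-in for agent rollout

# Non-adversarial objective: shrink the gap between the agent's and the
# expert's successor features (e.g., use it to derive the agent's reward).
gap = successor_features(expert_states) - successor_features(agent_states)
print("matching loss:", float(gap @ gap))
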
Arnav Jain (@arnavkj95)'s Twitter Profile Photo

🚀Excited to present a simple and scalable RL framework for multi-turn code generation with one-step recoverability and learned verifiers. Come say hi at our poster at VerifAI and SSI-FM workshops today, and Reasoning and Planning workshop tomorrow.

Irina Rish (@irinarish)'s Twitter Profile Photo

Our Spectra paper on efficient quantized LLMs was presented at ICLR (spotlight and poster) by the amazing Ayush Kaushal and Tejas Vaidhya (Mila - Institut québécois d'IA / Nolano.ai) - great job, and so much more to build next, for any data modalities (work in progress)! Welcome to the world of…

Arjun Ashok (@arjunashok37)'s Twitter Profile Photo

Context is Key🗝️ is accepted at ICML 2025! 📈 Let's catch up if you'll be at ICML 🛬 See the poster and tweet thread below for a preview of CiK 👇 x.com/arjunashok37/s… And stay tuned for new results ;)

Guillaume Dumas (@introspection)'s Twitter Profile Photo

Grateful for the IVADO Exploratory Grant with @IrinaRish & Tommaso Tosato on how #LLMs express personality traits & socio-emotional responses—toward safer #AI in Health & Education ivado.ca/en/2025/04/09/…

Guillaume Dumas (@introspection)'s Twitter Profile Photo

IVADO Irina Rish Tommaso Tosato You can already check our recent works on this topic:
- LLMs and Personalities: Inconsistencies Across Scales openreview.net/forum?id=vBg3O…
- Lost in Translation: The Algorithmic Gap Between LMs and the Brain arxiv.org/abs/2407.04680

Tim Dettmers (@tim_dettmers)'s Twitter Profile Photo

MatFormers are a very powerful alternative to transformers. They train like a regular transformer, but afterwards you can split the model up to any size you like and get very strong performance that scales just like a regular transformer. So train once, get models of all sizes!
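
The trick is matryoshka-style nesting: smaller sub-models are prefixes of the full one. A minimal NumPy sketch of a nested FFN (the real MatFormer jointly trains several granularities; shapes and names here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256
W_in = rng.normal(scale=0.02, size=(d_model, d_ff))
W_out = rng.normal(scale=0.02, size=(d_ff, d_model))

def nested_ffn(x, m=d_ff):
    """Size-m sub-model = the first m hidden units of the full FFN, so
    every smaller model is a prefix slice of the big one."""
    h = np.maximum(x @ W_in[:, :m], 0.0)  # ReLU over the first m units
    return h @ W_out[:m, :]

x = rng.normal(size=(4, d_model))
for m in (64, 128, 256):                  # train once, slice to any size
    print(f"d_ff={m}: output shape {nested_ffn(x, m).shape}")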

Ethan Mollick (@emollick)'s Twitter Profile Photo

Huh. Looks like Plato was right. A new paper shows all language models converge on the same "universal geometry" of meaning. Researchers can translate between ANY model's embeddings without seeing the original text. Implications for philosophy and vector databases alike.

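A cartoon of why a shared geometry enables translation, using orthogonal Procrustes alignment on paired samples. The paper's result is stronger (it translates between embedding spaces without any paired text); this sketch only illustrates the geometric intuition.

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 32

# Pretend two models embed the same texts into spaces differing by an
# unknown rotation plus noise -- a cartoon of shared "universal geometry".
emb_a = rng.normal(size=(n, d))
R_true, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal map
emb_b = emb_a @ R_true + 0.01 * rng.normal(size=(n, d))

# Orthogonal Procrustes: best rotation mapping space A onto space B.
U, _, Vt = np.linalg.svd(emb_a.T @ emb_b)
R_hat = U @ Vt
print("alignment error:", np.abs(emb_a @ R_hat - emb_b).mean())
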
Guillaume Dumas (@introspection)'s Twitter Profile Photo

🥳Nice! Our project “Towards a Quantum #NeuroIA” just got seed funding from @ai_UNIQUE! After a year in stealth w/Annemarie Wolff, our benchmarks show #quantum speedups for brain data simulation & analysis using Qiskit + IBM QS1 —> Next: #OpenSource tools & intl. collab 🇯🇵🔄🇨🇦

Arthur Douillard (@ar_douillard)'s Twitter Profile Photo

MuLoCo: Muon x DiLoCo = ❤️ arxiv.org/abs/2505.23725 from Benjamin Thérien, Xiaolong Huang, Irina Rish, Eugene Belilovsky
* Using Muon as inner optimizer
* Add quantization of the outer gradient to 2 bits (!)
* Add error feedback

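In DiLoCo-style training, workers take local inner steps (here with Muon) and periodically communicate an outer pseudo-gradient. A minimal sketch of that communication step with 2-bit quantization plus error feedback, under assumed details (uniform 4-level quantizer; MuLoCo's actual codebook and bookkeeping may differ):

import numpy as np

def quantize_2bit(g):
    """Uniform 2-bit quantizer: 4 levels spanning the tensor's range."""
    lo, hi = g.min(), g.max()
    step = (hi - lo) / 3 + 1e-12          # 4 levels -> 3 intervals
    return np.round((g - lo) / step) * step + lo

error = 0.0  # error-feedback residual, persisted across rounds

def communicate_outer_grad(pseudo_grad):
    """Compress the outer (pseudo-)gradient before syncing workers.
    Error feedback adds back what quantization dropped last round, so
    compression error does not accumulate over training."""
    global error
    compensated = pseudo_grad + error
    q = quantize_2bit(compensated)
    error = compensated - q               # remember what was lost
    return q

g = 0.01 * np.random.randn(1000)          # stand-in pseudo-gradient
print("mean abs comm error:", np.abs(communicate_outer_grad(g) - g).mean())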