Irina Rish (@irinarish)'s Twitter Profile
Irina Rish

@irinarish

prof UdeM/Mila; Canada Excellence Research Chair; AAI Lab head irina-lab.ai; INCITE project PI tinyurl.com/yc3jzudt; CSO nolano.ai

ID: 393867458

Link: https://www.irina-rish.com · Joined: 19-10-2011 06:22:21

6.6K Tweets

9.9K Followers

998 Following

Shayne Longpre (@shayneredford)'s Twitter Profile Photo

✨New Report✨ Our data ecosystem audit across text, speech, and video (✏️,📢,📽️) finds: 📈 Rising reliance on web, synthetic, and YouTube data. 🛑 80%+ datasets carry hidden restrictions. 🌍 Relative representation in languages and creators has not improved for 10+ yrs.

Tejas Vaidhya (@imtejas13)'s Twitter Profile Photo

🎉 Thrilled to share that our paper "Surprising effectiveness of pretraining ternary language models at scale" earned a spotlight at #ICLR2025! We dive into Ternary Language Models (TriLMs), systematically studying their training feasibility and scaling laws against FloatLMs.

Irina Rish (@irinarish)'s Twitter Profile Photo

Worried about memory bottlenecks and slow inference? Need small but mighty LLMs on edge devices? Train ternary LLMs - you'll get the best trade-off between performance and memory (plus a significant inference speed-up) compared to full-precision models and post-training quantization.
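
For a concrete picture of what "ternary" means here, a minimal NumPy sketch of absmean ternary quantization (weights snapped to {-1, 0, +1} with one per-tensor scale), in the style of BitNet b1.58. Note this rounds a trained tensor post hoc for illustration only; TriLMs are pretrained with the ternary constraint in the loop, and the exact recipe in the Spectra paper may differ.

import numpy as np

def ternarize(w):
    """Quantize a weight tensor to {-1, 0, +1} plus one per-tensor scale."""
    scale = np.mean(np.abs(w)) + 1e-8           # absmean per-tensor scale
    q = np.clip(np.round(w / scale), -1, 1)     # snap to {-1, 0, +1}
    return q.astype(np.int8), scale

def ternary_matmul(x, q, scale):
    # With weights in {-1, 0, +1}, the matmul reduces to adds/subtracts;
    # that is where the memory and inference savings come from.
    return (x @ q.astype(x.dtype)) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, s = ternarize(w)
x = np.random.randn(4, 512).astype(np.float32)
print("mean abs error:", np.abs(x @ w - ternary_matmul(x, q, s)).mean())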

Benjamin Thérien (@benjamintherien)'s Twitter Profile Photo

How do MoE transformers, like DeepSeek, behave under distribution shifts? Do their routers collapse? Can they still match full re-training performance? Excited to present “Continual Pre-training of MoEs: How robust is your router?”!🧵arxiv.org/abs/2503.05029 1/N

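As background for the router question, here is a toy top-k softmax router in NumPy showing how expert load can be measured before and after a distribution shift; a "collapsed" router concentrates nearly all load on a few experts. All names and numbers are illustrative, not the paper's setup.

import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2
W_router = rng.normal(scale=0.02, size=(d_model, n_experts))  # router weights

def expert_load(x):
    """Route each token to its top-k experts and report per-expert load.
    A healthy router spreads load roughly evenly across experts."""
    logits = x @ W_router                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # top-k expert ids
    return np.bincount(top.ravel(), minlength=n_experts) / top.size

x_old = rng.normal(size=(1024, d_model))             # pretraining-like tokens
x_new = rng.normal(loc=0.5, size=(1024, d_model))    # shifted distribution
print("load before shift:", expert_load(x_old).round(3))
print("load after shift: ", expert_load(x_new).round(3))
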
Dawn Song (@dawnsongtweets)'s Twitter Profile Photo

🚀 Really excited to launch #AgentX competition hosted by UC Berkeley RDI and UC Berkeley alongside our LLM Agents MOOC series (a global community of 22k+ learners & growing fast). Whether you're building the next disruptive AI startup or pushing the research frontier, AgentX is your…

Gopeshh Subbaraj (@gopeshh1)'s Twitter Profile Photo

1/ Most RL methods assume a turn-based setup: agent acts, environment responds. But in the real world, the environment doesn't wait. In real-time RL, slow inference means missed actions or delayed ones. This leads to two key challenges: • Inaction Regret • Delay Regret
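
A toy illustration of the two regrets (all numbers made up): when inference latency exceeds the environment's tick, the agent either emits no fresh action (inaction) or acts on a stale observation (delay).

import math

env_dt = 0.01    # environment ticks every 10 ms, ready or not
infer_t = 0.035  # policy takes 35 ms to compute one action

lag = math.ceil(infer_t / env_dt)  # ticks elapsed per decision (here 4)
for tick in range(1, 13):
    if tick % lag == 0:
        # A new action finally arrives, but it was computed from the
        # observation seen `lag` ticks earlier -> delay regret.
        print(f"tick {tick:2d}: act on obs from tick {tick - lag}")
    else:
        # No fresh action is available, so the agent no-ops or repeats
        # its last action -> inaction regret.
        print(f"tick {tick:2d}: no-op (policy still computing)")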

Sonia (@soniajoseph_)'s Twitter Profile Photo

I’m really excited about Diffusion Steering Lens, an intuitive and elegant new “logit lens” technique for decoding the attention and MLP blocks of vision transformers! Vision is much more expressive than language, so some new mech interp rules apply:
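
For readers new to the idea, a minimal PyTorch sketch of the classic logit lens that this generalizes: decode an intermediate residual-stream activation with the model's final norm and output head. Diffusion Steering Lens itself decodes individual attention/MLP block outputs with the diffusion decoder; the modules below are stand-ins, not the paper's code.

import torch
import torch.nn as nn

# Stand-in modules: in a real model these are the trained final norm
# and output head of the vision transformer.
d_model, n_classes = 64, 10
final_norm = nn.LayerNorm(d_model)
head = nn.Linear(d_model, n_classes)

def logit_lens(residual):
    """Decode an intermediate activation with the final read-out:
    skip the remaining blocks and see what the model 'believes'
    at this depth."""
    return head(final_norm(residual))

h_block5 = torch.randn(1, 197, d_model)   # e.g. ViT tokens after block 5
print(logit_lens(h_block5).shape)         # -> (1, 197, n_classes)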

Arnav Jain (@arnavkj95)'s Twitter Profile Photo

📢 Come say hi at our SFM poster at #ICLR2025, Poster Session 5 – #572! We’re presenting a method for Inverse Reinforcement Learning via Successor Feature Matching — a non-adversarial approach that works without action labels. Excited to share and chat!

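A rough sketch of the successor-feature idea behind this, under the assumption of a fixed feature map and Monte Carlo rollouts (the paper's method learns features and optimizes online; names here are illustrative). Successor features depend only on visited states, which is why no action labels are needed.

import numpy as np

gamma = 0.99

def phi(states):
    # Hypothetical state feature map; in practice a learned network.
    return states

def successor_features(states):
    """Monte Carlo estimate of psi = E[sum_t gamma^t * phi(s_t)].
    Only states enter the computation -- no actions."""
    discounts = gamma ** np.arange(len(states))
    return (discounts[:, None] * phi(states)).sum(axis=0)

expert_states = np.random.randn(100, 8)  # stand-in for expert trajectory
agent_states = np.random.randn(100, 8)   # stand-in for agent rollout

# Non-adversarial objective: shrink the gap between the agent's and the
# expert's successor features (e.g., use it to derive the agent's reward).
gap = successor_features(expert_states) - successor_features(agent_states)
print("matching loss:", float(gap @ gap))
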
Arnav Jain (@arnavkj95)'s Twitter Profile Photo

🚀Excited to present a simple and scalable RL framework for multi-turn code generation with one-step recoverability and learned verifiers. Come say hi at our poster at VerifAI and SSI-FM workshops today, and Reasoning and Planning workshop tomorrow.

Irina Rish (@irinarish)'s Twitter Profile Photo

Our Spectra paper on efficient quantized LLMs was presented at ICLR (spotlight and poster) by the amazing Ayush Kaushal and Tejas Vaidhya (Mila - Institut québécois d'IA / Nolano.ai) - great job, and so much more to build next, for any data modalities (work in progress)! Welcome to the world of…

Arjun Ashok (@arjunashok37)'s Twitter Profile Photo

Context is Key🗝️ is accepted at ICML 2025! 📈 Let's catch up if you'll be at ICML 🛬 See the poster and tweet thread below for a preview of CiK 👇 x.com/arjunashok37/s… And stay tuned for new results ;)

Guillaume Dumas (@introspection)'s Twitter Profile Photo

Grateful for the IVADO Exploratory Grant with @IrinaRish & Tommaso Tosato on how #LLMs express personality traits & socio-emotional responses—toward safer #AI in Health & Education ivado.ca/en/2025/04/09/…

Guillaume Dumas (@introspection)'s Twitter Profile Photo

IVADO Irina Rish Tommaso Tosato You can already check our recent works on this topic:
- LLMs and Personalities: Inconsistencies Across Scales openreview.net/forum?id=vBg3O…
- Lost in Translation: The Algorithmic Gap Between LMs and the Brain arxiv.org/abs/2407.04680

Tim Dettmers (@tim_dettmers)'s Twitter Profile Photo

MatFormers are a very powerful alternative to transformers. They train like a regular transformer, but afterwards you can split the model up to any size you like and get very strong performance that scales just like a regular transformer. So train once, get models of all sizes!
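
The trick is matryoshka-style nesting: smaller sub-models are prefixes of the full one. A minimal NumPy sketch of a nested FFN (the real MatFormer jointly trains several granularities; shapes and names here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256
W_in = rng.normal(scale=0.02, size=(d_model, d_ff))
W_out = rng.normal(scale=0.02, size=(d_ff, d_model))

def nested_ffn(x, m=d_ff):
    """Size-m sub-model = the first m hidden units of the full FFN, so
    every smaller model is a prefix slice of the big one."""
    h = np.maximum(x @ W_in[:, :m], 0.0)  # ReLU over the first m units
    return h @ W_out[:m, :]

x = rng.normal(size=(4, d_model))
for m in (64, 128, 256):                  # train once, slice to any size
    print(f"d_ff={m}: output shape {nested_ffn(x, m).shape}")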

Ethan Mollick (@emollick)'s Twitter Profile Photo

Huh. Looks like Plato was right. A new paper shows all language models converge on the same "universal geometry" of meaning. Researchers can translate between ANY model's embeddings without seeing the original text. Implications for philosophy and vector databases alike.

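A cartoon of why a shared geometry enables translation, using orthogonal Procrustes alignment on paired samples. The paper's result is stronger (it translates between embedding spaces without any paired text); this sketch only illustrates the geometric intuition.

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 32

# Pretend two models embed the same texts into spaces differing by an
# unknown rotation plus noise -- a cartoon of shared "universal geometry".
emb_a = rng.normal(size=(n, d))
R_true, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal map
emb_b = emb_a @ R_true + 0.01 * rng.normal(size=(n, d))

# Orthogonal Procrustes: best rotation mapping space A onto space B.
U, _, Vt = np.linalg.svd(emb_a.T @ emb_b)
R_hat = U @ Vt
print("alignment error:", np.abs(emb_a @ R_hat - emb_b).mean())
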
Guillaume Dumas (@introspection)'s Twitter Profile Photo

🥳Nice! Our project “Towards a Quantum #NeuroIA” just got seed funding from @ai_UNIQUE! After a year in stealth w/Annemarie Wolff, our benchmarks show #quantum speedups for brain data simulation & analysis using Qiskit + IBM QS1 —> Next: #OpenSource tools & intl. collab 🇯🇵🔄🇨🇦

Arthur Douillard (@ar_douillard)'s Twitter Profile Photo

MuLoCo: Muon x DiLoCo = ❤️ arxiv.org/abs/2505.23725 from Benjamin Thérien, Xiaolong Huang, Irina Rish, Eugene Belilovsky
* Using Muon as inner optimizer
* Add quantization of the outer gradient to 2 bits (!)
* Add error feedback

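In DiLoCo-style training, workers take local inner steps (here with Muon) and periodically communicate an outer pseudo-gradient. A minimal sketch of that communication step with 2-bit quantization plus error feedback, under assumed details (uniform 4-level quantizer; MuLoCo's actual codebook and bookkeeping may differ):

import numpy as np

def quantize_2bit(g):
    """Uniform 2-bit quantizer: 4 levels spanning the tensor's range."""
    lo, hi = g.min(), g.max()
    step = (hi - lo) / 3 + 1e-12          # 4 levels -> 3 intervals
    return np.round((g - lo) / step) * step + lo

error = 0.0  # error-feedback residual, persisted across rounds

def communicate_outer_grad(pseudo_grad):
    """Compress the outer (pseudo-)gradient before syncing workers.
    Error feedback adds back what quantization dropped last round, so
    compression error does not accumulate over training."""
    global error
    compensated = pseudo_grad + error
    q = quantize_2bit(compensated)
    error = compensated - q               # remember what was lost
    return q

g = 0.01 * np.random.randn(1000)          # stand-in pseudo-gradient
print("mean abs comm error:", np.abs(communicate_outer_grad(g) - g).mean())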