Csordás Róbert (@robert_csordas)'s Twitter Profile
Csordás Róbert

@robert_csordas

Postdoc at Stanford working on systematic generalization and algorithmic reasoning. Ex-IDSIA PhD, ex-@DeepMind intern.

ID: 745005274784751616

Website: https://robertcsordas.github.io/ · Joined: 20-06-2016 21:27:54

169 Tweets

762 Followers

426 Following

The TWIML AI Podcast (@twimlai)'s Twitter Profile Photo

Today, we’re joined by Julie Kallini ✨, a PhD student in the Stanford NLP Group, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of

Julie Kallini ✨ @ ICLR 2025 ✈️ (@juliekallini)'s Twitter Profile Photo

🚀 In T-minus 1 week, I’ll be at ICLR presenting MrT5!

The final version has tons of updates:
- New controller algorithm for targeted compression rates
- More baselines and downstream tasks
- Scaled-up experiments to 1.23B parameter models

And now, MrT5 is on 🤗HuggingFace! 🧵
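
As a rough illustration of the dynamic token deletion/merging idea behind MrT5 (not the paper's actual gate, training objective, or controller — the module name, keep-ratio knob, and hard top-k below are assumptions for the sketch), a learned gate can score each byte-level position after an early encoder layer and drop the low-scoring ones so that later layers process a shorter sequence:

# Rough sketch of dynamic token deletion in a byte-level encoder
# (illustrative only; not MrT5's actual mechanism).
import torch
import torch.nn as nn

class TokenDeletionGate(nn.Module):
    def __init__(self, d_model: int, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Linear(d_model, 1)  # per-position "keep" logit
        self.keep_ratio = keep_ratio        # assumed knob standing in for a target compression rate

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq_len, d_model) states after an early encoder layer
        logits = self.score(hidden).squeeze(-1)                  # (batch, seq_len)
        k = max(1, int(self.keep_ratio * hidden.size(1)))
        keep_idx = logits.topk(k, dim=-1).indices.sort(-1).values
        batch_idx = torch.arange(hidden.size(0)).unsqueeze(-1)
        return hidden[batch_idx, keep_idx], keep_idx             # shorter sequence for later layers

x = torch.randn(1, 16, 8)                             # 16 byte-level positions, model dim 8
gate = TokenDeletionGate(d_model=8, keep_ratio=0.25)
shortened, kept = gate(x)
print(shortened.shape, kept)                          # torch.Size([1, 4, 8]) plus the kept positions

Training such a gate end to end typically needs a soft or stochastic relaxation of the hard top-k; the hard selection above only illustrates the effect on sequence length.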

Jürgen Schmidhuber (@schmidhuberai)'s Twitter Profile Photo

My first work on metalearning or learning to learn came out in 1987 [1][2]. Back then nobody was interested. Today, compute is 10 million times cheaper, and metalearning is a hot topic 🙂 It’s fitting that my 100th journal publication [100] is about metalearning, too.

Shikhar (@shikharmurty)'s Twitter Profile Photo

New #NAACL2025 paper! 🚨 Transformer LMs are data-hungry; we propose a new auxiliary loss function (TreeReg) to fix that. TreeReg takes bracketing decisions from syntax trees and turns them into orthogonality constraints on span representations. ✅ Boosts pre-training data
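
As a toy sketch of what "orthogonality constraints on span representations" could look like (the mean-pooling, span pairing, and squared-cosine penalty here are assumptions for illustration, not TreeReg's actual formulation), an auxiliary loss can push the pooled representations of distinct bracketed spans toward orthogonality:

# Toy orthogonality penalty over bracketed spans, e.g. "((the cat) (sat down))".
# Illustrative only; not TreeReg's exact recipe.
import torch
import torch.nn.functional as F

def span_repr(hidden, start, end):
    # Mean-pool the token states of one bracketed span into a single vector.
    return hidden[start:end].mean(dim=0)

def orthogonality_penalty(hidden, spans):
    # spans: list of (start, end) pairs taken from a syntax tree's bracketing.
    reps = torch.stack([F.normalize(span_repr(hidden, s, e), dim=0) for s, e in spans])
    gram = reps @ reps.T                            # pairwise cosine similarities
    off_diag = gram - torch.diag(torch.diag(gram))  # zero out the diagonal
    return (off_diag ** 2).mean()                   # auxiliary term added to the LM loss

hidden = torch.randn(6, 16)                         # 6 tokens, hidden size 16
loss_aux = orthogonality_penalty(hidden, [(0, 2), (2, 4), (4, 6)])
print(loss_aux)                                     # scalar, weighted and added to the main loss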

Julie Kallini ✨ @ ICLR 2025 ✈️ (@juliekallini)'s Twitter Profile Photo

If you're at #ICLR2025 this week, come check out my poster for 💪MrT5 on Thursday (4/24) from 10am to 12:30pm! The poster is at Hall 3 + Hall 2B #273. I'll also be giving a ⚡ lightning talk right after at the session on tokenizer-free, end-to-end architectures in Opal 103-104!

Piotr Piękos (@piotrpiekosai)'s Twitter Profile Photo

What if instead of a couple of dense attention heads, we use lots of sparse heads, each learning to select its own set of tokens to process?

Introducing Mixture of Sparse Attention (MoSA)
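
A minimal sketch of one way such a head could work — a small router scores the tokens, the head keeps only its top-k, and standard attention runs on that subset (the routing, scoring, and output scattering below are assumptions, not necessarily MoSA's exact design):

# One sparse attention head that selects its own top-k tokens (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseHead(nn.Module):
    def __init__(self, d_model: int, d_head: int, k: int):
        super().__init__()
        self.router = nn.Linear(d_model, 1)   # learns which tokens this head attends over
        self.q = nn.Linear(d_model, d_head)
        self.kv = nn.Linear(d_model, 2 * d_head)
        self.out = nn.Linear(d_head, d_model)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, d_model); this head picks its own k tokens to process.
        scores = self.router(x).squeeze(-1)                  # (seq_len,)
        idx = scores.topk(self.k).indices.sort().values      # this head's token subset
        sub = x[idx]                                         # (k, d_model)
        q = self.q(sub)
        key, val = self.kv(sub).chunk(2, dim=-1)
        attn = F.softmax(q @ key.T / key.size(-1) ** 0.5, dim=-1)
        out = torch.zeros_like(x)
        out[idx] = self.out(attn @ val)                      # scatter back to full length
        return out

x = torch.randn(32, 64)                 # 32 tokens, model dim 64
head = SparseHead(d_model=64, d_head=16, k=8)
print(head(x).shape)                    # torch.Size([32, 64]); only 8 positions are nonzero

Because each head only attends over its k selected tokens, many such heads can fit in roughly the budget of a few dense ones.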

fly51fly (@fly51fly)'s Twitter Profile Photo

[LG] Do Language Models Use Their Depth Efficiently?
R Csordás, C D. Manning, C Potts [Stanford University] (2025)
arxiv.org/abs/2505.13898

William Merrill (@lambdaviking)'s Twitter Profile Photo

Padding a transformer’s input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? 👀

New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵
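
The padding setup itself is simple; here is a tiny sketch with made-up token ids and a hypothetical dedicated blank token (the paper's contribution is the theoretical characterization, not this recipe):

# Padding as test-time compute: append "blank" filler tokens so the model gets
# extra forward-pass computation before it has to answer. Ids are illustrative.
prompt_ids = [17, 42, 8, 99]   # hypothetical ids for a short question
BLANK_ID = 0                   # id of a dedicated blank/filler token (assumed)
num_blanks = 16                # how much extra computation to grant

padded_ids = prompt_ids + [BLANK_ID] * num_blanks
# The model then reads off its answer at the position after the padding, e.g.:
#   logits = model(padded_ids); answer = logits[-1].argmax()
print(padded_ids)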

Aryaman Arora (@aryaman2020)'s Twitter Profile Photo

new paper! 🫡

why are state space models (SSMs) worse than Transformers at recall over their context? this is a question about the mechanisms underlying model behaviour; therefore, we propose using mechanistic evaluations to answer it!
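
For context, "recall over their context" is typically probed with associative-recall-style prompts; a toy generator for such probes is sketched below (illustrative only, not the paper's actual mechanistic evaluation):

# Toy associative-recall probe: the model sees key-value pairs, then a query key,
# and must emit the paired value.
import random

def recall_example(n_pairs=8):
    keys = random.sample(range(100, 200), n_pairs)   # distinct key tokens
    vals = random.sample(range(200, 300), n_pairs)   # distinct value tokens
    context = [tok for k, v in zip(keys, vals) for tok in (k, v)]
    query = random.choice(keys)
    target = vals[keys.index(query)]                 # the value the model must recall
    return context + [query], target

prompt, answer = recall_example()
print(prompt, "->", answer)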

Mehrdad Farajtabar (@mfarajtabar)'s Twitter Profile Photo

🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching?

The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks,

David Chiang (@davidweichiang)'s Twitter Profile Photo

New on arXiv: Knee-Deep in C-RASP, by Andy J Yang, Michael Cadilhac and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.

Nouha Dziri (@nouhadziri)'s Twitter Profile Photo

📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? 

Remember how DeepSeek-R1 and o1 impressed us on Olympiad-level math, yet still failed at simple arithmetic 😬

We built a benchmark to find out → OMEGA Ω 📐

💥 We found