Zhaofeng Wu @ ICLR (@zhaofeng_wu)'s Twitter Profile
Zhaofeng Wu @ ICLR

@zhaofeng_wu

PhD student @MIT_CSAIL | Previously @allen_ai | MS'21 BS'19 BA'19 @uwnlp

ID: 3231168386

Link: https://zhaofengwu.github.io · Joined: 31-05-2015 01:30:02

237 Tweets

1.1K Followers

249 Following

MIT NLP (@nlp_mit)'s Twitter Profile Photo

Hello everyone! We are quite a bit late to the Twitter party, but welcome to the MIT NLP Group account! Follow along for the latest research from our labs as we dive deep into language, learning, and logic 🤖📚🧠

Jiacheng Liu (@liujc1998)'s Twitter Profile Photo

Today we're unveiling OLMoTrace, a tool that enables everyone to understand the outputs of LLMs by connecting to their training data. We do this on unprecedented scale and in real time: finding matching text between model outputs and 4 trillion training tokens within seconds. ✨
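The span-matching idea behind this can be illustrated with a toy n-gram index. This is a sketch of mine, not OLMoTrace's implementation: matching 4 trillion tokens in seconds requires suffix-array-style machinery, but the core operation (seed an exact match, extend it, keep maximal spans) looks roughly like this:

```python
from collections import defaultdict

def build_ngram_index(corpus_tokens, n=3):
    """Index every n-gram in the corpus by its start positions
    (a toy stand-in for a suffix-array over the training data)."""
    index = defaultdict(list)
    for i in range(len(corpus_tokens) - n + 1):
        index[tuple(corpus_tokens[i:i + n])].append(i)
    return index

def find_matching_spans(output_tokens, corpus_tokens, index, n=3):
    """Return (output_start, corpus_start, length) for maximal exact
    matches of at least n tokens between a model output and the corpus."""
    spans = []
    for i in range(len(output_tokens) - n + 1):
        for j in index.get(tuple(output_tokens[i:i + n]), []):
            # extend the seed match greedily to the right
            length = n
            while (i + length < len(output_tokens)
                   and j + length < len(corpus_tokens)
                   and output_tokens[i + length] == corpus_tokens[j + length]):
                length += 1
            spans.append((i, j, length))
    # keep only maximal spans (drop matches contained in a longer one)
    spans.sort(key=lambda s: -s[2])
    kept = []
    for s in spans:
        if not any(k[0] <= s[0] and s[0] + s[2] <= k[0] + k[2] for k in kept):
            kept.append(s)
    return kept
```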

Zhaofeng Wu @ ICLR (@zhaofeng_wu)'s Twitter Profile Photo

Come chat with us on Saturday 4/26 at 10am (poster #240) if you're interested! Also, DMs are open -- happy to chat about multilinguality/interpretability/any random stuff during the conference! (though I may respond faster to email/Whova)

Songlin Yang (@songlinyang4)'s Twitter Profile Photo

📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381

Lifan Yuan (@lifan__yuan)'s Twitter Profile Photo

We always want to scale up RL, yet simply training longer doesn't necessarily push the limits - exploration gets impeded by entropy collapse. We show that the performance ceiling is surprisingly predictable, and the collapse is driven by covariance between logp and advantage.

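The covariance diagnostic mentioned above can be sketched in a few lines. This is a toy proxy of mine, not the paper's exact estimator: under a policy-gradient step, a positive covariance between log-probability and advantage means already-likely actions get reinforced further, which predicts shrinking entropy:

```python
import numpy as np

def entropy_change_proxy(logps, advantages):
    """Population covariance between per-token log-probabilities and
    advantages. A positive value predicts a drop in policy entropy
    (high-advantage, high-probability actions get pushed even higher)."""
    logps = np.asarray(logps, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    return float(np.cov(logps, advantages, bias=True)[0, 1])
```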
Simeng (Sophia) Han (@hansineng)'s Twitter Profile Photo


Zero fluff, maximum insight ✨. 
Let’s see what LLMs are really made of, with 🧠 Brainteasers. 

We’re not grading answers 🔢. We’re grading thinking 💭. 
Brute force? Creative leap? False confession? 🤔

Instead of asking “Did the model get the right answer?”, 
we ask: “Did it
Naman Jain @ ICLR (@stringchaos)'s Twitter Profile Photo


Can SWE-Agents aid in High-Performance Software development? ⚡️🤔

Introducing GSO: A Challenging Code Optimization Benchmark

🔍 Unlike simple bug fixes, this combines algorithmic reasoning with systems programming 

📊 Results: Current agents struggle with <5% success rate!
Billy Xuanming Zhang (@xuanmingzhang07)'s Twitter Profile Photo


😵‍💫 Long-context human-AI planning with LLMs struggles when users have to manually manage all the context in messy chats (e.g. with ChatGPT). 
Meet 💡JumpStarter: task-structured context curation for better, collaborative planning with LLMs on complex tasks. 🧵 (1/n)
Mengzhou Xia (@xiamengzhou)'s Twitter Profile Photo

Surprisingly, we find training only with incorrect traces leads to strong performance 🤯 Even more interesting: it improves model diversity and test-time scaling—while correct traces do the opposite. Check out the 🧵👇

Ximing Lu (@gximing)'s Twitter Profile Photo

What happens when you ✨scale up RL✨? In our new work, Prolonged RL, we significantly scale RL training to >2k steps and >130k problems—and observe exciting, non-saturating gains as we spend more compute 🚀.

Han Guo (@hanguo97)'s Twitter Profile Photo


We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
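The "in between" can be pictured with a Fenwick-tree-style prefix partition: a query at position t reads O(log t) block summaries, rather than t cached tokens (full attention) or a single state (linear attention). A toy sketch of the partition, under my reading of the thread:

```python
def fenwick_buckets(t):
    """Partition the prefix [0, t) into power-of-two blocks, Fenwick-tree
    style. A log-linear attention layer would keep one summary state per
    block, so a query at position t touches at most bit_length(t) states."""
    blocks, start = [], 0
    while start < t:
        # largest power of two that fits in the remaining prefix
        size = 1 << ((t - start).bit_length() - 1)
        blocks.append((start, start + size))
        start += size
    return blocks
```

The block count equals the popcount of t, so it never exceeds log2(t) + 1, which is where the log-linear training and log-time inference costs come from.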
Junhong Shen (@junhongshen1)'s Twitter Profile Photo


🔥Unlocking New Paradigm for Test-Time Scaling of Agents!

We introduce Test-Time Interaction (TTI), which scales the number of interaction steps beyond thinking tokens per step.

Our agents learn to act longer➡️richer exploration➡️better success

Paper: arxiv.org/abs/2506.07976
Songlin Yang (@songlinyang4)'s Twitter Profile Photo

Flash Linear Attention (github.com/fla-org/flash-…) will no longer maintain support for the RWKV series (existing code will remain available). Here’s why:

Yijia Shao (@echoshao8899)'s Twitter Profile Photo


🚨 70 million US workers are about to face their biggest workplace transformation due to AI agents. But nobody asks them what they want.

While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵
Jyo Pari (@jyo_pari)'s Twitter Profile Photo


What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model’s downstream performance as reward.
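The loop described in the thread can be sketched abstractly. Everything here is a placeholder of mine (the callables and the toy numeric "model"), not SEAL's actual interface; it only shows the shape of the generate-edit, update-weights, reward-on-downstream cycle:

```python
def seal_step(generate_self_edit, finetune, evaluate, model, new_input):
    """One toy iteration of a self-editing loop."""
    # 1. the model writes its own training data for the new input
    self_edit = generate_self_edit(model, new_input)
    # 2. apply a weight update using that self-generated data
    updated = finetune(model, self_edit)
    # 3. downstream performance of the updated model is the RL reward
    #    used to train the self-edit generator
    reward = evaluate(updated)
    return updated, self_edit, reward
```

As a usage example with a scalar stand-in for the weights: `seal_step(lambda m, x: x - m, lambda m, e: m + 0.5 * e, lambda m: -abs(3.0 - m), 0.0, 3.0)` moves the "model" halfway toward the target and scores it by closeness.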
Kaiser Sun (@kaiserwholearns)'s Twitter Profile Photo


What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint📑
TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation.📑🧵⬇️ 1/8
#NLProc #LLM #AIResearch