Rulin Shao (@rulinshao)'s Twitter Profile
Rulin Shao

@rulinshao

PhD @UWNLP, visiting researcher @Meta.

https://rulinshao.github.io/ · Joined 08-04-2022 19:21:35

200 Tweets · 1.1K Followers · 662 Following

jack morris (@jxmnop)

new paper from our work at Meta!

**GPT-style language models memorize 3.6 bits per param**

we compute capacity by measuring total bits memorized, using some theory from Shannon (1953)

shockingly, the memorization-datasize curves look like this:
      ___________
     /
    /

(🧵)
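
A rough back-of-the-envelope reading of the headline number (a sketch, not the paper's measurement procedure): at roughly 3.6 bits per parameter, a model's total memorization budget can be estimated and compared with the raw size of the training data, which is why the curves flatten once the data holds more bits than the model can store. The parameter count and bits-per-token figure below are illustrative assumptions.

```python
# Back-of-the-envelope sketch (not the paper's method): estimate total
# memorization capacity from the ~3.6 bits/param figure and compare it with a
# dataset's raw size to see where the memorization curve should flatten.
BITS_PER_PARAM = 3.6  # headline estimate from the thread


def capacity_bits(num_params: float) -> float:
    """Approximate total memorization capacity of a model, in bits."""
    return BITS_PER_PARAM * num_params


def dataset_bits(num_tokens: float, bits_per_token: float = 16.0) -> float:
    """Raw dataset size in bits (16 bits/token is an illustrative assumption)."""
    return num_tokens * bits_per_token


if __name__ == "__main__":
    params = 124e6  # GPT-2-small scale, chosen only for illustration
    print(f"capacity  ≈ {capacity_bits(params) / 8 / 1e6:.0f} MB")
    print(f"1B tokens ≈ {dataset_bits(1e9) / 8 / 1e6:.0f} MB raw")
    # Once the dataset holds far more bits than the model can store, per-sample
    # memorization has to stop growing: the flat part of the curve above.
```
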
Akari Asai (@akariasai)

‘Bold,’ ‘positive’ and ‘unparalleled’: Allen School Ph.D. graduates Ashish Sharma and Sewon Min recognized with ACM Doctoral Dissertation Awards
news.cs.washington.edu/2025/06/04/all…

Massive congrats to Ashish Sharma and Sewon Min - huge win for UW NLP and the broader NLP community! 🙌

Ludwig Schmidt (@lschmidt3)

Very excited to finally release our paper for OpenThoughts!

After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Han Guo (@hanguo97)

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
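
To make the "between attention and linear attention" idea concrete, below is a toy sketch of one way a logarithmic number of summaries can cover the whole past: power-of-two chunks that merge like carries in a binary counter, so inference only touches O(log t) states. The class name, the rank-1 chunk states, and the unweighted read-out are illustrative assumptions, not the paper's algorithm or its Triton kernels.

```python
import torch


class LogStateMemory:
    """Toy linear-attention-style memory over power-of-two chunks of the past.

    Equal-sized chunk states merge like carries in a binary counter, so after
    t tokens at most ~log2(t) + 1 (chunk_size, state) pairs remain.
    """

    def __init__(self, d: int):
        self.levels = []  # list of (chunk_size, state), state has shape [d, d]

    def append(self, k: torch.Tensor, v: torch.Tensor) -> None:
        size, state = 1, torch.outer(k, v)  # rank-1 summary of one token
        # Merge equal-sized chunks, exactly like carrying in binary addition.
        while self.levels and self.levels[-1][0] == size:
            prev_size, prev_state = self.levels.pop()
            size, state = prev_size + size, prev_state + state
        self.levels.append((size, state))

    def read(self, q: torch.Tensor) -> torch.Tensor:
        # Query each of the O(log t) summaries; a real model would combine them
        # with learned, data-dependent weights rather than a plain sum.
        return sum(q @ state for _, state in self.levels)


d, T = 16, 1000
mem = LogStateMemory(d)
for _ in range(T):
    k, v, q = torch.randn(d), torch.randn(d), torch.randn(d)
    mem.append(k, v)
out = mem.read(q)
print(f"tokens seen: {T}, summaries kept: {len(mem.levels)}")  # well under T
```
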
Zirui Liu (@ziruirayliu)

🔥Excited to share our new work on reproducibility challenges in reasoning models caused by numerical precision. Ever run the same prompt twice and get completely different answers from your LLM under greedy decoding? You're not alone. Most LLMs today default to BF16 precision,
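
A minimal, self-contained illustration of the mechanism behind this (an example constructed here, not taken from the paper): floating-point addition is not associative, and at BF16 precision a change in summation order, which GPU kernels routinely make across batch sizes and hardware, can shift a value by a full rounding step and flip a near-tie between logits even under greedy decoding.

```python
import torch

a = torch.tensor(1.0, dtype=torch.bfloat16)
b = torch.tensor(2.0 ** -8, dtype=torch.bfloat16)  # half a bf16 ulp at 1.0

left = (a + b) + b   # each add rounds back down to 1.0
right = a + (b + b)  # the small terms combine first and survive rounding

print(left.item(), right.item())  # 1.0 vs 1.0078125 in bf16

# The same sums in float32 agree exactly, which is why higher precision makes
# greedy decoding far more reproducible from run to run.
a32, b32 = a.float(), b.float()
print(((a32 + b32) + b32).item(), (a32 + (b32 + b32)).item())  # both 1.0078125
```
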

Han Guo (@hanguo97)

One key takeaway from recent work on test-time compute: even a small weight update can make a big difference. So, what happens if we meta-learn those updates (and not necessarily at test time)? Excited to share this new work led by Adam Zweiger and Jyo Pari!

Rulin Shao (@rulinshao)

Honored to be part of organizing the LM4Sci workshop at #COLM2025! 🔬🤖 We invite submissions that demonstrate innovative approaches to scientific reasoning and discovery. Submit by June 23! 🚀

Rulin Shao (@rulinshao)

It reminds me of the cognitive behaviors that have been found to help reasoning—backtracking, subgoal setting, verifications, etc.—they all seem to fit this parallel generation pattern better than linearly chaining them. Looking forward to trying it out!

Thao Nguyen (@thao_nguyen26)

Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔
We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats!

arxiv.org/abs/2506.04689
CLS (@chengleisi)

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Bo Liu (Benjamin Liu) (@benjamin_eecs)

We've always been excited about self-play unlocking continuously improving agents. Our insight: RL selects generalizable CoT patterns from pretrained LLMs. Games provide perfect testing grounds with cheap, verifiable rewards. Self-play automatically discovers and reinforces
Peng Qi (@qi2peng2)

Seven years ago, I co-led a paper called 𝗛𝗼𝘁𝗽𝗼𝘁𝗤𝗔 that has motivated and facilitated many #AI #Agents research works since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond. In my new blog post, I revisit the brief history of

Victoria Graf (@victoriawgraf)

Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨, our new, challenging instruction-following benchmark! Loved working w/ Valentina Pyatkin! Personal highlight: our multi-turn eval setting makes it possible to isolate constraint-following from the rest of the instruction 🔍

Rulin Shao (@rulinshao)

🚀 Last year: MassiveDS-1.4T showed great scaling gains with a web-scale datastore but was too heavy for online production
✨ Now: CompactDS is here! Better performance, compact size, ready for agentic apps & Deep Research RL training
Kudos to Xinxi Lyu and Michael Duan for leading this!

Rulin Shao (@rulinshao)

Happy to share that ReasonIR has been accepted to the Conference on Language Modeling! Synthetic data & test-time scaling are powerful tools to enable new capabilities for challenging tasks. I’m impressed by how quickly smaller retrievers and better rerankers have been developed with ReasonIR data! #COLM2025