Artidoro Pagnoni (@artidoropagnoni)'s Twitter Profile
Artidoro Pagnoni

@artidoropagnoni

PhD @uwnlp @AIatMeta. Bending the scaling laws.

ID: 3583993995

Joined: 08-09-2015 04:35:23

310 Tweets

1.1K Followers

561 Following

Ġabe Ġrand (@gabe_grand)'s Twitter Profile Photo

Tackling complex problems with LMs requires search/planning, but how should test-time compute be structured? Introducing Self-Steering, a new meta-reasoning framework where LMs coordinate their own inference procedures by writing code!
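
As a rough sketch of that idea (the `llm` helper, the prompt, and the `solve` contract below are all hypothetical stand-ins, not the paper's actual interface): the model emits a search procedure as code, and executing that code is what structures test-time compute.

```python
# Hedged sketch of self-steering: the LM writes the code that coordinates
# its own inference. `llm(prompt) -> str` is a hypothetical completion helper.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an actual LM client here")

def self_steer(problem: str) -> str:
    # 1) Ask the LM to author its own inference procedure as Python source.
    plan_src = llm(
        "Write a Python function solve(llm, problem) that decomposes the "
        "problem into subgoals and searches over candidate solutions.\n"
        f"Problem: {problem}"
    )
    # 2) Materialize and run it: the generated code, not a fixed agent loop,
    #    decides how test-time compute is spent.
    namespace: dict = {}
    exec(plan_src, namespace)
    return namespace["solve"](llm, problem)
```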

AI at Meta (@aiatmeta)'s Twitter Profile Photo

🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image & …

Srini Iyer (@sriniiyer88)'s Twitter Profile Photo

BLT model weights are out! Responding to popular demand, we just open-sourced model weights for our 1B and 8B BLT models for the research community to play with! huggingface.co/facebook/blt Hoping to see many new and improved BLT-based architectures this year!

Dr. Pedro Rodriguez @par@sigmoid.social (@entilzhapr)'s Twitter Profile Photo

By popular demand (see our GH issues 😅), we're releasing 1B and 8B weights for our BLT models! We're also hard at work adding BLT to HF transformers! Model Weights: huggingface.co/facebook/blt Code + Instructions for loading weights: github.com/facebookresear…
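
Before the transformers integration lands, a minimal way to pull the checkpoints down (assuming only that the repo id matches the URL above) is via huggingface_hub; actual loading follows the instructions in the GitHub repo.

```python
# Download the released BLT checkpoints from the Hugging Face Hub.
# Inference itself follows the facebookresearch repo's instructions, since
# BLT was not yet integrated into HF transformers at the time of release.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="facebook/blt")
print(f"BLT weights downloaded to {local_dir}")
```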

Artidoro Pagnoni (@artidoropagnoni)'s Twitter Profile Photo

BLT sees the world in bytes but operates on patches, allowing for arbitrary compression rates. This could potentially change a lot of things. Check out the model weights and help us make it better! ⬇️
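
To make "bytes in, patches out" concrete, here is a toy illustration of entropy-based patching in the spirit of BLT (the `next_byte_entropy` callable stands in for the small byte LM, and the threshold is an arbitrary placeholder): a new patch opens wherever the next byte is hard to predict, so compute concentrates on hard regions, and the compression rate falls out of the threshold.

```python
from typing import Callable, List

def entropy_patches(
    data: bytes,
    next_byte_entropy: Callable[[bytes], float],  # stand-in for a small byte LM
    threshold: float = 2.0,                       # placeholder cutoff, in bits
) -> List[bytes]:
    """Group bytes into variable-length patches: open a new patch wherever the
    byte LM is uncertain about what comes next. Raising the threshold yields
    longer patches, i.e. a higher compression rate."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(data[:i]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches
```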

Yiping Wang (@ypwang61)'s Twitter Profile Photo

We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks!

📍RLVR with one training example can boost, on MATH500:
- Qwen2.5-Math-1.5B: 36.0% → 73.6%
- Qwen2.5-Math-7B: 51.0% → 79.2%

📄 Paper: arxiv.org/abs/2504.20571
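
For context, the "verifiable" part of RLVR is just a programmatic check, sketched below under the assumption that final answers arrive in a \boxed{...} span (the extraction rule is illustrative, not the paper's exact harness):

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Binary math reward: 1.0 if the extracted final answer matches the
    reference, else 0.0. With one training example, this single bit of
    feedback on repeated rollouts is the entire learning signal."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    predicted = match.group(1).strip() if match else ""
    return 1.0 if predicted == reference_answer.strip() else 0.0
```
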
Kyunghyun Cho (@kchonyc)'s Twitter Profile Photo

on my way back to NYC, i met wise Leon Bottou in the airport. we talked. then i told him "you should tweet that!"

and, he delivered much more than a tweet: a blog post with thoughts and insights on AI research, delivered as clearly and succinctly as only he can.
Weixin Liang (@liang_weixin)'s Twitter Profile Photo

🎉 Excited to share: "𝐌𝐢𝐱𝐭𝐮𝐫𝐞-𝐨𝐟-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 (𝐌𝐨𝐓)" has been officially accepted to TMLR (March 2025) and the code is now open-sourced!

📌 GitHub repo: github.com/facebookresear…
📄 Paper: arxiv.org/abs/2411.04996

How can we reduce pretraining costs for …
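
The question is cut off above, but the core MoT idea is to decouple non-embedding parameters (feed-forward, attention projections, norms) by modality while attention remains global over the mixed-modal sequence. A minimal sketch of the per-modality feed-forward routing, my simplification rather than the released code:

```python
import torch
import torch.nn as nn

class MoTFeedForward(nn.Module):
    """Toy Mixture-of-Transformers-style layer: every token passes through the
    FFN belonging to its modality, while self-attention (not shown) stays
    global over the full mixed-modal sequence."""
    def __init__(self, d_model: int, d_ff: int, n_modalities: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_modalities)
        )

    def forward(self, x: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # x: [tokens, d_model]; modality: [tokens] integer ids (e.g. 0=text, 1=image)
        out = torch.empty_like(x)
        for m, expert in enumerate(self.experts):
            mask = modality == m
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```

Unlike a learned-router MoE, the routing here is deterministic (by modality tag), which is what makes the sparsity cheap at pretraining time.
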
Philippe Laban (@philippelaban)'s Twitter Profile Photo

🆕paper: LLMs Get Lost in Multi-Turn Conversation

In real life, people don’t speak in perfect prompts.
So we simulate multi-turn conversations — less lab-like, more like real use.

We find that LLMs get lost in conversation.
👀What does that mean? 🧵1/N
📄arxiv.org/abs/2505.06120
Kyunghyun Cho (@kchonyc)'s Twitter Profile Photo

it's been more than a decade since KD was proposed, and i've been using it all along .. but why does it work? too many speculations but no simple explanation. Sungmin Cha and i decided to see if we can come up with the simplest working description of KD in this work.

we ended …
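
For reference, the objective in question in its classic Hinton-style form, the starting point any "simplest working description" has to explain:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float = 2.0):
    """Distillation term: KL between temperature-softened teacher and student
    distributions, scaled by T**2 so gradient magnitudes stay comparable
    across temperatures."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)
```
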
Joel Jang (@jang_yoel)'s Twitter Profile Photo

Introducing 𝐃𝐫𝐞𝐚𝐦𝐆𝐞𝐧! We got humanoid robots to perform totally new 𝑣𝑒𝑟𝑏𝑠 in new environments through video world models. We believe video world models will solve the data problem in robotics, shifting the paradigm from scaling human hours to scaling GPU hours. Quick 🧵

The Harvard Crimson (@thecrimson)'s Twitter Profile Photo

#BREAKING: The Trump administration revoked Harvard’s ability to enroll international students on Thursday, dramatically escalating the administration’s fight with the University and threatening thousands of current students. thecrimson.com/article/2025/5…

Benjamin Minixhofer (@bminixhofer)'s Twitter Profile Photo

We achieved the first instance of successful subword-to-byte distillation in our (just updated) paper.

This enables creating byte-level models at a fraction of the cost of what was needed previously.

As a proof-of-concept, we created byte-level Gemma2 and Llama3 models.

🧵
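
The obstacle that makes this hard is vocabulary mismatch: teacher (subword) and student (byte) logits can't be aligned token-for-token. One hedged way to picture it, a simplification and not necessarily the paper's exact objective, is to match quantities both models define regardless of tokenization, e.g. the log-likelihood each assigns to the same text span:

```python
import torch

def span_loglik_distill_loss(
    teacher_span_logprobs: torch.Tensor,  # [n_spans], log p_teacher(span) over subwords
    student_span_logprobs: torch.Tensor,  # [n_spans], log p_student(span) summed over bytes
) -> torch.Tensor:
    """Subword and byte models segment text differently, but both assign a
    likelihood to raw text, so span-level log-probs are directly comparable."""
    return torch.mean((student_span_logprobs - teacher_span_logprobs) ** 2)
```
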
Stella Li (@stellalisy)'s Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
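
Concretely, the three reward variants being compared look roughly like this (sketched; answer extraction is elided and the signatures are mine, not the paper's):

```python
import random

def ground_truth_reward(predicted: str, answer: str) -> float:
    return 1.0 if predicted == answer else 0.0        # +28.8% on MATH-500

def incorrect_reward(predicted: str, wrong_label: str) -> float:
    return 1.0 if predicted == wrong_label else 0.0   # rewards a wrong label: +25%

def random_reward(predicted: str, answer: str) -> float:
    return float(random.random() < 0.5)               # ignores the rollout: +21%
```
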
Yizhong Wang (@yizhongwyz)'s Twitter Profile Photo

Thrilled to announce that I will be joining UT Austin Computer Science as an assistant professor in fall 2026!

I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
Greg Durrett (@gregd_nlp)'s Twitter Profile Photo

Revoking visas to Chinese PhD students is economically shortsighted and inhumane.

Most Chinese PhD students stay in the U.S. after graduation (first image, stats from 2022). They're staying and building technology in the U.S., not taking it to China.

Immigrant students create …
Akari Asai (@akariasai)'s Twitter Profile Photo

‘Bold,’ ‘positive’ and ‘unparalleled’: Allen School Ph.D. graduates Ashish Sharma and Sewon Min recognized with ACM Doctoral Dissertation Awards news.cs.washington.edu/2025/06/04/all…

Massive congrats to Ashish Sharma and Sewon Min - huge win for UW NLP and the broader NLP community! 🙌

Ludwig Schmidt (@lschmidt3)'s Twitter Profile Photo

Very excited to finally release our paper for OpenThoughts!

After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Han Guo (@hanguo97)'s Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
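
One way to picture where "in between" lands, as a toy and emphatically not the paper's Triton kernels: instead of one recurrent state (linear attention) or the full KV cache (softmax attention), keep O(log T) states, each summarizing a power-of-two span of history, merged binary-counter style. The per-level weights below are hypothetical stand-ins for the learned gates.

```python
import numpy as np

def log_linear_step(buckets, k, v):
    """Append one token in O(log T): buckets[i] (if present) summarizes a
    span of 2**i past tokens as a linear-attention state (sum of k v^T)."""
    state = np.outer(k, v)  # rank-1 state for the new token
    level = 0
    while level < len(buckets) and buckets[level] is not None:
        state = buckets[level] + state  # merge two equal-sized spans
        buckets[level] = None
        level += 1
    if level == len(buckets):
        buckets.append(None)
    buckets[level] = state
    return buckets

def log_linear_read(buckets, q, level_weights):
    """Read in O(log T): mix per-scale readouts q @ S_i with per-level weights
    (stand-ins for learned gates), giving finer resolution to recent tokens."""
    out = 0.0
    for i, state in enumerate(buckets):
        if state is not None:
            out = out + level_weights[i] * (q @ state)
    return out
```

After T steps at most ⌈log2(T+1)⌉ buckets are live, which is where both the log-time inference and the log-linear training cost come from.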