Artidoro Pagnoni (@artidoropagnoni)'s Twitter Profile
Artidoro Pagnoni

@artidoropagnoni

PhD @uwnlp @AIatMeta. Bending the scaling laws.

ID: 3583993995

Joined: 08-09-2015 04:35:23

310 Tweets

1.1K Followers

561 Following

Ġabe Ġrand (@gabe_grand)'s Twitter Profile Photo

Tackling complex problems with LMs requires search/planning, but how should test-time compute be structured? Introducing Self-Steering, a new meta-reasoning framework where LMs coordinate their own inference procedures by writing code!
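
As a rough sketch of that idea (the `llm` helper, the prompt, and the `solve` contract below are all hypothetical stand-ins, not the paper's actual interface): the model emits a search procedure as code, and executing that code is what structures test-time compute.

```python
# Hedged sketch of self-steering: the LM writes the code that coordinates
# its own inference. `llm(prompt) -> str` is a hypothetical completion helper.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an actual LM client here")

def self_steer(problem: str) -> str:
    # 1) Ask the LM to author its own inference procedure as Python source.
    plan_src = llm(
        "Write a Python function solve(llm, problem) that decomposes the "
        "problem into subgoals and searches over candidate solutions.\n"
        f"Problem: {problem}"
    )
    # 2) Materialize and run it: the generated code, not a fixed agent loop,
    #    decides how test-time compute is spent.
    namespace: dict = {}
    exec(plan_src, namespace)
    return namespace["solve"](llm, problem)
```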

AI at Meta (@aiatmeta)'s Twitter Profile Photo

🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image & …

Srini Iyer (@sriniiyer88)'s Twitter Profile Photo

BLT model weights are out! Responding to popular demand, we just open-sourced model weights for our 1B and 8B BLT models for the research community to play with! huggingface.co/facebook/blt Hoping to see many new and improved BLT-based architectures this year!

Dr. Pedro Rodriguez @par@sigmoid.social (@entilzhapr)'s Twitter Profile Photo

By popular demand (see our GH issues 😅), we're releasing 1B and 8B weights for our BLT models! We're also hard at work adding BLT to HF transformers! Model Weights: huggingface.co/facebook/blt Code + Instructions for loading weights: github.com/facebookresear…
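
Before the transformers integration lands, a minimal way to pull the checkpoints down (assuming only that the repo id matches the URL above) is via huggingface_hub; actual loading follows the instructions in the GitHub repo.

```python
# Download the released BLT checkpoints from the Hugging Face Hub.
# Inference itself follows the facebookresearch repo's instructions, since
# BLT was not yet integrated into HF transformers at the time of release.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="facebook/blt")
print(f"BLT weights downloaded to {local_dir}")
```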

Artidoro Pagnoni (@artidoropagnoni)'s Twitter Profile Photo

BLT sees the world in bytes but operates on patches, allowing for arbitrary compression rates. This could potentially change a lot of things. Check out the model weights and help us make it better! ⬇️
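
To make "bytes in, patches out" concrete, here is a toy illustration of entropy-based patching in the spirit of BLT (the `next_byte_entropy` callable stands in for the small byte LM, and the threshold is an arbitrary placeholder): a new patch opens wherever the next byte is hard to predict, so compute concentrates on hard regions, and the compression rate falls out of the threshold.

```python
from typing import Callable, List

def entropy_patches(
    data: bytes,
    next_byte_entropy: Callable[[bytes], float],  # stand-in for a small byte LM
    threshold: float = 2.0,                       # placeholder cutoff, in bits
) -> List[bytes]:
    """Group bytes into variable-length patches: open a new patch wherever the
    byte LM is uncertain about what comes next. Raising the threshold yields
    longer patches, i.e. a higher compression rate."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(data[:i]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches
```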

Yiping Wang (@ypwang61)'s Twitter Profile Photo

We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks!

📍RLVR with one training example can boost, on MATH500:
- Qwen2.5-Math-1.5B: 36.0% → 73.6%
- Qwen2.5-Math-7B: 51.0% → 79.2%

📄 Paper: arxiv.org/abs/2504.20571
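
For context, the "verifiable" part of RLVR is just a programmatic check, sketched below under the assumption that final answers arrive in a \boxed{...} span (the extraction rule is illustrative, not the paper's exact harness):

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Binary math reward: 1.0 if the extracted final answer matches the
    reference, else 0.0. With one training example, this single bit of
    feedback on repeated rollouts is the entire learning signal."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    predicted = match.group(1).strip() if match else ""
    return 1.0 if predicted == reference_answer.strip() else 0.0
```
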
Kyunghyun Cho (@kchonyc)'s Twitter Profile Photo

on my way back to NYC, i met wise Leon Bottou in the airport. we talked. then i told him "you should tweet that!"

and, he delivered much more than a tweet: a blog post with thoughts and insights on AI research, delivered as clearly and succinctly as only he can.
Weixin Liang (@liang_weixin)'s Twitter Profile Photo

🎉 Excited to share: "𝐌𝐢𝐱𝐭𝐮𝐫𝐞-𝐨𝐟-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 (𝐌𝐨𝐓)" has been officially accepted to TMLR (March 2025) and the code is now open-sourced!

📌 GitHub repo: github.com/facebookresear…
📄 Paper: arxiv.org/abs/2411.04996

How can we reduce pretraining costs for …
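
The question is cut off above, but the core MoT idea is to decouple non-embedding parameters (feed-forward, attention projections, norms) by modality while attention remains global over the mixed-modal sequence. A minimal sketch of the per-modality feed-forward routing, my simplification rather than the released code:

```python
import torch
import torch.nn as nn

class MoTFeedForward(nn.Module):
    """Toy Mixture-of-Transformers-style layer: every token passes through the
    FFN belonging to its modality, while self-attention (not shown) stays
    global over the full mixed-modal sequence."""
    def __init__(self, d_model: int, d_ff: int, n_modalities: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_modalities)
        )

    def forward(self, x: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # x: [tokens, d_model]; modality: [tokens] integer ids (e.g. 0=text, 1=image)
        out = torch.empty_like(x)
        for m, expert in enumerate(self.experts):
            mask = modality == m
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```

Unlike a learned-router MoE, the routing here is deterministic (by modality tag), which is what makes the sparsity cheap at pretraining time.
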
Philippe Laban (@philippelaban)'s Twitter Profile Photo

🆕paper: LLMs Get Lost in Multi-Turn Conversation

In real life, people don’t speak in perfect prompts.
So we simulate multi-turn conversations — less lab-like, more like real use.

We find that LLMs get lost in conversation.
👀What does that mean? 🧵1/N
📄arxiv.org/abs/2505.06120
Kyunghyun Cho (@kchonyc)'s Twitter Profile Photo

it's been more than a decade since KD was proposed, and i've been using it all along .. but why does it work? too many speculations but no simple explanation. Sungmin Cha and i decided to see if we can come up with the simplest working description of KD in this work.

we ended …
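
For reference, the objective in question in its classic Hinton-style form, the starting point any "simplest working description" has to explain:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float = 2.0):
    """Distillation term: KL between temperature-softened teacher and student
    distributions, scaled by T**2 so gradient magnitudes stay comparable
    across temperatures."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)
```
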
Joel Jang (@jang_yoel)'s Twitter Profile Photo

Introducing 𝐃𝐫𝐞𝐚𝐦𝐆𝐞𝐧! We got humanoid robots to perform totally new 𝑣𝑒𝑟𝑏𝑠 in new environments through video world models. We believe video world models will solve the data problem in robotics, shifting the paradigm from scaling human hours to scaling GPU hours. Quick 🧵

The Harvard Crimson (@thecrimson)'s Twitter Profile Photo

#BREAKING: The Trump administration revoked Harvard’s ability to enroll international students on Thursday, dramatically escalating the administration’s fight with the University and threatening thousands of current students. thecrimson.com/article/2025/5…

Benjamin Minixhofer (@bminixhofer)'s Twitter Profile Photo

We achieved the first instance of successful subword-to-byte distillation in our (just updated) paper.

This enables creating byte-level models at a fraction of the cost of what was needed previously.

As a proof-of-concept, we created byte-level Gemma2 and Llama3 models.

🧵
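
The obstacle that makes this hard is vocabulary mismatch: teacher (subword) and student (byte) logits can't be aligned token-for-token. One hedged way to picture it, a simplification and not necessarily the paper's exact objective, is to match quantities both models define regardless of tokenization, e.g. the log-likelihood each assigns to the same text span:

```python
import torch

def span_loglik_distill_loss(
    teacher_span_logprobs: torch.Tensor,  # [n_spans], log p_teacher(span) over subwords
    student_span_logprobs: torch.Tensor,  # [n_spans], log p_student(span) summed over bytes
) -> torch.Tensor:
    """Subword and byte models segment text differently, but both assign a
    likelihood to raw text, so span-level log-probs are directly comparable."""
    return torch.mean((student_span_logprobs - teacher_span_logprobs) ** 2)
```
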
Stella Li (@stellalisy)'s Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
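
Concretely, the three reward variants being compared look roughly like this (sketched; answer extraction is elided and the signatures are mine, not the paper's):

```python
import random

def ground_truth_reward(predicted: str, answer: str) -> float:
    return 1.0 if predicted == answer else 0.0        # +28.8% on MATH-500

def incorrect_reward(predicted: str, wrong_label: str) -> float:
    return 1.0 if predicted == wrong_label else 0.0   # rewards a wrong label: +25%

def random_reward(predicted: str, answer: str) -> float:
    return float(random.random() < 0.5)               # ignores the rollout: +21%
```
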
Yizhong Wang (@yizhongwyz)'s Twitter Profile Photo

Thrilled to announce that I will be joining UT Austin Computer Science as an assistant professor in fall 2026!

I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
Greg Durrett (@gregd_nlp)'s Twitter Profile Photo

Revoking visas to Chinese PhD students is economically shortsighted and inhumane.

Most Chinese PhD students stay in the U.S. after graduation (first image, stats from 2022). They're staying and building technology in the U.S., not taking it to China.

Immigrant students create …
Akari Asai (@akariasai)'s Twitter Profile Photo

‘Bold,’ ‘positive’ and ‘unparalleled’: Allen School Ph.D. graduates Ashish Sharma and Sewon Min recognized with ACM Doctoral Dissertation Awards news.cs.washington.edu/2025/06/04/all…

Massive congrats to Ashish Sharma and Sewon Min - huge win for UW NLP and the broader NLP community! 🙌

Ludwig Schmidt (@lschmidt3)'s Twitter Profile Photo

Very excited to finally release our paper for OpenThoughts!

After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Han Guo (@hanguo97)'s Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
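
One way to picture where "in between" lands, as a toy and emphatically not the paper's Triton kernels: instead of one recurrent state (linear attention) or the full KV cache (softmax attention), keep O(log T) states, each summarizing a power-of-two span of history, merged binary-counter style. The per-level weights below are hypothetical stand-ins for the learned gates.

```python
import numpy as np

def log_linear_step(buckets, k, v):
    """Append one token in O(log T): buckets[i] (if present) summarizes a
    span of 2**i past tokens as a linear-attention state (sum of k v^T)."""
    state = np.outer(k, v)  # rank-1 state for the new token
    level = 0
    while level < len(buckets) and buckets[level] is not None:
        state = buckets[level] + state  # merge two equal-sized spans
        buckets[level] = None
        level += 1
    if level == len(buckets):
        buckets.append(None)
    buckets[level] = state
    return buckets

def log_linear_read(buckets, q, level_weights):
    """Read in O(log T): mix per-scale readouts q @ S_i with per-level weights
    (stand-ins for learned gates), giving finer resolution to recent tokens."""
    out = 0.0
    for i, state in enumerate(buckets):
        if state is not None:
            out = out + level_weights[i] * (q @ state)
    return out
```

After T steps at most ⌈log2(T+1)⌉ buckets are live, which is where both the log-time inference and the log-linear training cost come from.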