Gonçalo Faria (@goncalorafaria)'s Twitter Profile
Gonçalo Faria

@goncalorafaria

PhD Student at @uwcse @uwnlp.

ID: 481516935

Website: http://www.goncalofaria.com | Joined: 02-02-2012 21:19:22

39 Tweets

160 Followers

328 Following

Gonçalo Faria (@goncalorafaria)

🚀 Thrilled to present our QUEST poster at #NeurIPS2024 in Vancouver!
📅 Thursday, Dec 12th | 4:30-7:30 PM PST
📍 East Exhibit Hall A-C #3108
Looking forward to engaging discussions & connecting with fellow researchers! Stop by to learn more about our work. Joint work with …

Alisa Liu (@alisawuffles)

We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words.

When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
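As an illustration of what "tokens spanning multiple words" means, here is a toy sketch that greedily merges frequent adjacent word pairs into single superword tokens. This is a simplification for intuition only, not the SuperBPE training algorithm; `merge_superwords` and the tiny corpus are hypothetical.

```python
# Illustrative sketch only: greedily merge frequent adjacent word pairs into
# single tokens that keep the space inside them ("superwords").
# This is NOT the SuperBPE algorithm, just a minimal picture of the idea.
from collections import Counter

def merge_superwords(corpus, num_merges=2):
    """Greedily merge the most frequent adjacent token pair, num_merges times."""
    tokenized = [line.split() for line in corpus]
    for _ in range(num_merges):
        pair_counts = Counter(
            (a, b) for toks in tokenized for a, b in zip(toks, toks[1:])
        )
        if not pair_counts:
            break
        (a, b), _ = pair_counts.most_common(1)[0]  # most frequent adjacent pair
        merged = []
        for toks in tokenized:
            out, i = [], 0
            while i < len(toks):
                if i + 1 < len(toks) and (toks[i], toks[i + 1]) == (a, b):
                    out.append(toks[i] + " " + toks[i + 1])  # superword token
                    i += 2
                else:
                    out.append(toks[i])
                    i += 1
            merged.append(out)
        tokenized = merged
    return tokenized

print(merge_superwords(["by the way it works", "by the way it helps"]))
# e.g. [['by the way', 'it', 'works'], ['by the way', 'it', 'helps']]
```

In this toy version, a common phrase like "by the way" ends up as a single token, which is the intuition behind fewer tokens per sequence and cheaper inference.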
Yangjun Ruan (@yangjunr)

New paper on synthetic pretraining!

We show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. We call this new paradigm “reasoning to learn”.
arxiv.org/abs/2503.18866

Here’s how it works🧵
Nathan Lambert (@natolambert)

Very cool paper for more scalable inference-time compute. To start, it shows that best-of-N sampling for inference-time compute effectively shifts the beta of RLHF optimization, which leads to overoptimization of an RM. Majority voting has a different limitation where …
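For context on the procedure the tweet refers to, here is a minimal, generic sketch of best-of-N sampling against a reward model; `generate` and `reward` are hypothetical stand-ins for a policy model and a reward model, not an API from the paper.

```python
# Minimal, generic best-of-N sampling sketch: draw N candidate completions
# from a model and keep the one the reward model scores highest.
# `generate` and `reward` are hypothetical stand-ins, not a specific API.
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],       # samples one completion for the prompt
    reward: Callable[[str, str], float],  # scores a (prompt, completion) pair
    n: int = 16,
) -> str:
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))

if __name__ == "__main__":
    import random
    toy_generate = lambda p: p + " -> answer " + str(random.randint(0, 9))
    toy_reward = lambda p, c: float(c.endswith("7"))  # toy reward: prefer "7"
    print(best_of_n("2+5=?", toy_generate, toy_reward, n=8))
```

The larger N gets, the harder this procedure optimizes against the reward model, which is why the overoptimization point in the tweet matters.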

Ian Magnusson (@ianmagnusson)

🔭 Science relies on shared artifacts collected for the common good. 🛰 So we asked: what's missing in open language modeling? 🪐 DataDecide 🌌 charts the cosmos of pretraining—across scales and corpora—at a resolution beyond any public suite of models that has come before.

Oreva Ahia (@orevaahia)

Working on tokenization across any modality (text, audio, images, videos)? Submit your paper to our Tokenization Workshop at #ICML2025!

Weijia Shi (@weijiashi2)

Our previous work showed that 𝐜𝐫𝐞𝐚𝐭𝐢𝐧𝐠 𝐯𝐢𝐬𝐮𝐚𝐥 𝐜𝐡𝐚𝐢𝐧‑𝐨𝐟‑𝐭𝐡𝐨𝐮𝐠𝐡𝐭𝐬 𝐯𝐢𝐚 𝐭𝐨𝐨𝐥 𝐮𝐬𝐞 significantly boosts GPT‑4o’s visual reasoning performance. Excited to see this idea incorporated into OpenAI’s o3 and o4‑mini models (openai.com/index/thinking…).

Rohan Baijal (@rohanbaijal)

Long Range Navigator (LRN) 🧭— an approach to extend planning horizons for off-road navigation given no prior maps. Using vision, LRN makes longer-range decisions by spotting navigation frontiers far beyond the range of metric maps. personalrobotics.github.io/lrn/

Kunal Jha (@kjha02)

Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity.

Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks.

shorturl.at/fqsNN🧵
Yanai Elazar (@yanaiela)

💡 New ICLR paper! 💡 "On Linear Representations and Pretraining Data Frequency in Language Models": We provide an explanation for when & why linear representations form in large (or small) language models. Led by Jack Merullo , w/ Noah A. Smith & Sarah Wiegreffe

💡 New ICLR paper! 💡
"On Linear Representations and Pretraining Data Frequency in Language Models":

We provide an explanation for when & why linear representations form in large (or small) language models.

Led by <a href="/jack_merullo_/">Jack Merullo</a> , w/ <a href="/nlpnoah/">Noah A. Smith</a> &amp; <a href="/sarahwiegreffe/">Sarah Wiegreffe</a>