Nathan Brown (@oxxotweets)'s Twitter Profile
Nathan Brown

@oxxotweets

SWE @Microsoft AI; multilingual LLMs and other shenanigans; Masters grad @ Clemson; Probably staring at wandb logs; DMs open

ID: 1408520605671071745

Link: https://oxxocodes.github.io/
Joined: 25-06-2021 20:21:08

521 Tweets

83 Followers

638 Following

Anthropic (@anthropicai)'s Twitter Profile Photo

Our interpretability team recently released research that traced the thoughts of a large language model. Now we're open-sourcing the method. Researchers can generate "attribution graphs" like those in our study, and explore them interactively.
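
A toy sketch of the underlying idea (not Anthropic's released tooling): score how strongly each unit in one layer feeds each unit in the next via activation-times-weight contributions, then keep the strongest edges as a graph.

```python
# Toy illustration of an attribution-graph-style computation on a tiny MLP:
# the direct contribution of hidden unit i to output unit j is h[i] * W2[i, j].
# This sketches the general concept only, not Anthropic's open-sourced method.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # input activations
W1 = rng.normal(size=(4, 5))  # layer-1 weights
W2 = rng.normal(size=(5, 3))  # layer-2 weights

h = np.maximum(x @ W1, 0)     # hidden activations (ReLU)
contrib = h[:, None] * W2     # (5, 3) direct contributions

# Keep only strong edges; these form the "graph" to explore.
edges = [(i, j, contrib[i, j])
         for i in range(5) for j in range(3)
         if abs(contrib[i, j]) > 0.5]
for i, j, w in sorted(edges, key=lambda e: -abs(e[2])):
    print(f"hidden_{i} -> output_{j}: {w:+.2f}")
```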

Nathan Brown (@oxxotweets)'s Twitter Profile Photo

Feel like I haven't seen much work on incorporating existing vision models into VLMs. Seems a bit naive, but I'd imagine incorporating embeddings from depth estimation / segmentation / edge detection models would help remedy the "text first, vision second" issue present in VLMs.
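
A minimal sketch of what the tweet proposes, assuming hypothetical encoders that share a patch grid: concatenate frozen depth/segmentation features with the usual CLIP-style patch embeddings before projecting into the LLM's token space. Every name here is a placeholder, not a specific VLM's API.

```python
# Hypothetical fusion projector: CLIP patch features + frozen auxiliary
# features (depth, segmentation) are concatenated per patch, then projected
# into the LLM embedding dimension. Dimensions are illustrative.
import torch
import torch.nn as nn

class FusedVisionProjector(nn.Module):
    def __init__(self, clip_dim=1024, depth_dim=256, seg_dim=256, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(clip_dim + depth_dim + seg_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, clip_feats, depth_feats, seg_feats):
        # each input: (batch, num_patches, dim)
        fused = torch.cat([clip_feats, depth_feats, seg_feats], dim=-1)
        return self.proj(fused)  # (batch, num_patches, llm_dim)

proj = FusedVisionProjector()
out = proj(torch.randn(1, 576, 1024), torch.randn(1, 576, 256), torch.randn(1, 576, 256))
print(out.shape)  # torch.Size([1, 576, 4096])
```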

Sander Land (@magikarp_tokens)'s Twitter Profile Photo

🔍 UTF-8 was never meant for language models.
Yet every major tokenizer still uses it, creating unfair "byte premiums".
Why should your native script cost more to tokenize? It's time for a change. 🧵👇
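
The premium is easy to measure: the same short greeting costs one byte per character in ASCII but two to four in other scripts, so byte-level tokenizers see far longer sequences for non-Latin text.

```python
# Measure the "byte premium": UTF-8 bytes per character across scripts.
samples = {
    "English": "hello",
    "Russian": "привет",
    "Hindi": "नमस्ते",
    "Thai": "สวัสดี",
}
for lang, word in samples.items():
    n_bytes = len(word.encode("utf-8"))
    print(f"{lang:8s} {len(word)} chars -> {n_bytes} UTF-8 bytes")
```
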
Qingxiu Dong (@qx_dong)'s Twitter Profile Photo

โฐ We introduce Reinforcement Pre-Training (RPT๐Ÿ’) โ€” reframing next-token prediction as a reasoning task using RLVR โœ… General-purpose reasoning ๐Ÿ“‘ Scalable RL on web corpus ๐Ÿ“ˆ Stronger pre-training + RLVR results ๐Ÿš€ Allow allocate more compute on specific tokens

โฐ We introduce Reinforcement Pre-Training (RPT๐Ÿ’)  

 โ€” reframing next-token prediction as a reasoning task using RLVR  

โœ… General-purpose reasoning 
๐Ÿ“‘ Scalable RL on web corpus
๐Ÿ“ˆ Stronger pre-training + RLVR results
๐Ÿš€ Allow allocate more compute on specific tokens
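
A minimal sketch of the reward design the tweet describes, under the assumption that RPT scores a rollout by whether its final next-token guess matches the corpus (the function and token ids below are illustrative, not the paper's code):

```python
# Verifiable reward for next-token prediction: the corpus itself supplies
# the ground truth, so no learned reward model is needed.
def rpt_reward(predicted_token_id: int, ground_truth_token_id: int) -> float:
    return 1.0 if predicted_token_id == ground_truth_token_id else 0.0

prefix_ids = [464, 3290, 318]  # hypothetical corpus prefix
ground_truth = 257             # the actual next token in the corpus
model_guess = 257              # the token the policy commits to after reasoning
print(rpt_reward(model_guess, ground_truth))  # 1.0
```
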
ARC Prize (@arcprize)'s Twitter Profile Photo

After the o3 price reduction, we retested the o3-2025-04-16 model on ARC-AGI to determine whether its performance had changed. We compared the retest results with the original results and observed no difference in performance.

Phillip Isola (@phillip_isola)'s Twitter Profile Photo

Our computer vision textbook is now available for free online here: visionbook.mit.edu We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful and feel free to submit GitHub issues to help us improve the text!

Nathan Brown (@oxxotweets)'s Twitter Profile Photo

Really like the approach of treating multitude training as a series of database transactions + rollbacks. Makes intuitive sense; surprised I haven't seen this discussed elsewhere in the OSS ML space yet.
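
A rough sketch of that framing, assuming nothing beyond the analogy itself: snapshot state before a stage (BEGIN), validate after, and either keep the result (COMMIT) or restore the snapshot (ROLLBACK).

```python
# Illustrative "transactional" training stage: deepcopy stands in for a real
# checkpoint; validate stands in for an eval-set regression check.
import copy

def transactional_stage(state, train_stage, validate):
    snapshot = copy.deepcopy(state)  # BEGIN
    new_state = train_stage(state)
    return new_state if validate(new_state) else snapshot  # COMMIT / ROLLBACK

state = {"loss": 2.0}
state = transactional_stage(
    state,
    train_stage=lambda s: {"loss": s["loss"] * 0.9},
    validate=lambda s: s["loss"] < 2.0,
)
print(state)  # {'loss': 1.8}
```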

jack morris (@jxmnop)'s Twitter Profile Photo

In the beginning, there was BERT.

Eventually BERT gave rise to RoBERTa.  Then, DeBERTa.  Later, ModernBERT.

And now, NeoBERT.  The new state-of-the-art small-sized encoder:
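
A hedged usage sketch via the standard transformers API; the Hub checkpoint id and the trust_remote_code requirement are assumptions here, so check the model card:

```python
from transformers import AutoModel, AutoTokenizer

name = "chandar-lab/NeoBERT"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True)

inputs = tokenizer("Encoders never went away.", return_tensors="pt")
hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
print(hidden.shape)
```
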
Nathan Brown (@oxxotweets)'s Twitter Profile Photo

After providing a YouTube URL to Google AI Studio w/ Gemini 2.5 Pro, I'm seeing significant spikes in GPU utilization while I wait on a model response (the usage drop is when the window is no longer active). Anyone familiar with this? Unsure why any of this would be client-side.
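
One way to capture the spikes described above, assuming an NVIDIA GPU with nvidia-smi on PATH: poll utilization while toggling the tab active/inactive.

```python
# Log GPU utilization once per second via nvidia-smi.
import subprocess, time

def gpu_util() -> int:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip().splitlines()[0])

for _ in range(10):
    print(f"{time.strftime('%H:%M:%S')}  GPU util: {gpu_util()}%")
    time.sleep(1)
```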

Owain Evans (@owainevans_uk)'s Twitter Profile Photo

New paper & surprising result.
LLMs transmit traits to other models via hidden signals in data.
Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
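
A sketch of the recipe as the thread describes it, with teacher_generate standing in for a real trait-conditioned LLM call (everything below is illustrative):

```python
# Teacher with a hidden trait emits number sequences; a strict filter keeps
# only pure 3-digit-number samples, yet a student fine-tuned on them can
# still inherit the trait.
import random, re

def teacher_generate(prompt: str) -> str:
    # Stand-in for sampling from a trait-conditioned teacher model.
    return ", ".join(str(random.randint(100, 999)) for _ in range(10))

def numbers_only(text: str) -> bool:
    return re.fullmatch(r"\d{3}(, \d{3})*", text) is not None

dataset = [s for s in (teacher_generate("Continue the sequence:")
                       for _ in range(1000)) if numbers_only(s)]
print(len(dataset), "filtered examples")  # student is then fine-tuned on these
```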