Jonathan Hayase (@jonathanhayase)'s Twitter Profile
Jonathan Hayase

@jonathanhayase

5th year Machine Learning PhD student at UW CSE

ID: 1270435381872193537

Link: https://jon.jon.ke · Joined: 09-06-2020 19:27:31

22 Tweets

137 Followers

116 Following

Samuel "curry-howard fanboi" Ainsworth (@samuelainsworth):

📜🚨📜🚨 NN loss landscapes are full of permutation symmetries, i.e., swap any 2 units in a hidden layer. What does this mean for SGD? Is this practically useful? For the past 5 yrs these Qs have fascinated me. Today, I am ready to announce "Git Re-Basin"! arxiv.org/abs/2209.04836
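A minimal NumPy sketch (not from the thread) of the symmetry in question: permuting a hidden layer's units, along with the matching rows and columns of the adjacent weight matrices, leaves the network's outputs unchanged.

```python
# Toy illustration: permuting the hidden units of a small MLP, together with
# the matching rows/columns of the adjacent weights, preserves its function.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)   # input -> hidden
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)    # hidden -> output

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
    return W2 @ h + b2

perm = rng.permutation(16)            # swap hidden units arbitrarily
W1p, b1p = W1[perm], b1[perm]         # permute hidden rows
W2p = W2[:, perm]                     # permute the matching columns

x = rng.normal(size=8)
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
print("outputs identical under hidden-unit permutation")
```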

Gabriel Ilharco (@gabriel_ilharco):

Introducing DataComp, a new benchmark for multimodal datasets! We release 12.8B image-text pairs, 300+ experiments and a 1.4B subset that outcompetes compute-matched CLIP runs from OpenAI & LAION 📜 arxiv.org/abs/2304.14108 🖥️ github.com/mlfoundations/… 🌐 datacomp.ai

Gabriel Ilharco (@gabriel_ilharco):

Today we are releasing a CLIP ViT-L/14 model with 79.2% zero-shot accuracy on ImageNet. Our model outperforms OpenAI's CLIP by a large margin, and outperforms even bigger models (ViT-g/14) trained on LAION-2B. Check it out at huggingface.co/laion/CLIP-ViT…!
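A hedged usage sketch with the open_clip library for zero-shot classification; the Hugging Face model ID in the tweet is truncated, so the model name, pretrained tag, and image path below are placeholders rather than the released checkpoint's actual identifiers.

```python
# Sketch of zero-shot classification with an open_clip checkpoint.
# "ViT-L-14" / "laion2b_s32b_b82k" / "dog.jpg" are placeholders; substitute
# the released checkpoint's actual identifiers.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k")
tokenizer = open_clip.get_tokenizer("ViT-L-14")

image = preprocess(Image.open("dog.jpg")).unsqueeze(0)
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text = tokenizer(labels)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```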

Alisa Liu (@alisawuffles):

What do BPE tokenizers reveal about their training data?🧐 We develop an attack🗡️ that uncovers the training data mixtures📊 of commercial LLM tokenizers (incl. GPT-4o), using their ordered merge lists! Co-1⃣st Jonathan Hayase arxiv.org/abs/2407.16607 🧵⬇️
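A heavily simplified sketch of the idea (not the paper's full algorithm): each merge in the public, ordered merge list was the most frequent pair under the unknown domain mixture at the time it was learned, so "winner beats runner-up" inequalities over per-domain pair counts constrain the mixture weights. The domains, pairs, and counts below are hypothetical.

```python
# Simplified sketch: recover feasible mixture weights from one observed merge
# using a small linear program. Counts and domains are made up for illustration.
from scipy.optimize import linprog

domains = ["english", "code"]
counts = {  # hypothetical per-domain pair counts at the relevant merge step
    "english": {("t", "h"): 90, ("d", "e"): 30, ("(", ")"): 5},
    "code":    {("t", "h"): 20, ("d", "e"): 60, ("(", ")"): 80},
}
observed_merge = ("t", "h")   # first merge read off the public merge list
pairs = [("t", "h"), ("d", "e"), ("(", ")")]

A_ub, b_ub = [], []
for p in pairs:
    if p == observed_merge:
        continue
    # require sum_d alpha_d * (count_d(p) - count_d(winner)) <= 0
    A_ub.append([counts[d][p] - counts[d][observed_merge] for d in domains])
    b_ub.append(0.0)

# alpha >= 0, sum(alpha) = 1; any feasible point is a candidate mixture.
res = linprog(c=[0.0, 0.0], A_ub=A_ub, b_ub=b_ub,
              A_eq=[[1.0, 1.0]], b_eq=[1.0], bounds=[(0, 1), (0, 1)])
print("feasible mixture estimate:", dict(zip(domains, res.x)))
```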

Jonathan Hayase (@jonathanhayase):

Tokenizers and autoregressive LMs are both trained to compress text, but tokenizer training is deterministic and we know exactly how it works! This makes inverse problems wrt the data much easier. There's a wealth of info lurking in public tokenizers waiting to be extracted!
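A minimal sketch of that determinism, assuming plain whitespace pre-tokenization: BPE training just repeatedly merges the most frequent adjacent pair, so (ties aside) the ordered merge list is a pure function of the training corpus.

```python
# Minimal BPE training loop: every step deterministically merges the single
# most frequent adjacent pair, producing an ordered merge list.
from collections import Counter

def train_bpe(corpus, num_merges):
    words = [list(w) for w in corpus.split()]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # deterministic choice
        merges.append(best)
        merged = best[0] + best[1]
        new_words = []
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(merged); i += 2
                else:
                    out.append(w[i]); i += 1
            new_words.append(out)
        words = new_words
    return merges

print(train_bpe("low lower lowest low low", 4))
```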

Cohere Labs (@cohere_labs):

We are just a few days away from a presentation from Alisa Liu on "Data Mixture Inference: What do BPE Tokenizers reveal about their training data?" 🤗 Check it out Aug 19th! Learn more: cohere.com/events/cohere-…

Alisa Liu (@alisawuffles):

excited to be at #NeurIPS2024! I'll be presenting our data mixture inference attack 🗓️Thu 4:30pm w/ Jonathan Hayase — stop by to learn what trained tokenizers reveal about LLM development and chat about all things tokenizers.😊

Anshul Nasery (@anshulnasery):

Model merging is a great way to combine multiple models' abilities. However, existing methods only work with models fine-tuned from the same initialization, and they produce merged models of the same size. Our new work, PLeaS (at #CVPR2025), aims to resolve both these issues 🧵.
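For contrast, a toy sketch of the simplest existing baseline (plain parameter averaging, not PLeaS): it presumes both checkpoints share an initialization and architecture, which is exactly the restriction described above.

```python
# Toy baseline: parameter averaging of two fine-tuned checkpoints. Only
# meaningful when both were fine-tuned from the same initialization and have
# identical parameter shapes.
import torch

def average_merge(state_a, state_b, alpha=0.5):
    assert state_a.keys() == state_b.keys(), "same architecture required"
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

# usage: merged = average_merge(model_a.state_dict(), model_b.state_dict())
```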

Jonathan Hayase (@jonathanhayase):

Tokenizers govern the allocation of computation. It's a waste to spend a whole token of compute predicting the "way" in "By the way". SuperBPE redirects that compute to predict more difficult tokens, leading to wins on downstream tasks!
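A toy illustration of the point (an assumed demonstration, not the SuperBPE implementation): a vocabulary containing a superword token that crosses whitespace spends one token on "By the way" where a whitespace-bounded vocabulary spends three, freeing that budget for harder spans.

```python
# Toy greedy longest-match tokenization, with and without a "superword" token
# that crosses whitespace.
def tokenize(text, vocab):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):       # longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j]); i = j
                break
        else:
            tokens.append(text[i]); i += 1      # fall back to a single char
    return tokens

base_vocab  = {"By", " the", " way", ",", " I", " solved", " it"}
super_vocab = base_vocab | {"By the way"}       # one superword token

text = "By the way, I solved it"
print(len(tokenize(text, base_vocab)), tokenize(text, base_vocab))    # 7 tokens
print(len(tokenize(text, super_vocab)), tokenize(text, super_vocab))  # 5 tokens
```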