Jonathan Hayase (@jonathanhayase)'s Twitter Profile
Jonathan Hayase

@jonathanhayase

5th year Machine Learning PhD student at UW CSE

ID: 1270435381872193537

Link: https://jon.jon.ke · Joined: 09-06-2020 19:27:31

22 Tweets

137 Followers

116 Following

Samuel "curry-howard fanboi" Ainsworth (@samuelainsworth):

📜🚨📜🚨 NN loss landscapes are full of permutation symmetries, i.e., swap any 2 units in a hidden layer. What does this mean for SGD? Is this practically useful? For the past 5 yrs these Qs have fascinated me. Today, I am ready to announce "Git Re-Basin"! arxiv.org/abs/2209.04836
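A minimal NumPy sketch (not from the thread) of the symmetry in question: permuting a hidden layer's units, along with the matching rows and columns of the adjacent weight matrices, leaves the network's outputs unchanged.

```python
# Toy illustration: permuting the hidden units of a small MLP, together with
# the matching rows/columns of the adjacent weights, preserves its function.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)   # input -> hidden
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)    # hidden -> output

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
    return W2 @ h + b2

perm = rng.permutation(16)            # swap hidden units arbitrarily
W1p, b1p = W1[perm], b1[perm]         # permute hidden rows
W2p = W2[:, perm]                     # permute the matching columns

x = rng.normal(size=8)
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
print("outputs identical under hidden-unit permutation")
```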

Gabriel Ilharco (@gabriel_ilharco):

Introducing DataComp, a new benchmark for multimodal datasets! We release 12.8B image-text pairs, 300+ experiments and a 1.4B subset that outcompetes compute-matched CLIP runs from OpenAI & LAION 📜 arxiv.org/abs/2304.14108 🖥️ github.com/mlfoundations/… 🌐 datacomp.ai

Gabriel Ilharco (@gabriel_ilharco):

Today we are releasing a CLIP ViT-L/14 model with 79.2% zero-shot accuracy on ImageNet. Our model outperforms OpenAI's CLIP by a large margin, and outperforms even bigger models (ViT-g/14) trained on LAION-2B. Check it out at huggingface.co/laion/CLIP-ViT…!
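A hedged usage sketch with the open_clip library for zero-shot classification; the Hugging Face model ID in the tweet is truncated, so the model name, pretrained tag, and image path below are placeholders rather than the released checkpoint's actual identifiers.

```python
# Sketch of zero-shot classification with an open_clip checkpoint.
# "ViT-L-14" / "laion2b_s32b_b82k" / "dog.jpg" are placeholders; substitute
# the released checkpoint's actual identifiers.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k")
tokenizer = open_clip.get_tokenizer("ViT-L-14")

image = preprocess(Image.open("dog.jpg")).unsqueeze(0)
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text = tokenizer(labels)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```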

Alisa Liu (@alisawuffles):

What do BPE tokenizers reveal about their training data?🧐 We develop an attack🗡️ that uncovers the training data mixtures📊 of commercial LLM tokenizers (incl. GPT-4o), using their ordered merge lists! Co-1⃣st Jonathan Hayase arxiv.org/abs/2407.16607 🧵⬇️
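A heavily simplified sketch of the idea (not the paper's full algorithm): each merge in the public, ordered merge list was the most frequent pair under the unknown domain mixture at the time it was learned, so "winner beats runner-up" inequalities over per-domain pair counts constrain the mixture weights. The domains, pairs, and counts below are hypothetical.

```python
# Simplified sketch: recover feasible mixture weights from one observed merge
# using a small linear program. Counts and domains are made up for illustration.
from scipy.optimize import linprog

domains = ["english", "code"]
counts = {  # hypothetical per-domain pair counts at the relevant merge step
    "english": {("t", "h"): 90, ("d", "e"): 30, ("(", ")"): 5},
    "code":    {("t", "h"): 20, ("d", "e"): 60, ("(", ")"): 80},
}
observed_merge = ("t", "h")   # first merge read off the public merge list
pairs = [("t", "h"), ("d", "e"), ("(", ")")]

A_ub, b_ub = [], []
for p in pairs:
    if p == observed_merge:
        continue
    # require sum_d alpha_d * (count_d(p) - count_d(winner)) <= 0
    A_ub.append([counts[d][p] - counts[d][observed_merge] for d in domains])
    b_ub.append(0.0)

# alpha >= 0, sum(alpha) = 1; any feasible point is a candidate mixture.
res = linprog(c=[0.0, 0.0], A_ub=A_ub, b_ub=b_ub,
              A_eq=[[1.0, 1.0]], b_eq=[1.0], bounds=[(0, 1), (0, 1)])
print("feasible mixture estimate:", dict(zip(domains, res.x)))
```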

Jonathan Hayase (@jonathanhayase):

Tokenizers and autoregressive LMs are both trained to compress text, but tokenizer training is deterministic and we know exactly how it works! This makes inverse problems wrt the data much easier. There's a wealth of info lurking in public tokenizers waiting to be extracted!
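A minimal sketch of that determinism, assuming plain whitespace pre-tokenization: BPE training just repeatedly merges the most frequent adjacent pair, so (ties aside) the ordered merge list is a pure function of the training corpus.

```python
# Minimal BPE training loop: every step deterministically merges the single
# most frequent adjacent pair, producing an ordered merge list.
from collections import Counter

def train_bpe(corpus, num_merges):
    words = [list(w) for w in corpus.split()]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # deterministic choice
        merges.append(best)
        merged = best[0] + best[1]
        new_words = []
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(merged); i += 2
                else:
                    out.append(w[i]); i += 1
            new_words.append(out)
        words = new_words
    return merges

print(train_bpe("low lower lowest low low", 4))
```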

Cohere Labs (@cohere_labs):

We are just a few days away from a presentation from Alisa Liu on "Data Mixture Inference: What do BPE Tokenizers reveal about their training data?" 🤗 Check it out Aug 19th! Learn more: cohere.com/events/cohere-…

Alisa Liu (@alisawuffles):

excited to be at #NeurIPS2024! I'll be presenting our data mixture inference attack 🗓️Thu 4:30pm w/ Jonathan Hayase — stop by to learn what trained tokenizers reveal about LLM development and chat about all things tokenizers.😊

Anshul Nasery (@anshulnasery):

Model merging is a great way to combine multiple models' abilities. However, existing methods only work with models fine-tuned from the same initialization, and they produce merged models of the same size. Our new work, PLeaS (at #CVPR2025), aims to resolve both these issues 🧵.
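For contrast, a toy sketch of the simplest existing baseline (plain parameter averaging, not PLeaS): it presumes both checkpoints share an initialization and architecture, which is exactly the restriction described above.

```python
# Toy baseline: parameter averaging of two fine-tuned checkpoints. Only
# meaningful when both were fine-tuned from the same initialization and have
# identical parameter shapes.
import torch

def average_merge(state_a, state_b, alpha=0.5):
    assert state_a.keys() == state_b.keys(), "same architecture required"
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

# usage: merged = average_merge(model_a.state_dict(), model_b.state_dict())
```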

Jonathan Hayase (@jonathanhayase):

Tokenizers govern the allocation of computation. It's a waste to spend a whole token of compute predicting the "way" in "By the way". SuperBPE redirects that compute to predict more difficult tokens, leading to wins on downstream tasks!
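A toy illustration of the point (an assumed demonstration, not the SuperBPE implementation): a vocabulary containing a superword token that crosses whitespace spends one token on "By the way" where a whitespace-bounded vocabulary spends three, freeing that budget for harder spans.

```python
# Toy greedy longest-match tokenization, with and without a "superword" token
# that crosses whitespace.
def tokenize(text, vocab):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):       # longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j]); i = j
                break
        else:
            tokens.append(text[i]); i += 1      # fall back to a single char
    return tokens

base_vocab  = {"By", " the", " way", ",", " I", " solved", " it"}
super_vocab = base_vocab | {"By the way"}       # one superword token

text = "By the way, I solved it"
print(len(tokenize(text, base_vocab)), tokenize(text, base_vocab))    # 7 tokens
print(len(tokenize(text, super_vocab)), tokenize(text, super_vocab))  # 5 tokens
```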