Eric Frankel (@esfrankel)'s Twitter Profile
Eric Frankel

@esfrankel

f-divergence enthusiast | phd @uwcse, mlr @apple | prev. math + stats @stanford

ID: 1334248878158176258

Joined: 02-12-2020 21:32:00

122 Tweets

258 Followers

704 Following

Gavin Brown (@gavinrbrown1)'s Twitter Profile Photo

It doesn't matter whether MMD stands for "max mean discrepancy" or "mean max discrepancy," as the meany-max theorem implies they are equivalent.
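
For reference, the quantity behind the joke: MMD is usually expanded as "maximum mean discrepancy," a maximum over test functions of a difference of means, stated here for a function class F such as the unit ball of an RKHS (which is why the word order carries the structure):

```latex
\mathrm{MMD}(P, Q; \mathcal{F})
  = \sup_{f \in \mathcal{F}}
    \left( \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \right)
```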

Anshul Nasery (@anshulnasery)'s Twitter Profile Photo

Model merging is a great way to combine multiple models' abilities, however, existing methods only work with models fine-tuned from the same initialization, and produce models of the same size. Our new work - PLeaS (at #CVPR2025) aims to resolve both these issues 🧵.

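For context on what PLeaS generalizes, here is a minimal sketch of the same-initialization baseline it improves on: plain parameter averaging. This is illustrative only; PLeaS itself adds permutation alignment between differently initialized models and control over the merged model's size, neither of which this sketch does.

```python
import torch

def average_merge(state_dicts, weights=None):
    """Baseline merge: a weighted elementwise average of parameters.

    Only sensible when every model shares one architecture and was
    fine-tuned from the same initialization -- exactly the
    restriction PLeaS is designed to lift.
    """
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    return {
        name: sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
        for name in state_dicts[0]
    }
```
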
Zhiyuan Zeng (@zhiyuanzeng_)'s Twitter Profile Photo

Is a single accuracy number all we can get from model evals?🤔
🚨Does NOT tell where the model fails
🚨Does NOT tell how to improve it
Introducing EvalTree🌳
🔍identifying LM weaknesses in natural language
🚀weaknesses serve as actionable guidance
(paper&demo 🔗in🧵) [1/n]
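
One way to picture the output (an illustrative aggregation only; EvalTree builds the category hierarchy automatically from natural language, which this toy does not):

```python
from collections import defaultdict

def weakness_report(results, min_count=5, top_k=5):
    """Roll per-example eval results up a category tree and surface
    the weakest subtrees. `results` holds (category_path, correct)
    pairs, e.g. (("math", "algebra", "word problems"), False); the
    hypothetical paths stand in for EvalTree's learned hierarchy.
    """
    stats = defaultdict(lambda: [0, 0])  # path prefix -> [num correct, total]
    for path, correct in results:
        for depth in range(1, len(path) + 1):
            node = stats[path[:depth]]
            node[0] += int(correct)
            node[1] += 1
    # Lowest-accuracy nodes with enough support = candidate weaknesses.
    scored = [(c / t, " > ".join(p)) for p, (c, t) in stats.items() if t >= min_count]
    return sorted(scored)[:top_k]
```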

Teknium (e/λ) (@teknium1)'s Twitter Profile Photo

Mistral AI just released a new version of their 24B model - this time it is multimodal and has 128K context - exactly what we wanted! This enables reasoning models to be fully exploited on both long reasoning and vision tasks. They also gave DeepHermes a shoutout!

Joel Jang (@jang_yoel)'s Twitter Profile Photo

Excited to release GR00T N1! While this robot foundation model already stands out as the first open-source foundation model for humanoids and for its utilization of 540k simulation trajectories during pretraining, I want to highlight two other key innovations that truly set it …

Alisa Liu (@alisawuffles)'s Twitter Profile Photo

We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵

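The core change is easy to state: standard BPE pretokenizes on whitespace, so no merge can ever cross a word boundary, while a superword tokenizer drops that restriction. A toy sketch of that one difference (not the paper's actual algorithm or training curriculum):

```python
from collections import Counter

def toy_bpe(text, num_merges, allow_superwords=False):
    """Toy BPE: repeatedly merge the most frequent adjacent pair.

    With allow_superwords=False, any pair touching a space is
    skipped, mimicking whitespace pretokenization; with True,
    spaces can be absorbed, yielding multi-word "superword" tokens.
    """
    seq = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for a, b in zip(seq, seq[1:]):
            if not allow_superwords and (" " in a or " " in b):
                continue  # forbid merges across word boundaries
            pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(a + b)
        out, i = [], 0
        while i < len(seq):  # apply the new merge greedily, left to right
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return merges
```
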
Sewoong Oh (@sewoong79)'s Twitter Profile Photo

We are releasing OpenDeepSearch (ODS), an open-source search agent that works with any LLM. When paired with DeepSeek-R1, ODS outperforms OpenAI’s specialized model for web search, GPT-4o-Search, on the challenging, multi-hop FRAMES benchmark from DeepMind (+9.7% accuracy).

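The "works with any LLM" claim is about the agent scaffolding rather than the model. A generic sketch of such a search loop, with all prompts and function names assumed for illustration (this is not the OpenDeepSearch API):

```python
def search_agent(question, llm, search, max_steps=5):
    """Minimal model-agnostic search agent: at each step the LLM
    either issues a search query or commits to an answer.
    `llm` maps a prompt string to a reply; `search` maps a query
    to result snippets. Both are assumed interfaces.
    """
    notes = []
    for _ in range(max_steps):
        prompt = (
            f"Question: {question}\n"
            f"Notes so far: {notes}\n"
            "Reply with 'SEARCH: <query>' or 'ANSWER: <answer>'."
        )
        reply = llm(prompt).strip()
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        if reply.startswith("SEARCH:"):
            notes.append(search(reply.removeprefix("SEARCH:").strip()))
    return llm(f"Question: {question}\nNotes: {notes}\nGive your best answer.")
```
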
Etash Guha @ ICLR (@etash_guha)'s Twitter Profile Photo

Turns out, it’s possible to outperform DeepSeekR1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by selecting quality instructions from diverse sources. 🧵 (1/n)

Gonçalo Faria (@goncalorafaria)'s Twitter Profile Photo

Introducing 𝗤𝗔𝗹𝗶𝗴𝗻🚀, a 𝘁𝗲𝘀𝘁-𝘁𝗶𝗺𝗲 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 𝗺𝗲𝘁𝗵𝗼𝗱 that improves language model performance using Markov chain Monte Carlo. With no model retraining, 𝗤𝗔𝗹𝗶𝗴𝗻 outperforms DPO-tuned models even when allowed to match inference compute, and achieves …

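To make the MCMC idea concrete, here is one classical way to do test-time alignment with a Markov chain: an independence Metropolis sampler whose stationary distribution tilts the base model toward high reward. The chain, proposal, and names below are assumptions for illustration, not QAlign's actual algorithm.

```python
import math
import random

def mcmc_align(sample_fn, reward_fn, beta=1.0, steps=50):
    """Independence Metropolis over whole completions, targeting
    p(y) proportional to p0(y) * exp(r(y) / beta). Proposals are
    fresh samples from the base model p0, so the p0 terms cancel
    and acceptance depends only on the reward gap. Sketch only.
    """
    y = sample_fn()          # initial completion from the base model
    r = reward_fn(y)
    for _ in range(steps):
        y_new = sample_fn()
        r_new = reward_fn(y_new)
        # Accept with probability min(1, exp((r_new - r) / beta)).
        accept_logprob = min(0.0, (r_new - r) / beta)
        if random.random() < math.exp(accept_logprob):
            y, r = y_new, r_new
    return y
```
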
Ai2 (@allen_ai)'s Twitter Profile Photo

Ever wonder how LLM developers choose their pretraining data? It’s not guesswork: all AI labs create small-scale models as experiments, but the models and their data are rarely shared. DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵

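The decision procedure being opened up is, at heart, a ranking experiment: train small proxy models on each candidate corpus and extrapolate. A sketch of that rule, with all function names assumed (DataDecide studies how reliably such small-scale rankings transfer to larger scales):

```python
def pick_pretraining_corpus(candidates, train_small, evaluate):
    """Rank candidate pretraining corpora by small-scale proxies:
    train a small model on each corpus, score it on benchmarks,
    and pick the best. `train_small` and `evaluate` are assumed
    helpers; returns the winning corpus name plus all scores.
    """
    scores = {
        name: evaluate(train_small(corpus))  # e.g. mean benchmark accuracy
        for name, corpus in candidates.items()
    }
    return max(scores, key=scores.get), scores
```
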
Ian Magnusson (@ianmagnusson)'s Twitter Profile Photo

🔭 Science relies on shared artifacts collected for the common good.
🛰 So we asked: what's missing in open language modeling?
🪐 DataDecide 🌌 charts the cosmos of pretraining—across scales and corpora—at a resolution beyond any public suite of models that has come before.

Rohan Baijal (@rohanbaijal)'s Twitter Profile Photo

Long Range Navigator (LRN) 🧭— an approach to extend planning horizons for off-road navigation given no prior maps. Using vision, LRN makes longer-range decisions by spotting navigation frontiers far beyond the range of metric maps. personalrobotics.github.io/lrn/

Kunal Jha (@kjha02)'s Twitter Profile Photo

Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity.

Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks.

shorturl.at/fqsNN 🧵
Rui Xin (@rui_xin31)'s Twitter Profile Photo

Think PII scrubbing ensures privacy? 🤔Think again‼️ In our paper, for the first time on unstructured text, we show that you can re-identify over 70% of private information *after* scrubbing! It’s time to move beyond surface-level anonymization. #Privacy #NLProc 🔗🧵

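A toy illustration of why surface-level scrubbing falls short (the patterns, names, and example below are invented; the paper measures this leakage at scale on real unstructured text):

```python
import re

def scrub(text):
    """Surface-level PII scrubbing: mask emails and two-word
    capitalized names. Deliberately naive, for illustration."""
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)
    text = re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", "[NAME]", text)
    return text

note = "Jane Miller (jane@ex.com) is a glassblower in Tacoma who won the 2019 state fair."
print(scrub(note))
# -> "[NAME] ([EMAIL]) is a glassblower in Tacoma who won the 2019 state fair."
# The direct identifiers are masked, but the surviving quasi-identifiers
# (rare occupation + city + specific award) can still single out one
# person -- the contextual leakage that defeats surface anonymization.
```
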
Tong Chen @ ICLR (@tomchen0)'s Twitter Profile Photo

LLMs naturally memorize some of their pre-training data verbatim. We study whether post-training can be an effective way to mitigate unintentional reproduction of pre-training data.
🛠️ No changes to pre-training or decoding
🔥 Training models to latently distinguish between memorized …
Siting Li (@sitingli627)'s Twitter Profile Photo

Excited to share that our paper "Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder" is accepted to #ACL2025! Preprint: arxiv.org/pdf/2411.05195 Thanks so much to Simon Shaolei Du and Pang Wei Koh for your support and guidance throughout the journey!

Stella Li (@stellalisy)'s Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
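
For concreteness, the three reward variants the thread compares, written as a generic verifier-style reward (a sketch of the setup as described, with an assumed answer parser; not the authors' code):

```python
import random

def extract_answer(response: str) -> str:
    """Assumed helper: naively take the final token as the answer."""
    return response.strip().split()[-1]

def rlvr_reward(response, gold_answer, variant="ground_truth"):
    """Verifier-style reward for RLVR in the three variants compared:
    ground-truth, deliberately incorrect, and fully random."""
    correct = extract_answer(response) == gold_answer
    if variant == "ground_truth":
        return 1.0 if correct else 0.0        # standard RLVR signal
    if variant == "incorrect":
        return 0.0 if correct else 1.0        # rewards wrong answers
    if variant == "random":
        return float(random.random() < 0.5)   # ignores the response
    raise ValueError(f"unknown variant: {variant}")
```
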
Yizhong Wang (@yizhongwyz)'s Twitter Profile Photo

Thrilled to announce that I will be joining UT Austin Computer Science as an assistant professor in fall 2026!

I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘