Ben Bogin (@ben_bogin)'s Twitter Profile
Ben Bogin

@ben_bogin

Research Scientist @ Google

ID: 150610839

Joined: 01-06-2010 10:56:41

149 Tweets

727 Followers

446 Following

Ai2 (@allen_ai)'s Twitter Profile Photo

OLMo is here! And it’s 100% open. It’s a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here: blog.allenai.org/olmo-open-lang…

AK (@_akhaliq)'s Twitter Profile Photo

Allen AI presents Dolma, an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research. paper page: huggingface.co/papers/2402.00… dataset: huggingface.co/datasets/allen… We release Dolma, a three-trillion-token English corpus built from a diverse mixture of web content, …

Maor Ivgi (@maorivg)'s Twitter Profile Photo

1/5 🧠 Excited to share our latest paper focusing on the heart of LLM training: data curation! We train a 7B LLM achieving 64% on 5-shot MMLU, using only 2.6T tokens. The key to this performance? Exceptional data curation. #LLM #DataCuration

Maor Ivgi (@maorivg)'s Twitter Profile Photo

1/7 🚨 What do LLMs do when they are uncertain? We found that the stronger the LLM, the more it hallucinates and the less it loops! This pattern extends to sampling methods and instruction tuning. 🧵👇 Mor Geva Jonathan Berant Ori Yoran

Ori Yoran (@oriyoran)'s Twitter Profile Photo

Working on a new web agent? AssistantBench, our benchmark with realistic and challenging web tasks, just got an update:

* Our SeePlanAct agent with Sonnet 3.5 achieved a new SoTA of 26.4%.
* We just open-sourced our agent.
* Accepted to #EMNLP2024!

Ben Bogin (@ben_bogin)'s Twitter Profile Photo

I will be presenting SUPER next week at EMNLP, Tuesday 4pm. Stop by to talk about evaluating agents on running research experiments and code in the wild!

Ori Yoran (@oriyoran)'s Twitter Profile Photo

New #ICLR2024 paper! The KoLMogorov Test: can CodeLMs compress data by code generation? The optimal compression for a sequence is the shortest program that generates it. Empirically, LMs struggle even on simple sequences, but can be trained to outperform current methods! 🧵1/7
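The compression principle behind the tweet above can be illustrated with a toy sketch (this is only an illustration of the idea, not the paper's method or benchmark): for a highly regular sequence, the source code of a program that regenerates it can be far shorter than either the raw bytes or a general-purpose compressor's output.

```python
# Illustrative sketch of Kolmogorov-style "compression as shortest
# program": compare three representations of a regular sequence.
# (Toy example only; names and setup here are our own, not the paper's.)
import zlib

# A highly regular sequence: the integers 0..999.
sequence = list(range(1000))

# Raw encoding: the literal comma-separated bytes of the sequence.
raw = ",".join(map(str, sequence)).encode()

# "Program" encoding: source code that regenerates the sequence.
program = b"list(range(1000))"

# A general-purpose compressor, for comparison.
gzipped = zlib.compress(raw)

print(f"raw: {len(raw)} bytes, gzip: {len(gzipped)} bytes, "
      f"program: {len(program)} bytes")

# The program is shorter than both, and running it reproduces the
# sequence exactly -- the crux of program-based compression.
assert eval(program) == sequence
```

For sequences with no short generating program, the raw bytes (or gzip) may win instead; the paper's question is whether code LMs can find such short programs when they exist.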

Ian Magnusson (@ianmagnusson)'s Twitter Profile Photo

🔭 Science relies on shared artifacts collected for the common good. 🛰 So we asked: what's missing in open language modeling? 🪐 DataDecide 🌌 charts the cosmos of pretraining—across scales and corpora—at a resolution beyond any public suite of models that has come before.

Tai Nguyen (@taidng)'s Twitter Profile Photo

We released a massive suite of 30K checkpoints to help facilitate research into pretraining data decisions! We include insights on which evaluation choices (metrics, benchmarks) best track progress, with comparisons to existing methods. Check out DataDecide! 🔮🥇🥈🥉