OLMo is here! And it’s 100% open.
It’s a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here:
blog.allenai.org/olmo-open-lang…
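To try the model right away, here is a minimal sketch of loading OLMo through Hugging Face `transformers`. The Hub ID `allenai/OLMo-7B` and the `trust_remote_code` flag are assumptions; see the blog post above for the exact access instructions.

```python
# Minimal sketch: loading OLMo via Hugging Face transformers.
# The Hub ID "allenai/OLMo-7B" and the trust_remote_code flag are
# assumptions; see the blog post above for exact access instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Language modeling is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```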
Allen AI presents Dolma
an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
paper page: huggingface.co/papers/2402.00…
dataset: huggingface.co/datasets/allen…
We release Dolma, a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials.
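For anyone who wants to poke at the corpus, here is a minimal sketch of streaming it with the `datasets` library. The dataset ID `allenai/dolma` and the per-document `text` field are assumptions; check the dataset card linked above.

```python
# Minimal sketch: streaming Dolma with the `datasets` library.
# The dataset ID "allenai/dolma" and the "text" field are assumptions;
# streaming avoids downloading the full three-trillion-token corpus.
from datasets import load_dataset

dolma = load_dataset("allenai/dolma", split="train", streaming=True)
for i, doc in enumerate(dolma):
    print(doc["text"][:200])  # first 200 characters of each document
    if i == 2:  # peek at just a few documents
        break
```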
1/5 🧠 Excited to share our latest paper focusing on the heart of LLM training: data curation! We train a 7B LLM achieving 64% on 5-shot MMLU, using only 2.6T tokens. The key to this performance? Exceptional data curation. #LLM #DataCuration
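For readers unfamiliar with the setup, "5-shot MMLU" means five worked question-answer pairs are prepended before each test question. Below is an illustrative sketch of building such a prompt; the input structures (`dev_examples` and friends) are hypothetical placeholders, not the paper's evaluation harness.

```python
# Illustrative sketch of 5-shot prompting as used in MMLU-style evaluation:
# five solved examples precede the test question, and the model is scored
# on its choice among A/B/C/D. The input structures here are hypothetical.
def format_example(question, choices, answer=None):
    block = question + "\n"
    for letter, choice in zip("ABCD", choices):
        block += f"{letter}. {choice}\n"
    block += "Answer:" + (f" {answer}\n\n" if answer else "")
    return block

def build_5shot_prompt(dev_examples, test_question, test_choices):
    # dev_examples: list of (question, choices, answer) tuples
    prompt = ""
    for question, choices, answer in dev_examples[:5]:  # the five "shots"
        prompt += format_example(question, choices, answer)
    prompt += format_example(test_question, test_choices)  # model completes this
    return prompt
```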
1/7 🚨 What do LLMs do when they are uncertain? We found that the stronger the LLM, the more it hallucinates and the less it loops! This pattern extends to sampling methods and instruction tuning. 🧵👇
Mor Geva, Jonathan Berant, Ori Yoran
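Since the thread contrasts hallucination with looping, here is a hedged sketch of one simple way to flag looping (degenerate repetition) in generated text. The n-gram size and thresholds are illustrative choices, not the paper's detection method.

```python
# Hedged sketch: flag "looping" by counting repeated n-grams near the end
# of a generated token sequence. Parameters are illustrative defaults,
# not the detection criterion used in the paper.
from collections import Counter

def is_looping(tokens, n=4, window=64, max_repeats=3):
    tail = tokens[-window:]  # only inspect the most recent tokens
    ngrams = [tuple(tail[i:i + n]) for i in range(len(tail) - n + 1)]
    if not ngrams:
        return False
    return max(Counter(ngrams).values()) > max_repeats

print(is_looping(["the", "cat", "sat"] * 20))   # True: heavy repetition
print(is_looping(list("abcdefghijklmnop")))     # False: no repeats
```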
Working on a new web agent? AssistantBench, our benchmark of realistic and challenging web tasks, just got an update:
* Our SeePlanAct Agent with Sonnet 3.5 achieved a new SoTA of 26.4%.
* We just open-sourced our agent.
* Accepted to #EMNLP2024!
I will be presenting SUPER next week at EMNLP, Tuesday 4pm. Stop by to talk about evaluating agents on running research experiments and code in the wild!
New #ICLR2025 paper!
The KoLMogorov Test: can CodeLMs compress data by code generation?
The optimal compression for a sequence is the shortest program that generates it. Empirically, LMs struggle even on simple sequences, but can be trained to outperform current methods!
🧵1/7
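To make the setup concrete, here is a hedged sketch of the scoring idea: a candidate program must reproduce the target sequence exactly, and shorter programs compress better. The `gen()` convention and the length ratio are illustrative, not the paper's exact metric.

```python
# Hedged sketch of compression-as-code-generation: accept a candidate
# program only if it reproduces the data exactly, then score it by its
# length relative to the data. The gen() convention is illustrative.
def compression_score(program_src, target):
    namespace = {}
    exec(program_src, namespace)   # candidate must define gen()
    if namespace["gen"]() != target:
        return None                # failed to reproduce the sequence
    return len(program_src) / len(str(target))  # < 1 means compression

target = [i * i for i in range(100)]  # a simple sequence: the first 100 squares
candidate = "def gen():\n    return [i * i for i in range(100)]"
print(compression_score(candidate, target))
```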
🔭 Science relies on shared artifacts collected for the common good.
🛰 So we asked: what's missing in open language modeling?
🪐 DataDecide 🌌 charts the cosmos of pretraining—across scales and corpora—at a resolution beyond any public suite of models that has come before.
We released a massive suite of 30K checkpoints to facilitate research into pretraining data decisions! We include insights on which evaluation choices (metrics, benchmarks) best track progress, with comparisons to existing methods.
Check out DataDecide! 🔮🥇🥈🥉
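One way to use a suite like this: check whether small-scale results predict which corpus wins at a larger scale. Below is a hedged sketch using Spearman rank correlation; the scores are made-up placeholders, not DataDecide results.

```python
# Hedged sketch: does a benchmark at small scale rank pretraining corpora
# the same way a larger scale does? Scores are made-up placeholders,
# not DataDecide results.
from scipy.stats import spearmanr

corpora = ["corpus_a", "corpus_b", "corpus_c", "corpus_d"]
small_scale_acc = [0.31, 0.28, 0.35, 0.30]  # e.g. small proxy models
large_scale_acc = [0.52, 0.47, 0.58, 0.50]  # e.g. target-scale models

rho, pval = spearmanr(small_scale_acc, large_scale_acc)
print(f"Rank correlation across corpora: {rho:.2f} (p={pval:.2f})")
```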