Jean Mercat (@mercatjean)'s Twitter Profile

Jean Mercat

@mercatjean

ID: 1051030540911071232

Joined: 13-10-2018 08:43:02

109 Tweets · 51 Followers · 188 Following

Achal Dave (@achalddave):

Excited to share our new-and-improved 1B models trained with DataComp-LM!

- 1.4B model trained on 4.3T tokens
- 5-shot MMLU 47.5 (base model) => 51.4 (w/ instruction tuning)
- Fully open models: public code, weights, dataset!
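
For readers who want to poke at the release, here is a minimal sketch of loading the 1.4B base model with Hugging Face transformers. The hub ID "TRI-ML/DCLM-1B" is an assumption (check the DataComp-LM release for the actual path), and DCLM checkpoints may need remote code for their OpenLM-based architecture.

```python
# Minimal sketch, assuming the 1.4B base model is published on the
# Hugging Face Hub. "TRI-ML/DCLM-1B" is an assumed repo ID; check the
# DataComp-LM release for the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TRI-ML/DCLM-1B"  # assumption, not confirmed by the tweet
tokenizer = AutoTokenizer.from_pretrained(model_id)
# DCLM checkpoints build on OpenLM, so remote code may be required.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Data curation matters because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
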
Alex Dimakis (@alexgdimakis):

github.com/mlfoundations/…
I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval, etc., as well as standard pre-training
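
Evalchemy is built on top of lm-evaluation-harness, so a programmatic run looks roughly like the harness API below; the model and task names are placeholders, and the Evalchemy repo documents the actual CLI.

```python
# Hedged sketch using the lm-evaluation-harness Python API that Evalchemy
# builds on. Model and task choices here are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face backend
    model_args="pretrained=gpt2",  # placeholder model
    tasks=["ifeval"],              # one of the benchmarks named above
    batch_size=8,
)
print(results["results"])
```
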
Ryan Marten (@ryanmart3n):

Announcing the Open Thoughts project. We are building the best reasoning datasets out in the open.

Building off our work with Stratos, today we are releasing OpenThoughts-114k and OpenThinker-7B.
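
As a quick way to inspect the release, a sketch of pulling the data with the `datasets` library; the hub ID below is inferred from the project name and may differ, so check the release post.

```python
# Sketch, assuming the data lives at "open-thoughts/OpenThoughts-114k"
# on the Hugging Face Hub (inferred from the project name).
from datasets import load_dataset

ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")
print(ds)     # column names come from the dataset card
print(ds[0])  # inspect one question/reasoning-trace example
```
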
Negin Raoof (@neginraoof_):

Want to evaluate your models on reasoning benchmarks? We have integrated many math and coding benchmarks into Evalchemy: AIME24, AMC23, MATH500, LiveCodeBench, GPQA, HumanEvalPlus, MBPPPlus, BigCodeBench, MultiPL-E, and CRUXEval. 

Further, Evalchemy now supports vLLM and OpenAI,
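
Since the underlying harness exposes a vLLM backend, swapping inference engines is roughly a one-line change; the backend and argument names below mirror lm-evaluation-harness and may differ slightly in Evalchemy itself.

```python
# Hedged sketch of an evaluation routed through vLLM instead of the default
# Hugging Face backend (argument names follow lm-evaluation-harness).
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=Qwen/Qwen2.5-7B-Instruct,tensor_parallel_size=1",
    tasks=["gsm8k"],   # placeholder; the tasks named above (AIME24,
                       # MATH500, ...) are registered in the Evalchemy repo
    batch_size="auto",
)
print(results["results"])
```
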
Negin Raoof (@neginraoof_):

Announcing OpenThinker-32B: the best open-data reasoning model distilled from DeepSeek-R1.
Our results show that large, carefully curated datasets with verified R1 annotations produce SoTA reasoning models. Our 32B model outperforms all 32B models including
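
The "verified R1 annotations" idea can be pictured as a filter that keeps a generated solution only when its final answer matches the known ground truth. A toy sketch follows; the trace format and answer extraction are illustrative assumptions, not the project's actual pipeline.

```python
# Toy sketch of verification-based curation: keep an R1-generated trace
# only if its final boxed answer equals the known ground truth.
import re

def extract_final_answer(trace: str):
    """Return the last \\boxed{...} answer in a reasoning trace, if any."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", trace)
    return matches[-1].strip() if matches else None

def is_verified(trace: str, ground_truth: str) -> bool:
    answer = extract_final_answer(trace)
    return answer is not None and answer == ground_truth.strip()

examples = [{"trace": r"... so the result is \boxed{42}", "answer": "42"}]
curated = [ex for ex in examples if is_verified(ex["trace"], ex["answer"])]
print(f"kept {len(curated)} of {len(examples)} traces")
```
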
Alex Dimakis (@alexgdimakis):

Pretty happy that our OpenThinker-32B is in the no. 4 position on the General Reasoning Leaderboard. It should also be pointed out which models are open data (post-training data): OpenThinker, LIMO, OpenHermes, and DeepScaler.
Sedrick Keh (@sedrickkeh2):

1/ DeepSeek-VL is trained from DeepSeek LLM.
Qwen-VL is trained from Qwen-7B.
PaliGemma is trained from Gemma-2B.

Is this really the best way to train a VLM? What if we had access to model checkpoints -- would it be better to train with images before the LLM fully converges? 🧵

Etash Guha @ ICLR (@etash_guha):

Turns out, it’s possible to outperform DeepSeekR1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by selecting quality instructions from diverse sources. 🧵 (1/n)
Edouard Leurent (@eleurent):

Excited to share what I've been up to: Gemini Diffusion is FAST! I'm convinced this will revolutionise iterative workflows: refine, get instant feedback, repeat! So proud of what our small team achieved here🪐

Ryan Marten (@ryanmart3n):

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
Russ Tedrake (@russtedrake):

The short version is: LBMs work! We see consistent and statistically significant improvements as we increase the amount of pretraining data. But doing the science is still hard; as a field we have more work to do to improve the statistical power of our experiments.

Zubair Irshad (@mzubairirshad):

🚀Thrilled to share what we’ve been building at TRI over the past several months: our first Large Behavior Models (LBMs) are here! I’m proud to have been a core contributor to the multi-task policy learning and post-training efforts. At TRI, we’ve been researching how LBMs can

Sedrick Keh (@sedrickkeh2):

📢📢📢 Releasing OpenThinker3-1.5B, the top-performing SFT-only model at the 1B scale! 🚀

OpenThinker3-1.5B is a smaller version of our previous 7B model, trained on the same OpenThoughts3-1.2M dataset.