Jean Mercat (@mercatjean)'s Twitter Profile

Jean Mercat

@mercatjean

ID: 1051030540911071232

Joined: 13-10-2018 08:43:02

109 Tweets · 51 Followers · 188 Following

Achal Dave (@achalddave):

Excited to share our new-and-improved 1B models trained with DataComp-LM!

- 1.4B model trained on 4.3T tokens
- 5-shot MMLU 47.5 (base model) => 51.4 (w/ instruction tuning)
- Fully open models: public code, weights, dataset!
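
For readers who want to poke at the release, here is a minimal sketch of loading the 1.4B base model with Hugging Face transformers. The hub ID "TRI-ML/DCLM-1B" is an assumption (check the DataComp-LM release for the actual path), and DCLM checkpoints may need remote code for their OpenLM-based architecture.

```python
# Minimal sketch, assuming the 1.4B base model is published on the
# Hugging Face Hub. "TRI-ML/DCLM-1B" is an assumed repo ID; check the
# DataComp-LM release for the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TRI-ML/DCLM-1B"  # assumption, not confirmed by the tweet
tokenizer = AutoTokenizer.from_pretrained(model_id)
# DCLM checkpoints build on OpenLM, so remote code may be required.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Data curation matters because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
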
Alex Dimakis (@alexgdimakis):

github.com/mlfoundations/…
I’m excited to introduce Evalchemy 🧪, a unified platform for evaluating LLMs. If you want to evaluate an LLM, you may want to run popular benchmarks on your model, like MTBench, WildBench, RepoBench, IFEval, AlpacaEval, etc., as well as standard pre-training
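
Evalchemy is built on top of lm-evaluation-harness, so a programmatic run looks roughly like the harness API below; the model and task names are placeholders, and the Evalchemy repo documents the actual CLI.

```python
# Hedged sketch using the lm-evaluation-harness Python API that Evalchemy
# builds on. Model and task choices here are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face backend
    model_args="pretrained=gpt2",  # placeholder model
    tasks=["ifeval"],              # one of the benchmarks named above
    batch_size=8,
)
print(results["results"])
```
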
Ryan Marten (@ryanmart3n):

Announcing the Open Thoughts project. We are building the best reasoning datasets out in the open.

Building off our work with Stratos, today we are releasing OpenThoughts-114k and OpenThinker-7B.
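
As a quick way to inspect the release, a sketch of pulling the data with the `datasets` library; the hub ID below is inferred from the project name and may differ, so check the release post.

```python
# Sketch, assuming the data lives at "open-thoughts/OpenThoughts-114k"
# on the Hugging Face Hub (inferred from the project name).
from datasets import load_dataset

ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")
print(ds)     # column names come from the dataset card
print(ds[0])  # inspect one question/reasoning-trace example
```
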
Negin Raoof (@neginraoof_):

Want to evaluate your models on reasoning benchmarks? We have integrated many math and coding benchmarks into Evalchemy: AIME24, AMC23, MATH500, LiveCodeBench, GPQA, HumanEvalPlus, MBPPPlus, BigCodeBench, MultiPL-E, and CRUXEval. 

Further, Evalchemy now supports vLLM and OpenAI,
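
Since the underlying harness exposes a vLLM backend, swapping inference engines is roughly a one-line change; the backend and argument names below mirror lm-evaluation-harness and may differ slightly in Evalchemy itself.

```python
# Hedged sketch of an evaluation routed through vLLM instead of the default
# Hugging Face backend (argument names follow lm-evaluation-harness).
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=Qwen/Qwen2.5-7B-Instruct,tensor_parallel_size=1",
    tasks=["gsm8k"],   # placeholder; the tasks named above (AIME24,
                       # MATH500, ...) are registered in the Evalchemy repo
    batch_size="auto",
)
print(results["results"])
```
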
Negin Raoof (@neginraoof_):

Announcing OpenThinker-32B: the best open-data reasoning model distilled from DeepSeek-R1.
Our results show that large, carefully curated datasets with verified R1 annotations produce SoTA reasoning models. Our 32B model outperforms all 32B models including
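
The "verified R1 annotations" idea can be pictured as a filter that keeps a generated solution only when its final answer matches the known ground truth. A toy sketch follows; the trace format and answer extraction are illustrative assumptions, not the project's actual pipeline.

```python
# Toy sketch of verification-based curation: keep an R1-generated trace
# only if its final boxed answer equals the known ground truth.
import re

def extract_final_answer(trace: str):
    """Return the last \\boxed{...} answer in a reasoning trace, if any."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", trace)
    return matches[-1].strip() if matches else None

def is_verified(trace: str, ground_truth: str) -> bool:
    answer = extract_final_answer(trace)
    return answer is not None and answer == ground_truth.strip()

examples = [{"trace": r"... so the result is \boxed{42}", "answer": "42"}]
curated = [ex for ex in examples if is_verified(ex["trace"], ex["answer"])]
print(f"kept {len(curated)} of {len(examples)} traces")
```
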
Alex Dimakis (@alexgdimakis):

Pretty happy that our OpenThinker-32B is in the no. 4 position on the General Reasoning Leaderboard. It should also be pointed out which models are open data (post-training data): OpenThinker, LIMO, OpenHermes, and DeepScaler.
Sedrick Keh (@sedrickkeh2):

1/ DeepSeek-VL is trained from DeepSeek LLM.
Qwen-VL is trained from Qwen-7B.
PaliGemma is trained from Gemma-2B.

Is this really the best way to train a VLM? What if we had access to model checkpoints -- would it be better to train with images before the LLM fully converges? 🧵

Etash Guha @ ICLR (@etash_guha):

Turns out, it’s possible to outperform DeepSeekR1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by selecting quality instructions from diverse sources. 🧵 (1/n)
Edouard Leurent (@eleurent):

Excited to share what I've been up to: Gemini Diffusion is FAST! I'm convinced this will revolutionise iterative workflows: refine, get instant feedback, repeat! So proud of what our small team achieved here🪐

Ryan Marten (@ryanmart3n):

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
Russ Tedrake (@russtedrake):

The short version is: LBMs work! We see consistent and statistically significant improvements as we increase the amount of pretraining data. But doing the science is still hard; as a field we have more work to do to improve the statistical power of our experiments.

Zubair Irshad (@mzubairirshad):

🚀Thrilled to share what we’ve been building at TRI over the past several months: our first Large Behavior Models (LBMs) are here! I’m proud to have been a core contributor to the multi-task policy learning and post-training efforts. At TRI, we’ve been researching how LBMs can

Sedrick Keh (@sedrickkeh2):

📢📢📢 Releasing OpenThinker3-1.5B, the top-performing SFT-only model at the 1B scale! 🚀

OpenThinker3-1.5B is a smaller version of our previous 7B model, trained on the same OpenThoughts3-1.2M dataset.