Tsendsuren (@tsendeemts)'s Twitter Profile
Tsendsuren

@tsendeemts

Research scientist at Google DeepMind | previously at Microsoft Research and a postdoc at UMass. Views are my own. Most tweets in Mongolian 🇲🇳.

ID: 105984261

Website: http://www.tsendeemts.com/ | Joined: 18-01-2010 03:58:52

21.2K Tweets

4.4K Followers

587 Following

Guillaume Lample @ NeurIPS 2024 (@guillaumelample)'s Twitter Profile Photo

Very excited to release our first reasoning model, Magistral. We released the weights of Magistral Small alongside a paper that presents our approach, online RL infrastructure, and findings.

Christian Szegedy (@chrszegedy)'s Twitter Profile Photo

The history of AI from 2012 to present shows each paradigm solving previous limitations while revealing new ones. The next state-of-the-art will be Supervised RL for reasoning. It is fundamentally bottlenecked by the need for verifiable environments. 2/7

Tsendsuren (@tsendeemts)'s Twitter Profile Photo

Looks interesting! Here is what I did for fast adaptation 5 years ago: arxiv.org/pdf/2009.01803. Curious to see the advances since then.

Christian Szegedy (@chrszegedy)'s Twitter Profile Photo

The Inception paper arxiv.org/abs/1409.4842 was awarded the Longuet-Higgins Prize (test of time). The architecture represented a significant step forward in inference efficiency, especially on CPU, and variants of Inception networks were used in Google products for years.

MiniMax (official) (@minimax__ai)'s Twitter Profile Photo

Day 1/5 of #MiniMaxWeek: We’re open-sourcing MiniMax-M1, our latest LLM — setting new standards in long-context reasoning.

- World’s longest context window: 1M-token input, 80k-token output
- State-of-the-art agentic use among open-source models
- RL at unmatched efficiency:
Mongol Tsakhia ELBEGDORJ (@elbegdorj)'s Twitter Profile Photo

Eighty U.S. Senators are backing tariffs on Russian oil buyers. As MONGOLIA, democracy’s lone outpost, we face a hard truth: landlocked and bordered by two giants. We have no alternative but to import fuel from the north. I hope our unique position earns an understanding and

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)'s Twitter Profile Photo

AoE

DeepSeek-R1T-Chimera, a 671B hybrid model built by merging only routed experts from DeepSeek-R1 into V3-0324

- No fine-tuning, no distillation
- Matches R1 reasoning while using ~40% fewer output tokens (~2.5× more concise)
- Fully functional out-of-the-box

Method -
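
To make the recipe above concrete, here is a minimal sketch of merging only routed-expert weights from one MoE checkpoint into another. This is illustrative only: the `.mlp.experts.` key pattern and the plain state-dict inputs are assumptions, not the actual DeepSeek checkpoint layout or the exact AoE method behind R1T-Chimera.

```python
import torch

def merge_routed_experts(base_sd, donor_sd, expert_tag=".mlp.experts."):
    """Copy routed-expert tensors from a donor checkpoint (e.g. R1) into a
    base checkpoint (e.g. V3-0324). Everything else (attention, routers,
    shared experts, embeddings) keeps the base model's weights."""
    merged = {}
    for name, tensor in base_sd.items():
        if expert_tag in name and name in donor_sd:
            merged[name] = donor_sd[name].clone()  # take the donor's experts
        else:
            merged[name] = tensor.clone()          # keep the base weights
    return merged
```

Since a merge like this only swaps existing tensors and trains nothing, the result can be loaded and served directly, which matches the "no fine-tuning, no distillation" and "fully functional out-of-the-box" claims.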
Tsendsuren (@tsendeemts)'s Twitter Profile Photo

Google has released a technical report on the development of the Gemini 2.5 AI model: storage.googleapis.com/deepmind-media…

Tu Vu (@tuvllms)'s Twitter Profile Photo

Excited to share that our paper on model merging at scale has been accepted to Transactions on Machine Learning Research (TMLR). Huge congrats to my intern Prateek Yadav and our awesome co-authors Jonathan Lai, Alexandra Chronopoulou, Manaal Faruqui, Mohit Bansal, and Tsendsuren 🎉!!

Tsendsuren (@tsendeemts)'s Twitter Profile Photo

This work got accepted at Transactions on Machine Learning Research (TMLR). Congratulations to Prateek Yadav and my co-authors. Also, thank you to the reviewers and editors for their time.
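
For readers new to the topic, the simplest form of model merging is uniform weight averaging over fine-tuned checkpoints that share an architecture. Below is a minimal sketch of that generic baseline; it is illustrative only, not the specific recipe studied in the TMLR paper, and the state-dict inputs are assumed.

```python
import torch

def average_checkpoints(state_dicts):
    """Uniformly average parameters across fine-tuned checkpoints that
    share an architecture and state-dict key layout (the simplest
    merging baseline, often called a model soup)."""
    merged = {}
    for name in state_dicts[0]:
        stacked = torch.stack([sd[name].float() for sd in state_dicts])
        merged[name] = stacked.mean(dim=0)  # element-wise mean over models
    return merged
```

The question studied at scale is how well such schemes hold up as model size and the number of merged models grow; the averaging step itself stays this simple.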

Oreva Ahia (@orevaahia)'s Twitter Profile Photo

🎉 We’re excited to introduce BLAB: Brutally Long Audio Bench, the first benchmark for evaluating long-form reasoning in audio LMs across 8 challenging tasks, using 833+ hours of Creative Commons audio (avg length: 51 minutes).

Arion Das || Gen AI Research || LLMs || NLP (@ariondas)'s Twitter Profile Photo

Tsendsuren Prateek Yadav I tried implementing one of your papers, "Infini Transformer", from scratch: github.com/ArionDas/Infin… But you never shared the code with me when I requested it 😄🙌
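
For readers wondering what that repo reimplements: the core of the Infini Transformer paper ("Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention") is a compressive memory that each attention head reads and then updates once per segment. A minimal single-head sketch of that step follows; it uses the paper's linear-attention update but omits the delta-rule variant and the learned gate that mixes memory reads with local attention, so treat it as an assumption-laden illustration.

```python
import torch
import torch.nn.functional as F

def infini_memory_step(M, z, Q, K, V):
    """One segment of Infini-attention's compressive memory.
    Shapes (single head, for clarity): Q, K, V are [seg_len, d_head];
    M (associative memory) is [d_head, d_head]; z (normalizer) is [d_head]."""
    sigma_q = F.elu(Q) + 1.0  # the paper's nonlinearity, ELU + 1
    sigma_k = F.elu(K) + 1.0
    # Retrieval: normalized linear-attention read of the previous memory.
    A_mem = (sigma_q @ M) / (sigma_q @ z).clamp(min=1e-6).unsqueeze(-1)
    # Update: accumulate key-value associations and the normalizer.
    M_new = M + sigma_k.transpose(0, 1) @ V
    z_new = z + sigma_k.sum(dim=0)
    return A_mem, M_new, z_new
```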