Yujin Kim (@yujin301300)'s Twitter Profile
Yujin Kim

@yujin301300

ID: 1581187924993142784

Joined: 15-10-2022 07:39:29

5 Tweets

17 Followers

75 Following

Xiaotian (Max) Han (@xiaotianhan1)

📢 [New Research] Introducing Speculative Thinking—boosting small LLMs by leveraging large-model mentorship.

Why?
- Small models generate overly long responses, especially when incorrect.
- Large models offer concise, accurate reasoning patterns.
- Wrong reasoning (thoughts) is
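
The thread above is cut off, but the mentorship loop it describes (the small model drafts the reasoning and the large model steps in when the draft rambles) can be sketched roughly as follows. The `small_generate` and `large_generate` stubs and the rambling trigger below are illustrative assumptions, not the paper's actual algorithm.

```python
# Toy sketch of large-model "mentorship" during small-model reasoning.
# The generate functions are stand-ins for real model calls; the trigger
# heuristic (overly long or hesitant segments) is an assumption, not the
# paper's actual rule.
from typing import Callable

def speculative_thinking(
    prompt: str,
    small_generate: Callable[[str], str],   # returns the next reasoning segment
    large_generate: Callable[[str], str],   # returns a concise replacement segment
    max_segments: int = 8,
    max_segment_chars: int = 400,
) -> str:
    """Let the small model reason, but hand rambling segments to the large model."""
    hesitation_markers = ("wait,", "hmm", "let me re-check", "actually, no")
    transcript = prompt
    for _ in range(max_segments):
        segment = small_generate(transcript)
        rambling = len(segment) > max_segment_chars or any(
            m in segment.lower() for m in hesitation_markers
        )
        if rambling:
            # Large model supplies a shorter, cleaner continuation instead.
            segment = large_generate(transcript)
        transcript += segment
        if "FINAL ANSWER" in segment:
            break
    return transcript

# Tiny demo with fake "models" so the sketch runs end to end.
if __name__ == "__main__":
    small = lambda ctx: "Hmm, wait, let me re-check that step... " * 3
    large = lambda ctx: "The sum is 42. FINAL ANSWER: 42\n"
    print(speculative_thinking("Q: What is 21 + 21?\n", small, large))
```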
ℏεsam (@hesamation)

a new article just dropped on "the state of LLM reasoning models". 

if you hear about test-time compute a lot, but don't actually know what it is, this is a great article.

Sebastian Raschka covered 12 of the major papers in test-time compute.
Gradio (@gradio)

🚀 New Research: Self-training inspires clear and concise thinking in LLMs! Paper achieves a 30% reduction in output tokens across five model families on GSM8K and MATH while maintaining average accuracy 👀
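
The tweet only reports the headline numbers. One common shape for such a self-training recipe, shown below purely as an assumption about how it might work, is to sample several solutions per problem, keep the correct ones, and fine-tune on the shortest; `sample_solutions` and `is_correct` are hypothetical stand-ins.

```python
# Hypothetical sketch of a "train on your own shortest correct solution" loop.
# The selection rule (shortest correct sample per problem) is an assumption
# about how conciseness could be self-trained, not the paper's stated method.
from typing import Callable, List

def build_concise_sft_set(
    problems: List[dict],                       # each: {"question": ..., "answer": ...}
    sample_solutions: Callable[[str, int], List[str]],
    is_correct: Callable[[str, str], bool],
    samples_per_problem: int = 8,
) -> List[dict]:
    """Collect (question, solution) pairs using the shortest correct self-sample."""
    sft_examples = []
    for p in problems:
        candidates = sample_solutions(p["question"], samples_per_problem)
        correct = [c for c in candidates if is_correct(c, p["answer"])]
        if not correct:
            continue                            # skip problems the model never solves
        best = min(correct, key=len)            # prefer the most concise correct trace
        sft_examples.append({"prompt": p["question"], "completion": best})
    return sft_examples
```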

Rohan Paul (@rohanpaul_ai)

Small language models struggle with complex reasoning tasks where large models excel.

This paper introduces the SMART framework, where a small model performs reasoning but selectively requests corrections from a large model only for steps identified as uncertain via a scoring
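
The tweet is cut off before naming the scoring mechanism, but the escalation pattern it describes can be sketched as below. The per-step confidence score, the threshold, and the `small_step`/`large_step` stubs are assumptions for illustration, not SMART's actual components.

```python
# Hypothetical sketch of uncertainty-gated escalation in the spirit of the
# SMART framework described above: the small model proposes each reasoning
# step, and only low-confidence steps are rewritten by the large model.
from typing import Callable, List

def smart_style_reasoning(
    question: str,
    small_step: Callable[[str], str],          # small model proposes the next step
    step_score: Callable[[str, str], float],   # confidence in that step, in [0, 1]
    large_step: Callable[[str], str],          # large model rewrites an uncertain step
    threshold: float = 0.7,
    max_steps: int = 10,
) -> List[str]:
    steps: List[str] = []
    context = question
    for _ in range(max_steps):
        proposal = small_step(context)
        if step_score(context, proposal) < threshold:
            proposal = large_step(context)     # escalate only the uncertain step
        steps.append(proposal)
        context += "\n" + proposal
        if proposal.startswith("Answer:"):
            break
    return steps
```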
Deedy (@deedydas)

Google DeepMind just dropped this new LLM model architecture called Mixture-of-Recursions.

It gets 2x inference speed, reduced training FLOPs and ~50% reduced KV cache memory. Really interesting read.

Has potential to be a Transformers killer.
Rohan Paul (@rohanpaul_ai)

This is quite a landmark paper from Google DeepMind

📌 2x faster inference because tokens exit the shared loop early.

📌 During training it cuts the heavy math, dropping attention FLOPs per layer by about half, so the same budget trains on more data.

Shows a fresh way to
alphaXiv (@askalphaxiv)

"experts" for harder tokens? "Mixture-of-Recursions (MoR): Learning Dynamic Recursive Depths for Adaptive Token-Level Computation" MoR makes one shared Transformer block loop only for tokens that need extra thought, delivering quality with half the weights & twice the speed

"experts" for harder tokens?

"Mixture-of-Recursions (MoR): Learning Dynamic Recursive Depths for Adaptive Token-Level Computation"

MoR makes one shared Transformer block loop only for tokens that need extra thought, delivering quality with half the weights & twice the speed
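
All three MoR tweets describe the same mechanism: one shared Transformer block applied repeatedly, with a per-token router deciding whether a token takes another recursion pass or exits early. A minimal PyTorch sketch of that idea follows; the linear router, the fixed recursion cap, and the mask-based update are illustrative assumptions and do not reproduce the paper's exact design (a real implementation would gather active tokens to actually save compute and would manage the KV cache accordingly).

```python
# Minimal sketch of the Mixture-of-Recursions idea: one shared Transformer
# block applied repeatedly, with a learned router deciding per token whether
# to take another recursion step or exit early. Router, depth cap, and the
# "update only active tokens" masking are illustrative assumptions.
import torch
import torch.nn as nn

class MixtureOfRecursionsSketch(nn.Module):
    def __init__(self, d_model: int = 256, nhead: int = 4, max_recursions: int = 4):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=4 * d_model, batch_first=True
        )
        self.router = nn.Linear(d_model, 1)   # per-token "recurse again?" score
        self.max_recursions = max_recursions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); all tokens start as "active".
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_recursions):
            if not active.any():
                break
            updated = self.shared_block(x)                     # shared weights each pass
            x = torch.where(active.unsqueeze(-1), updated, x)  # exited tokens stay frozen
            keep_going = torch.sigmoid(self.router(x)).squeeze(-1) > 0.5
            active = active & keep_going                       # tokens may exit, never re-enter
        return x

if __name__ == "__main__":
    model = MixtureOfRecursionsSketch()
    hidden = torch.randn(2, 16, 256)
    print(model(hidden).shape)   # torch.Size([2, 16, 256])
```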
The AI Timeline (@theaitimeline)

🚨This week's top AI/ML research papers:

- Mixture-of-Recursions
- Scaling Laws for Optimal Data Mixtures
- Training Transformers with Enforced Lipschitz Constants
- Reasoning or Memorization?
- How Many Instructions Can LLMs Follow at Once?
- Chain of Thought Monitorability
-
Sangmin Bae (@raymin0223)

✨Huge thanks for interest in Mixture-of-Recursions! Codes are officially out!

It's been a long journey exploring Early-exiting with Recursive Architecture.
I'll soon post my 👨‍🎓PhD thesis on Adaptive Computation too!

Code: github.com/raymin0223/mix…
Paper: arxiv.org/abs/2507.10524