Xinyu Yang (@xinyu2ml) 's Twitter Profile
Xinyu Yang

@xinyu2ml

Ph.D. @CarnegieMellon. Working on principled algorithm & system co-design for scalable and generalizable foundation models. he/they. A fan of TileLang!!!

ID: 1601134489161400321

Website: https://xinyuyang.me/ · Joined: 09-12-2022 08:40:04

146 Tweets

530 Followers

606 Following

Huaxiu Yao✈️ICLR 2025🇸🇬 (@huaxiuyaoml) 's Twitter Profile Photo

❗️Self-evolution is quietly pushing LLM agents off the rails. ⚠️ Even agents that are perfectly aligned at deployment can gradually forget human alignment and shift toward self-serving strategies. Over time, LLM agents stop following values, imitate bad strategies, and even spread misaligned

Shizhe Diao (@shizhediao) 's Twitter Profile Photo

✨ We’re hiring interns at NVIDIA Research! Our team works on efficient agentic systems, new model architectures, multi-modal models and post-training optimization. If interested, please send your CV to [email protected] 🚀 #hiring #internship

Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

Hi everyone! This Thursday, we will host the second NLP Seminar of the year! For this week's seminar, we are excited to host Tianyu Gao (@gaotianyu1350) from OpenAI and UC San Diego (UCSD)! If you are interested in attending remotely, here is the Zoom link:

VraserX e/acc (@vraserx) 's Twitter Profile Photo

A 7 million parameter model from Samsung just outperformed DeepSeek-R1, Gemini 2.5 Pro, and o3-mini on reasoning benchmarks like ARC-AGI.

Let that sink in.
It’s 10,000x smaller yet smarter.

The secret is recursion.
Instead of brute-forcing answers like giant LLMs, it drafts a
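
For readers who want a concrete feel for the draft-and-refine recursion being described, here is a deliberately tiny, hypothetical PyTorch sketch; the class, shapes, and step count are made up for illustration and are not the Samsung model's actual architecture or code.

```python
import torch
import torch.nn as nn

class TinyRecursiveReasoner(nn.Module):
    """Illustrative only: one small block applied repeatedly to refine its own draft."""

    def __init__(self, dim: int = 128):
        super().__init__()
        # A single shared refinement block; "recursion" means reusing it every step.
        self.refine = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, problem: torch.Tensor, steps: int = 8) -> torch.Tensor:
        answer = torch.zeros_like(problem)          # start from a blank draft
        for _ in range(steps):
            # Each pass looks at the problem plus the current draft and improves it.
            answer = answer + self.refine(torch.cat([problem, answer], dim=-1))
        return answer

x = torch.randn(4, 128)                             # toy "problem" embeddings
print(TinyRecursiveReasoner()(x, steps=8).shape)    # torch.Size([4, 128])
```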
Andrew Campbell (@andrewc_ml) 's Twitter Profile Photo

Very excited to share our preprint: Self-Speculative Masked Diffusions

We speed up sampling of masked diffusion models by ~2x by using speculative sampling and a hybrid non-causal / causal transformer

arxiv.org/abs/2510.03929

w/ Valentin De Bortoli, Jiaxin Shi, Arnaud Doucet
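
The paper's hybrid non-causal/causal scheme is in the arXiv link above. As background only, the generic speculative-sampling accept/reject rule that such speedups build on can be sketched as follows; all names and numbers here are illustrative, not taken from the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_accept(p_draft: np.ndarray, p_target: np.ndarray, drafted: int) -> int:
    """Generic speculative-sampling rule: keep the cheap draft's token when the
    target model agrees, otherwise resample from the residual distribution."""
    if rng.random() < min(1.0, p_target[drafted] / p_draft[drafted]):
        return drafted                                  # draft accepted
    residual = np.maximum(p_target - p_draft, 0.0)      # rejection: correct the bias
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual)

# Toy example over a 5-token vocabulary at one masked position.
p_draft  = np.array([0.50, 0.20, 0.15, 0.10, 0.05])
p_target = np.array([0.30, 0.30, 0.20, 0.15, 0.05])
drafted = rng.choice(5, p=p_draft)
print(speculative_accept(p_draft, p_target, drafted))
```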
Jiawei Zhao (@jiawzhao) 's Twitter Profile Photo

We’ve always assumed stale and off-policy data hurts RL a lot — but our latest work shows the opposite. 🧠 M2PO (Second-Moment Trust Policy Optimization) reveals that even data stale by 256 model updates can train LLMs as effectively as on-policy RL, unlocking scalable and
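
M2PO's actual objective is defined in the linked work. Purely as a generic illustration of why stale data is normally considered risky, the sketch below computes per-token importance ratios between the current policy and a stale behavior policy, plus their second moment; the function and numbers are made up for the example and are not M2PO itself.

```python
import torch

def importance_ratios(logp_current: torch.Tensor, logp_stale: torch.Tensor) -> torch.Tensor:
    """Per-token ratio pi_current(a|s) / pi_stale(a|s) for data collected by an
    older checkpoint; a wide spread here is what makes stale data risky."""
    return torch.exp(logp_current - logp_stale)

# Toy log-probs: the behavior policy is many updates old, so ratios drift from 1.
logp_stale   = torch.log(torch.tensor([0.40, 0.25, 0.20, 0.15]))
logp_current = torch.log(torch.tensor([0.10, 0.45, 0.30, 0.15]))
r = importance_ratios(logp_current, logp_stale)
print(r)                 # spread around 1.0
print((r ** 2).mean())   # second moment of the ratios (the kind of statistic a
                         # second-moment trust region would constrain)
```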

Wenhao Yu (@wyu_nd) 's Twitter Profile Photo

Code for 𝐏𝐚𝐫𝐚𝐥𝐥𝐞𝐥-𝐑𝟏 is live! 👉 github.com/zhengkid/Paral…
(now 189 stars and climbing 🔥)

It lets LLMs think in parallel — multiple reasoning paths, smarter synthesis, more creative inference!

Miss this paper and you’re missing a leap forward: arxiv.org/abs/2509.07980
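
Parallel-R1 itself is a reinforcement-learning framework (see the GitHub link for the real code). As a much simpler stand-in for the idea of exploring multiple reasoning paths and then synthesizing them, here is a self-consistency-style sketch; sample_reasoning_path is a hypothetical stub, not part of the Parallel-R1 codebase.

```python
import random
from collections import Counter

def sample_reasoning_path(question: str, seed: int) -> str:
    """Hypothetical stand-in for one sampled chain of thought; a real setup
    would call an LLM with temperature > 0 here."""
    random.seed(seed)
    return random.choice(["42", "42", "41"])   # toy distribution of final answers

def parallel_reason(question: str, n_paths: int = 8) -> str:
    # Explore several reasoning paths in parallel, then synthesize by majority vote.
    answers = [sample_reasoning_path(question, seed=i) for i in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]

print(parallel_reason("What is 6 * 7?"))
```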
Jonas Geiping (@jonasgeiping) 's Twitter Profile Photo

What determines how easy it is to quantize an LLM after training? 

Thanks to a number of recent open-source training trajectories, we were able to show much more directly how training hyperparameters modulate quantization errors, for good and bad. More details below:
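
To make "quantization error" concrete, here is a generic round-to-nearest 4-bit quantization of a weight matrix and its relative error; this is a simplified illustration, not the measurement pipeline from the linked study.

```python
import torch

def rtn_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Per-row symmetric round-to-nearest quantization (a common PTQ baseline)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax     # one scale per output row
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

w = torch.randn(256, 256)                    # toy "trained" weight matrix
w_q = rtn_quantize(w, bits=4)
rel_err = (w - w_q).norm() / w.norm()        # relative quantization error
print(f"relative error at 4 bits: {rel_err:.4f}")
```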
Tong Zheng (@zhengtoong) 's Twitter Profile Photo

Parallel Reasoning has entered the AI mainstream. Inspired by works like Gemini 2.5 Pro, APR (Jiayi Pan, Xiuyu Li, Long Lian), and Multiverse (Xinyu Yang), our Parallel-R1 establishes the first reinforcement-learning framework that moves this paradigm beyond synthetic tasks,

Kangwook Lee (@kangwook_lee) 's Twitter Profile Photo

DLLMs seem promising... but parallel generation is not always possible

Diffusion-based LLMs can generate many tokens at different positions at once, while most autoregressive LLMs generate tokens one by one.

This makes diffusion-based LLMs highly attractive when we need fast
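
As a toy illustration of the contrast (not any specific diffusion LLM), the sketch below fills a masked sequence by committing the most confident positions in parallel at each step, versus one token per step autoregressively; the confidence scores are random stand-ins for real model logits.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK, LENGTH, VOCAB = -1, 12, 50

def diffusion_style_decode(tokens_per_step: int = 4) -> int:
    """MaskGIT-style parallel unmasking: several positions are committed per step."""
    seq = np.full(LENGTH, MASK)
    steps = 0
    while (seq == MASK).any():
        conf = rng.random(LENGTH)                       # stand-in for model confidence
        conf[seq != MASK] = -np.inf                     # only score still-masked slots
        pick = np.argsort(conf)[-tokens_per_step:]      # k most confident positions
        seq[pick] = rng.integers(0, VOCAB, size=len(pick))
        steps += 1
    return steps

print("parallel decode steps:", diffusion_style_decode())   # LENGTH / k = 3 steps
print("autoregressive steps: ", LENGTH)                      # one token per step
```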
Yufan Zhuang (@yufan_zhuang) 's Twitter Profile Photo

Can LLMs reason beyond context limits? 🤔 

Introducing Knowledge Flow, a training-free method that helped gpt-oss-120b & Qwen3-235B achieve 100% on the AIME-25, no tools.

How? Like human deliberation, for LLMs.

📝 Blog: yufanzhuang.notion.site/knowledge-flow
💻 Code: github.com/EvanZhuang/kno…
Shanli Xing (@0xsling0) 's Twitter Profile Photo

🤔 Can AI optimize the systems it runs on?

🚀 Introducing FlashInfer-Bench, a workflow that makes AI systems self-improving with agents:

- Standardized signature for LLM serving kernels
- Implement kernels with your preferred language
- Benchmark them against real-world serving
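
The real FlashInfer-Bench signatures live in the project itself and are not reproduced here. Purely as a generic picture of what benchmarking a serving kernel involves, here is a minimal CUDA-event timing harness in PyTorch; the kernel being timed (scaled_dot_product_attention) is just an example workload.

```python
import torch

def bench_kernel(fn, *args, warmup: int = 10, iters: int = 100) -> float:
    """Generic GPU micro-benchmark: warm up, then time with CUDA events (ms/iter)."""
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

if torch.cuda.is_available():
    q = torch.randn(8, 32, 1024, 128, device="cuda", dtype=torch.float16)
    k = torch.randn(8, 32, 1024, 128, device="cuda", dtype=torch.float16)
    v = torch.randn(8, 32, 1024, 128, device="cuda", dtype=torch.float16)
    ms = bench_kernel(torch.nn.functional.scaled_dot_product_attention, q, k, v)
    print(f"attention kernel: {ms:.3f} ms/iter")
```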
Xinyu Yang (@xinyu2ml) 's Twitter Profile Photo

🏆Honored to share that LLM.265 (dl.acm.org/doi/10.1145/37…) received the Best Paper Award at MICRO 2025!   

🥳Huge thanks to the whole team!

😅Accidentally deleted the original tweet—posting it again