Sijun Tan (@sijun_tan)'s Twitter Profile
Sijun Tan

@sijun_tan

CS PhD @BerkeleySky | Scaling AI Agents @Agentica_ | Prev: @AIatMeta @Antgroup

ID: 938526574072262656

Link: http://jeffreysijuntan.com | Joined: 06-12-2017 21:52:27

117 Tweets

1.1K Followers

320 Following

Sijun Tan (@sijun_tan):

Check out Michael Luo's latest work, Autellix—an ultra-fast system for serving agentic workloads, achieving 4-15x speedups over vLLM/SGLang! At Agentica, we are committed to building efficient infra for serving/training of LLM agents, and Autellix is the first step towards it!

Sijun Tan (@sijun_tan):

Exciting to see the community successfully reproducing DeepScaleR’s results—this is the true power of open-source! By sharing everything openly, we enable faster progress and collective innovation. Let's build together!!

Sijun Tan (@sijun_tan):

Quoting this legendary Apple Ad that I think best encapsulates the spirit of Agency: "Here's to the crazy ones. The misfits. The rebels. The troublemakers. The round pegs in the square holes. The ones who see things differently. They're not fond of rules. And they have no respect for the status quo."

Michael Luo (@michaelzluo):

🚀 We introduce DeepCoder-14B-Preview, a fully open-sourced coding model that is on par with o3-mini and o1!

📷 We scaled our model with RL magic up to 32K context. Its performance scales to 64K context 🔥

Sijun Tan (@sijun_tan):

Hey Sam Altman, we know you're planning to open-source your reasoning model—but we couldn’t wait. Introducing DeepCoder-14B-Preview: a fully open-source reasoning model that matches o1 and o3-mini on both coding and math. And yes, we’re releasing everything: model, data, code, and

Chan Kha Vu 🇺🇦🌻🚜 (@chankhavu):

Babe, wake up, we have o3-mini at home 🤯 And, as usual, a Notion post instead of ArXiv paper. Peak Alpha energy 🫡 There are so many good engineering bits in this report 😍

Naman Jain @ ICLR (@stringchaos):

Excited to release R2E-Gym
  - 🔥 8.1K executable environments using synthetic data
  - 🧠 Hybrid verifiers for enhanced inference-time scaling
  - 📈 51% success rate on SWE-Bench Verified
  - 🤗 Open Source Data + Models + Trajectories

1/
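
The "hybrid verifiers" bullet in the tweet above refers to combining an execution-free judge with execution-based signals from generated tests. A minimal Python sketch of that idea; the linear blend and both inputs (model_score, test_results) are illustrative assumptions, not the R2E-Gym paper's exact recipe:

```python
def hybrid_verifier_score(model_score, test_results, alpha=0.5):
    """Blend an execution-free score (e.g., a learned judge's
    estimate in [0, 1] that a candidate patch is correct) with an
    execution-based score (fraction of generated, non-ground-truth
    tests the patch passes). alpha and the linear blend are assumed
    for illustration."""
    exec_score = sum(test_results) / max(len(test_results), 1)
    return alpha * model_score + (1.0 - alpha) * exec_score

# A Best@N-style pipeline scores all N candidates and submits the
# verifier's single top pick.
def pick_best(scored_candidates):
    return max(scored_candidates, key=lambda c: c["score"])
```
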
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxestex):

I keep saying that DeepScaleR is among the most impressive branches on the R1 tree, maybe the best one. ByteDance would do well to release a small Seed-Thinking for comparison, though

Sijun Tan (@sijun_tan):

This is a great blog post to read. RL finally works—not because of major advances in algorithms, but because we now have strong pretrained models that provide a good prior. From there, we finetune the model to adapt to different kinds of environments. The upper bound of the

Sijun Tan (@sijun_tan):

If you’re at ICLR, swing by Session 4 today and check out our work JudgeBench! It’s a benchmark for evaluating LLM judges on their ability to distinguish between challenging reasoning responses. Not many reasoning papers made it to ICLR this year—o1 had just dropped around the

Sijun Tan (@sijun_tan):

Very interesting finding! RL can even work with incorrect rewards because the GRPO clipping term introduces a bias toward optimizing high-probability tokens, leading to a more concentrated distribution around them. This essentially means that if your model has learned a strong
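
For context on the mechanism this tweet describes, here is a minimal numpy sketch of the PPO-style clipped surrogate that GRPO optimizes, with GRPO's group-normalized advantages (function names are illustrative). The min/clip pair is the "clipping term": once a token's probability ratio leaves the trust region in the favorable direction its gradient is zeroed, so updates keep concentrating mass on tokens the policy already ranks highly.

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    # GRPO uses no value network: each rollout's advantage is its
    # reward normalized within the group sampled for the same prompt.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    # Per-token probability ratio between current and behavior policy.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise min stops rewarding ratio growth beyond 1+eps
    # (and stops penalizing shrinkage below 1-eps), which biases
    # updates toward tokens that are already high-probability.
    return np.minimum(unclipped, clipped).mean()  # objective to maximize
```
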

Yi Wu (@jxwuyi):

We release fully async RL system AReaL-boba² for LLM & SOTA code RL w. Qwen3-14B! @Alibaba_Qwen #opensource
🚀 system & algorithm co-design → 2.77x faster
✅ 69.1 on LiveCodeBench 
🔥 multi-turn RL ready
🔗 Project: github.com/inclusionAI/AR…
📄 Paper: arxiv.org/pdf/2505.24298
1/3👇
Sijun Tan (@sijun_tan):

The first half of 2025 is all about reasoning models. The second half? It's about agents. At Agentica, we're thrilled to launch two major releases: 1. DeepSWE, our SOTA coding agent trained with RL that tops the SWE-Bench leaderboard for open-weight models. 2. rLLM, our agent

Jaskirat Singh (@1jaskiratsingh):

How much can we scale long-context multi-step agents using only RL? Short answer: quite a lot, given good training environments and a scalable RL recipe. 🚨 We introduce DeepSWE-Preview, a reasoning-enabled coding agent trained from scratch from Qwen3-32B with only reinforcement

Agentica Project (@agentica_):

It's easy to confuse Best@K vs Pass@K—and we've seen some misconceptions about our results.  

Our 59% on SWEBench-Verified is Pass@1 with Best@16, not Pass@8/16. Our Pass@8/16 is 67%/71%.  

So how did we achieve this? 

DeepSWE generates N candidate solutions. Then, another LLM
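
The tweet is cut off, but the mechanism it sketches (sample N candidates, let a verifier commit to one) is enough to state the two metrics precisely. A schematic Python sketch; verifier_score and the passes list are hypothetical stand-ins, not DeepSWE's actual components:

```python
def pass_at_k(passes, k):
    """Oracle metric: did ANY of the first k candidates pass the
    ground-truth tests? `passes` holds one boolean per candidate."""
    return any(passes[:k])

def best_at_k(candidates, verifier_score, passes, k):
    """Deployable metric: a verifier with no access to ground-truth
    tests scores the first k candidates and commits to ONE; only
    that single pick is then checked against the hidden tests."""
    pick = max(range(k), key=lambda i: verifier_score(candidates[i]))
    return passes[pick]
```

Because Best@K passes only when its single pick passes, it can never exceed Pass@K on the same candidates, which is why 59% Best@16 is consistent with 67%/71% Pass@8/16.
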
Sijun Tan (@sijun_tan):

We've seen some misconceptions about our results. DeepSWE reports Best@K, not Pass@K, and they are very different! This post explains everything:

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxestex):

Important correction on DeepSWE-Preview. On SWE-Bench-Verified:
Pass@1 = 42.2%
"Best@8" = 59%, trajectory selection achieved with a hybrid (execution-free + test-based, as in the R2E-Gym paper) verifier. I.e., the system itself can yield 59% w/o a ground-truth check.
Actual Pass@8 = 67%.
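
As a footnote on the Pass@K figures above: such numbers are typically computed with the unbiased estimator from the Codex paper (Chen et al., 2021) rather than by literally drawing k of the n runs. A minimal version, assuming that convention applies here:

```python
from math import comb

def pass_at_k(n, c, k):
    # Unbiased pass@k estimate given n samples per task, of which c
    # passed the ground-truth tests (Chen et al., 2021):
    #   pass@k = 1 - C(n - c, k) / C(n, k)
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```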