Ben Athiwaratkun @ ICLR (@ben_athi)

Leading Turbo Team @ Together AI. Prev: @awscloud, @MSFTResearch; @Cornell PhD.

ID: 2613894511

Link: http://benathi.github.io · Joined: 09-07-2014 16:57:43

360 Tweets

820 Followers

694 Following

Linda He (@lindahe49140661)

Excited to share our work on scaling LLMs to handle million-token contexts! Training models for ultra-long sequences is challenging due to data scarcity. We introduce a novel hierarchical synthetic data generation pipeline to overcome this. Thrilled this will be presented at ICLR

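For intuition only: one way such a hierarchical pipeline could work is to summarize documents level by level and then synthesize a question against the root summary, so that answering it requires the whole context. The `ask_llm` helper below is a hypothetical placeholder, not the paper's code:

```python
# Hypothetical sketch of hierarchical synthetic long-context data generation.
# `ask_llm` stands in for any chat-completion call; not the paper's pipeline.
from typing import List

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM API here")

def build_long_context_example(docs: List[str], fanout: int = 4) -> dict:
    # Level 0: per-document summaries.
    level = [ask_llm(f"Summarize:\n{d}") for d in docs]
    # Higher levels: summarize groups of summaries until one root remains.
    while len(level) > 1:
        level = [ask_llm("Summarize together:\n" + "\n".join(level[i:i + fanout]))
                 for i in range(0, len(level), fanout)]
    root = level[0]
    # A question about the root forces attention across the full context.
    question = ask_llm(f"Write a question whose answer requires all of:\n{root}")
    return {"context": "\n\n".join(docs), "question": question}
```
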
Junlin Wang (@junlinwang3)

Excited to share work from my Together AI internship—a deep dive into inference‑time scaling methods 🧠

We rigorously evaluated verifier‑free inference-time scaling methods across both reasoning and non‑reasoning LLMs. Some key findings:

🔑 Even with huge rollout budgets,
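
A classic verifier-free method in this space is self-consistency: spend the rollout budget on independent samples and majority-vote the final answer. A minimal sketch, with `sample_answer` as a placeholder for one stochastic model call:

```python
# Minimal self-consistency sketch: a classic verifier-free scaling method.
# `sample_answer` is a placeholder for one stochastic rollout of your model.
from collections import Counter

def sample_answer(question: str) -> str:
    raise NotImplementedError("call your LLM with temperature > 0 here")

def self_consistency(question: str, budget: int = 16) -> str:
    # Draw `budget` independent rollouts and return the most common answer.
    answers = [sample_answer(question) for _ in range(budget)]
    return Counter(answers).most_common(1)[0][0]
```
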
Ben Athiwaratkun @ ICLR (@ben_athi)

If you're at ICLR and passionate about optimizing language models for speed and efficiency, swing by the Together AI booth for a chat.

Together AI (@togethercompute)

🔔 New blog post on how we can attain large speedups for our inference customers using custom speculators! 🚀

Key benefits of customization:
✅ ~1.3x faster inference
✅ ~25% cost reduction
✅ Gets better as you generate more responses
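
For background: a speculator is a small draft model whose proposed tokens the large model verifies in a single forward pass, so the speedup grows with how often the draft agrees with the target. A toy greedy-verification step, with both model calls stubbed out (placeholders, not Together's actual API):

```python
# Toy speculative decoding step with greedy verification.
from typing import List

def draft_next(ctx: List[int]) -> int:
    raise NotImplementedError("small draft model: next-token prediction")

def target_argmax_batch(ctx: List[int], proposal: List[int]) -> List[int]:
    raise NotImplementedError("big model: score all proposed positions at once")

def speculative_step(ctx: List[int], k: int = 4) -> List[int]:
    # 1) Draft model proposes k tokens autoregressively (cheap).
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft_next(ctx + proposal))
    # 2) Target model checks all k positions in ONE forward pass.
    target = target_argmax_batch(ctx, proposal)
    # 3) Keep the agreed prefix; on the first mismatch, take the target's
    #    token and stop. More agreement => more tokens per expensive pass.
    out = list(ctx)
    for p, t in zip(proposal, target):
        out.append(t)
        if p != t:
            break
    return out
```
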
Together AI (@togethercompute)

🚀 New research: YAQA — Yet Another Quantization Algorithm (yes, pronounced like yaca/jackfruit 🥭)

Led by Albert Tseng, YAQA minimizes the KL divergence to the original model during quantization, cutting it by >30% vs. prior methods and outperforming even QAT on Gemma 3.

👇
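
The objective is simple to state: rather than minimizing per-layer weight error, choose quantized weights that keep the model's output distribution close to the original's. A numpy sketch of the per-token KL term in question (illustrative only, not YAQA's algorithm):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_to_original(orig_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    # KL(P_orig || P_quant), averaged over tokens: the quantity a
    # YAQA-style objective drives down during quantization.
    p, q = softmax(orig_logits), softmax(quant_logits)
    eps = 1e-12  # guard against log(0) under float underflow
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))
```
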
Vipul Ved Prakash (@vipulved)

Together AI is building 2 gigawatts of AI factories (~100,000 GPUs) in the EU over the next 4 years, with the first phase live in H2 2025. AI compute is at <1% saturation relative to our 2035 forecast, and we are starting early to build a large-scale, sustainable AI cloud.

Hassan (@nutlope)

Our open deep research app is launching in 24 hours! Generate reports about any topic using OSS LLMs. 100% free & open source.

Jon Saad-Falcon (@jonsaadfalcon)

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 
🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning
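
The gist, sketched below with hypothetical hand-set weights rather than Weaver's learned combination: score each candidate answer with several imperfect verifiers and pick the candidate with the best weighted score.

```python
# Minimal weak-verifier aggregation sketch (weights and verifiers are
# placeholders, not Weaver's actual learned combination).
from typing import Callable, List, Sequence

Verifier = Callable[[str, str], float]  # (question, answer) -> score in [0, 1]

def select_answer(question: str,
                  candidates: Sequence[str],
                  verifiers: List[Verifier],
                  weights: List[float]) -> str:
    # Combine the weak signals linearly and return the top-scoring candidate.
    def combined(ans: str) -> float:
        return sum(w * v(question, ans) for v, w in zip(verifiers, weights))
    return max(candidates, key=combined)
```
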
Together AI (@togethercompute)

🔓⚡ FLUX.1 Kontext [dev] just landed on Together AI

First open-weight model w/ proprietary-level image editing:

🎨 Perfect character consistency
🏆 Beats Gemini Flash + competitors
🛠️ Full model weights for customization

Enterprise-quality editing, open weights💎
Together AI (@togethercompute)

Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models.

Built in
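
For reference, Pass@1 is the k=1 case of the standard unbiased pass@k estimator: with n samples per task of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). In code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator (from the HumanEval/Codex paper):
    # probability that at least one of k sampled attempts is correct,
    # given that c of n total attempts passed.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```
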
Ben Athiwaratkun @ ICLR (@ben_athi)

Come check out our poster on speeding up LLMs, happening now until 1:30 pm. TL;DR — we show that we can hide the latency of all-reduce operations in a tensor-parallel setting by modifying the residual architecture to overlap the MLP and attention.
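
The idea resembles a "parallel block": if attention and the MLP both read the same normalized residual input, their tensor-parallel all-reduces no longer serialize and can be overlapped with compute. A PyTorch-flavored illustration of such a block (an assumption about the general shape, not the poster's exact architecture):

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    # Attention and MLP both consume the same normalized input, so in a
    # tensor-parallel deployment their all-reduces can be overlapped.
    # Illustrative only; not the architecture from the poster.
    def __init__(self, d: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.mlp(h)  # two branches, one residual add
```
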

Ben Athiwaratkun @ ICLR (@ben_athi)

TL;DR - one way to push the quality-efficiency frontier: obtain high-quality generations via a collection of LLMs -> distill them into a smaller model -> get a higher-quality small model that is more inference-efficient than the original collection of models. Poster session
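
In sketch form, the data side of that recipe: pool generations from several teacher models, keep the best one per prompt under some quality score, and fine-tune the small model on the result. All names below are placeholders:

```python
# Placeholder sketch of ensemble-to-student distillation data prep.
from typing import Callable, List, Sequence

def build_distillation_set(prompts: Sequence[str],
                           teachers: List[Callable[[str], str]],
                           score: Callable[[str, str], float]) -> List[dict]:
    data = []
    for p in prompts:
        generations = [t(p) for t in teachers]          # one sample per teacher
        best = max(generations, key=lambda g: score(p, g))
        data.append({"prompt": p, "completion": best})  # SFT pair for the student
    return data
```
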

Together AI (@togethercompute)

🤖 OpenAI's open models are here. gpt-oss models just landed on Together AI. They achieve near-parity with o4-mini and were trained using o3 techniques. Build anything, deploy anywhere 🔥

Ben Athiwaratkun @ ICLR (@ben_athi)

Most speculative decoding research focuses on algorithms. But we know that data matters a ton! (E.g., no matter how good the spec algorithm is, if it's trained on bad, misaligned data, the speed will be poor.) What if we build on algorithms that make data really shine?! In
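
One concrete reading of "data matters": a speculator's speedup is governed by its token-level acceptance rate on the target model's real traffic, which is easy to probe. A hypothetical sketch with both model calls stubbed out:

```python
# Hypothetical acceptance-rate probe: how often the draft model's greedy
# token matches the target's on in-domain contexts. Model calls are stubs.
from typing import List, Sequence

def draft_greedy(ctx: List[int]) -> int:
    raise NotImplementedError("draft model: greedy next token")

def target_greedy(ctx: List[int]) -> int:
    raise NotImplementedError("target model: greedy next token")

def acceptance_rate(contexts: Sequence[List[int]]) -> float:
    # Higher agreement on representative data => more accepted tokens
    # per verification pass => larger end-to-end speedup.
    hits = sum(draft_greedy(c) == target_greedy(c) for c in contexts)
    return hits / len(contexts)
```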