Jarrod Barnes (@jarrodbarnes)'s Twitter Profile
Jarrod Barnes

@jarrodbarnes

Building the AI memory layer for engineers | Open‑source & ML nerd | Prev. Professor @nyuniversity | 🏈 @OhioStateFB Alum

ID: 1021413747875811328

Link: https://www.linkedin.com/in/jarrodbarnes | Joined: 23-07-2018 15:16:29

595 Tweets

1.1K Followers

1.1K Following

OpenAI (@openai):

We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part of our Preparedness Framework.

Agents must replicate top ICML 2024 papers, including understanding the paper, writing code, and executing experiments.
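PaperBench grades replication attempts against hierarchical rubrics, so a weighted tree is the natural mental model for how a single score falls out. A minimal sketch of that aggregation, assuming an illustrative node structure (the field names and weights below are ours, not PaperBench's actual schema):

```python
# Minimal sketch: aggregating a hierarchical replication rubric.
# Node structure and weights are illustrative, not PaperBench's schema.
from dataclasses import dataclass, field

@dataclass
class RubricNode:
    name: str
    weight: float                  # relative weight among siblings
    passed: bool | None = None     # leaf verdict from a judge (None for internal nodes)
    children: list["RubricNode"] = field(default_factory=list)

def score(node: RubricNode) -> float:
    """Return a 0..1 replication score, weighting child scores."""
    if not node.children:
        return 1.0 if node.passed else 0.0
    total = sum(c.weight for c in node.children)
    return sum(c.weight * score(c) for c in node.children) / total

rubric = RubricNode("paper", 1.0, children=[
    RubricNode("code correctness", 0.5, passed=True),
    RubricNode("experiments executed", 0.3, passed=False),
    RubricNode("results match paper", 0.2, passed=False),
])
print(f"replication score: {score(rubric):.2f}")  # 0.50
```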
Guillermo Rauch (@rauchg):

Will we hire engineers in the future? Prompting is thinking. The sharper your thoughts, the better the prompts, the better the outputs.

Engineering and design are headed into a world where most of the output is produced by prompting, if not all. The way we interview engineers…

Cognition (@cognition_labs):

Our research interns present:
Kevin-32B = K(ernel D)evin

It's the first open model trained using RL for writing CUDA kernels. We implemented multi-turn RL using GRPO (based on QwQ-32B) on the KernelBench dataset.

It outperforms top reasoning models (o3 & o4-mini)! 🧵
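GRPO's core trick is replacing a learned value baseline with group statistics: sample a group of kernels per prompt, score each, and normalize rewards within the group. A minimal sketch of that advantage computation; the reward numbers are made up, and Kevin-32B's actual multi-turn loop (feeding compile and benchmark results back between turns) is richer than this:

```python
# Sketch of GRPO's group-relative advantage, the core idea the tweet names.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (G,) rewards for G completions sampled from one prompt.
    GRPO replaces a learned value baseline with group statistics."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# e.g. rewards from a kernel benchmark: speedup over the reference,
# 0 if it fails to compile (values invented for illustration)
rewards = torch.tensor([0.0, 1.3, 0.9, 2.1])
print(grpo_advantages(rewards))
```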
Haider. (@slow_developer):

Anthropic CPO Mike Krieger: "over 70% of Anthropic pull requests are now generated by AI", but we're still figuring out what that means for code review and long-term architecture.

Visual Studio Code (@code):

Today, we're announcing plans to make VS Code an open source AI editor.

We believe AI development should stay true to VS Code's core principles: open, collaborative, and community-driven. Let's build the future of software development together.

aka.ms/open-source-ai…
NVIDIA AI Developer (@nvidiaaidev):

📣 Introducing AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning (RL)

Starting from the SFT model DeepSeek-R1-Distill-Qwen-14B, our AceReason-Nemotron-14B achieves substantial improvements in pass@1 accuracy on key benchmarks through RL:

AIME…
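For context, pass@1 numbers like these are typically computed with the unbiased pass@k estimator of Chen et al. (2021): draw n samples per problem, count c correct ones; pass@1 reduces to c/n. A quick sketch:

```python
# Standard unbiased pass@k estimator (Chen et al., 2021).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n samples per problem, c of them correct: unbiased P(>=1 of k correct)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=16, c=4, k=1))  # 0.25 == c/n when k=1
```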
Naval (@naval):

Acquiring knowledge is easy; the hard part is knowing what to apply, and when. That’s why all true learning is “on the job.” Life is lived in the arena.

Ryan Marten (@ryanmart3n):

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data…
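A minimal usage sketch with Hugging Face transformers; the repo id below is inferred from the announcement and may not match the actual one:

```python
# Sketch: loading and prompting the released model via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-thoughts/OpenThinker3-7B"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))
```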
David Singleton (@dps):

Coding agents have crossed a chasm

Somewhere in the last few months, something fundamental shifted for me with autonomous AI coding agents. They’ve gone from a “hey this is pretty neat” curiosity to something I genuinely can’t imagine working without. Not in a hand-wavy,…

NovaSky (@novaskyai):

✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easily prototype new algorithms, environments, and training logic with minimal overhead.

🧵👇
Blog: novasky-ai.notion.site/skyrl-v01
Code: github.com/NovaSky-AI/Sky…
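A sketch of what the modularity pitch usually means in practice: environments, algorithms, and training logic sit behind small interfaces that can be swapped independently. The Protocols below are illustrative, not SkyRL's actual API:

```python
# Illustrative interfaces for a modular LLM-RL trainer (not SkyRL's API).
from typing import Protocol

class Env(Protocol):
    def reset(self) -> str: ...                              # returns a prompt
    def step(self, response: str) -> tuple[float, bool]: ...  # (reward, done)

class Algorithm(Protocol):
    def update(self, prompt: str, response: str, reward: float) -> None: ...

def train(env: Env, algo: Algorithm, policy, steps: int) -> None:
    """Training loop stays fixed; env/algo/policy are the swappable parts."""
    for _ in range(steps):
        prompt = env.reset()
        done = False
        while not done:
            response = policy(prompt)
            reward, done = env.step(response)
            algo.update(prompt, response, reward)
```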
Hokin Deng (@denghokin):

#ICML #cognition #GrowAI We spent 2 years carefully curating every single experiment (e.g., object permanence, the A-not-B task, the visual cliff task) in this dataset (total: 1,503 classic experiments spanning 12 core cognitive concepts).

We spent another year getting 230 MLLMs evaluated…
Minqi Jiang (@minqijiang):

Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. 

How can we get a pulse check on whether current LLMs are capable of driving this kind of total…
Jason Weston (@jaseweston):

🌉 Bridging Offline & Online RL for LLMs 🌉
📝: arxiv.org/abs/2506.21495
New paper shows on verifiable & non-verifiable tasks:
- Online DPO & GRPO give similar performance.
- Semi-online (iterative) DPO with sync every s steps (more efficient!) works very well also.
- Offline DPO…
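For reference, the DPO objective this comparison is built on, and what the "sync every s steps" knob means: s = 1 recovers fully online DPO, s → ∞ recovers offline DPO, and semi-online sits in between by regenerating preference pairs from the current policy only every s steps. A minimal sketch of the loss (variable names are illustrative):

```python
# Sketch of the DPO loss on one preference pair.
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """logp_*: summed log-probs of the chosen (w) / rejected (l) response
    under the policy; ref_logp_*: the same under the frozen reference."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()

# toy numbers: the policy prefers the chosen response more than the reference does
loss = dpo_loss(torch.tensor(-10.0), torch.tensor(-14.0),
                torch.tensor(-11.0), torch.tensor(-13.0))
print(loss)  # ~0.60; shrinks as the policy's preference margin grows
```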
Sakana AI (@sakanaailabs):

We’re excited to introduce AB-MCTS!

Our new inference-time scaling algorithm enables collective intelligence for AI by allowing multiple frontier models (like Gemini 2.5 Pro, o4-mini, DeepSeek-R1-0528) to cooperate.

Blog: sakana.ai/ab-mcts
Paper: arxiv.org/abs/2503.04412
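A heavily simplified sketch of the idea: at each step the search either goes "wider" (sample a fresh answer) or "deeper" (refine an existing one), with the call routed across several models. The epsilon-greedy rule below stands in for the paper's adaptive branching, so treat everything here as illustrative:

```python
# Toy stand-in for AB-MCTS's wider-vs-deeper choice with multiple models.
import random

def ab_search(models, prompt, score, steps=16, eps=0.3):
    """models: list of callables str -> str; score: callable str -> float."""
    nodes = []  # (answer, score) pairs discovered so far
    for _ in range(steps):
        model = random.choice(models)          # multi-model cooperation
        if not nodes or random.random() < eps:
            answer = model(prompt)             # go wider: brand-new attempt
        else:
            best, _ = max(nodes, key=lambda n: n[1])
            answer = model(f"{prompt}\nImprove this answer:\n{best}")  # go deeper
        nodes.append((answer, score(answer)))
    return max(nodes, key=lambda n: n[1])[0]
```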
Liliang Ren (@liliang_ren):

We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 github.com/microsoft/Arch… (1/4)
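μP++ extends μP-style scaling; as baseline intuition, classic μP for Adam scales hidden-weight learning rates as 1/fan_in so hyperparameters tuned at a small base width transfer to wider models. A sketch of that vanilla rule (not the μP++ specifics):

```python
# Vanilla muP-style per-layer learning rates for Adam (illustrative only;
# real muP also treats input/output layers specially, elided here).
import torch.nn as nn
from torch.optim import AdamW

def mup_param_groups(model: nn.Module, base_lr: float, base_width: int):
    groups = []
    for module in model.modules():
        if isinstance(module, nn.Linear):
            fan_in = module.weight.shape[1]
            groups.append({"params": [module.weight],
                           "lr": base_lr * base_width / fan_in})  # lr ~ 1/width
            if module.bias is not None:
                groups.append({"params": [module.bias], "lr": base_lr})
    return groups

model = nn.Sequential(nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 256))
opt = AdamW(mup_param_groups(model, base_lr=3e-4, base_width=256))
```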

Loubna Ben Allal (@loubnabenallal1):

SmolLM3 full training and evaluation code is now live, along with 100+ intermediate checkpoints:

✓ Pretraining scripts (nanotron)
✓ Post-training code SFT + APO (TRL/alignment-handbook)
✓ Evaluation scripts to reproduce all reported metrics

github.com/huggingface/sm…

All…
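A sketch of pulling one of those intermediate checkpoints via transformers' revision argument; the repo id and revision tag below are assumptions for illustration, so check the linked repo for the real ones:

```python
# Loading a specific training checkpoint from the Hub (ids are assumed).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",   # assumed repo id
    revision="step-100000",       # hypothetical intermediate-checkpoint tag
)
```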