andrea panizza (@unsorsodicorda) Twitter Tweets • TwiCopy

Nathan Lambert

2 months ago

The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon. The Art of Scaling Reinforcement Learning Compute for LLMs Khatri & Madaan et al.

1K likes, 20 replies, 193 retweets

wh

@nrehiew_

2 months ago

New post! This time, about the current state of Long Context Evaluation. I discuss existing benchmarks, what makes a good long context eval, what's missing from existing ones and introduce a new one - LongCodeEdit :)

504 likes, 13 replies, 43 retweets

Eugene Kim

@eugenekim222

2 months ago

New: Internal Amazon documents warn AI startups are delaying and diversifying their cloud spending. There are also internal concerns about AWS’ pricing and lagging reputation in AI. businessinsider.com/amazon-ai-star…

72 likes, 5 replies, 15 retweets

Aayush Karan

@aakaran31

2 months ago

We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.

1K likes, 48 replies, 152 retweets

Epoch AI

@epochairesearch

2 months ago

We evaluated Claude Haiku 4.5 on several benchmarks. Even with reasoning disabled, Haiku 4.5 performs similarly to or better than early lightweight reasoning models, like o1-mini.

170 likes, 6 replies, 11 retweets

Jay A

@jay_azhang

2 months ago

Alpha Arena is LIVE. 6 AI models trading $10K each, fully autonomously. Real money. Real markets. Real benchmark. Who's your money on? Link below.

3K likes, 331 replies, 340 retweets

andrea panizza

@unsorsodicorda

2 months ago

Hi! Does anyone know of a simple but usable (i.e., not O(N^2)) implementation of PointNet++ in PyTorch? This is what I mean by "simple": github.com/thuml/Neural-S… but, alas, this is PointNet, not PointNet++.

1 like, 0 replies, 0 retweets

Andrej Karpathy

@karpathy

2 months ago

My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my

10K likes, 405 replies, 1K retweets

Xeophon

@thexeophon

2 months ago

Shocking: MoM growth slows down as you penetrate the market. More context: 15% of 800M is 120M. Working population: 46M (GER) + 24M (ESP) + 38M (ITA) + 30M (FRA) = 138M (total: 255M), so they have already captured the majority of the available market.

21 likes, 3 replies, 2 retweets
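The back-of-the-envelope arithmetic above can be re-run as a quick check. All inputs (15%, 800M, and the four working-population figures) are taken from the tweet. The final ratio, the 120M expressed as a share of the 138M working population, is a derived figure that the tweet does not state; it simply makes "the majority" concrete.

```python
# Figures quoted in the tweet, in millions of people.
total_users_m = 800
share_pct = 15
working_pop_m = {"GER": 46, "ESP": 24, "ITA": 38, "FRA": 30}

# Integer arithmetic avoids floating-point noise: 15% of 800M.
captured_m = total_users_m * share_pct // 100
available_m = sum(working_pop_m.values())

# Derived (not in the tweet): captured users as a share of that working population.
ratio = captured_m / available_m
print(f"captured: {captured_m}M, available: {available_m}M, ratio: {ratio:.0%}")
```

This prints `captured: 120M, available: 138M, ratio: 87%`, which is consistent with the tweet's "majority" claim.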

Xeophon

@thexeophon

2 months ago

new whale dropped

1K likes, 30 replies, 96 retweets

Alexander Doria

@dorialexander

2 months ago

So, a longer read of DeepSeek-OCR: it’s an engineering achievement. It has been suspected for a while that VLM/OCR models could be significantly smaller. The pre-VLM state of the art, Google Cloud OCR, would not be more than a 100M-parameter model. More recently, relatively small open weights

1K likes, 27 replies, 89 retweets

Logan Kilpatrick

@officiallogank

2 months ago

Tomorrow is a special day for the AI Studio team. Since May, we have been heads down building a brand new AI vibe coding experience to accelerate the path from prompt to production with Gemini. Can’t wait to show you all :)

5K likes, 350 replies, 313 retweets

Qwen

@alibaba_qwen

2 months ago

Introducing Qwen3-VL-2B and Qwen3-VL-32B! From edge to cloud, these dense powerhouses deliver ultimate performance per GPU memory, packing the full capabilities of Qwen3-VL into compact and scalable forms. 🔥 Qwen3-VL-32B outperforms GPT-5 mini & Claude 4 Sonnet across STEM,

1K likes, 72 replies, 252 retweets

Noam Brown

@polynoamial

2 months ago

Below is a deep dive into why self play works for two-player zero-sum (2p0s) games like Go/Poker/Starcraft but is so much harder to use in "real world" domains. tl;dr: self play converges to minimax in 2p0s games, and minimax is really useful in those games. Every finite 2p0s

1K likes, 57 replies, 166 retweets
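The tl;dr above says that self-play converges to minimax in two-player zero-sum games. Here is a minimal sketch of that idea, under two assumptions. First, it uses regret-matching self-play, which is one standard self-play scheme and not necessarily the one the thread discusses. Second, it runs on a hypothetical 2x2 zero-sum game whose payoff matrix `A` is made up for illustration. For this `A`, the minimax strategy for each player puts probability 0.4 on its first action, and the game value is 0.2. All function names are illustrative.

```python
# Hypothetical 2x2 zero-sum game: entries are the row player's payoff;
# the column player receives the negative.
A = [[2.0, -1.0],
     [-1.0, 1.0]]


def regret_matching(regrets):
    """Mix actions in proportion to positive cumulative regret (uniform if none)."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [x / total for x in pos] if total > 0 else [1.0 / n] * n


def self_play(A, iterations=20000):
    """Both players run regret matching against each other using expected payoffs."""
    n_rows, n_cols = len(A), len(A[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        p = regret_matching(row_regret)
        q = regret_matching(col_regret)
        # Payoff of each pure action against the opponent's current mixed strategy.
        row_vals = [sum(A[i][j] * q[j] for j in range(n_cols)) for i in range(n_rows)]
        col_vals = [-sum(A[i][j] * p[i] for i in range(n_rows)) for j in range(n_cols)]
        row_ev = sum(p[i] * row_vals[i] for i in range(n_rows))
        col_ev = sum(q[j] * col_vals[j] for j in range(n_cols))
        for i in range(n_rows):
            row_regret[i] += row_vals[i] - row_ev
            row_sum[i] += p[i]
        for j in range(n_cols):
            col_regret[j] += col_vals[j] - col_ev
            col_sum[j] += q[j]
    # The time-averaged strategies (not the last iterates) approach minimax.
    return [x / iterations for x in row_sum], [x / iterations for x in col_sum]


def exploitability(A, p, q):
    """Total gain both players could get by switching to a best response (0 at minimax)."""
    best_row = max(sum(A[i][j] * q[j] for j in range(len(q))) for i in range(len(p)))
    best_col = min(sum(A[i][j] * p[i] for i in range(len(p))) for j in range(len(q)))
    return best_row - best_col


p, q = self_play(A)
print(p, q, exploitability(A, p, q))
```

Only the averaged strategies carry the convergence guarantee, because the per-iteration strategies can keep cycling. With 20,000 iterations, the standard regret-matching bound keeps exploitability here below roughly 0.06.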

Jessy Lin

@realjessylin

2 months ago

As part of our recent work on memory layer architectures, I wrote up some of my thoughts on the continual learning problem broadly: Blog post: jessylin.com/2025/10/20/con… Some of the exposition goes beyond mem layers, so I thought it'd be useful to highlight separately:

986 likes, 24 replies, 141 retweets

Ernest Ryu

@ernestryu

2 months ago

I used ChatGPT to solve an open problem in convex optimization. *Part I* (1/N)

1K likes, 44 replies, 168 retweets

Hunyuan

@tencenthunyuan

2 months ago

Today, we are open-sourcing Hunyuan World 1.1 (WorldMirror), a universal feed-forward 3D reconstruction model. 🚀🚀🚀 While our previously released Hunyuan World 1.0 (open-sourced, lite version deployable on consumer GPUs) focused on generating 3D worlds from text or

1K likes, 44 replies, 253 retweets