Dhruv Singh (@ds3638)'s Twitter Profile
Dhruv Singh

@ds3638

cto at @honeyhiveai

ID: 1436525209176002560

https://honeyhive.ai · Joined 11-09-2021 03:01:29

518 Tweets

112 Followers

468 Following

An Vo (@an_vo12)

🚨 Our latest work shows that SOTA VLMs (o3, o4-mini, Sonnet, Gemini Pro) fail at counting legs due to bias⁉️

See simple cases where VLMs get it wrong, no matter how you prompt them. 

🧪 Think your VLM can do better? Try it yourself here: vlmsarebiased.github.io/#example-galle…

1/n
#ICML2025
METR (@metr_evals)

We found that Grok 4’s 50%-time-horizon on our agentic multi-step software engineering tasks is about 1hr 50min (with a 95% CI of 48min to 3hr 52min) compared to o3 (previous SOTA) at about 1hr 30min. However, Grok 4’s time horizon is below SOTA at higher success rate thresholds.

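Not from the tweet, but a rough sketch of how a "50% time horizon" of this kind can be estimated: fit success/failure against the log of each task's human-baseline length and solve for the length where the fitted probability crosses 50%. The toy data, the plain logistic-regression fit, and all names below are illustrative assumptions, not METR's actual data or code.

```python
# Rough sketch (not METR's code): estimate a "50% time horizon" by fitting
# success/failure against log2(task length) and solving for the length where
# the fitted success probability crosses 0.5. The task data is made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

task_minutes = np.array([2, 5, 10, 20, 45, 90, 180, 360, 720])  # human baseline length
succeeded    = np.array([1, 1,  1,  1,  1,  1,   0,   0,   0])  # agent outcome per task

X = np.log2(task_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, succeeded)

# p = 0.5 where w * log2(t) + b = 0  =>  t = 2 ** (-b / w)
w, b = clf.coef_[0][0], clf.intercept_[0]
print(f"50% time horizon ≈ {2 ** (-b / w):.0f} minutes")
```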
steve hsu (@hsu_steve)

You can't go back in time... But perhaps look at this book.
  
It's accessible to non-math people but also has insights for VERY mathy people who don't understand real physics. There's even a page that explains "tensors" - might be useful for would-be AI engineers ;-)  

If
ℏεsam (@hesamation)

Manus shared an article on how they manage the context of their agents. It actually includes some pretty practical points.

If you're building or using agents in any shape or form, this can be quite useful.
swyx (@swyx)

current multiples are interesting.

OAI: 15x EOY25 ARR
Ant: 19x EOY25 ARR

details
OAI 300b valuation, 13b ARR now, 20b EOY
Ant 170b valuation, 5b ARR now, 9b EOY

these don't seem... crazy?
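
For reference, those multiples follow directly from the figures quoted above (valuation divided by end-of-year ARR); a minimal sketch in Python, using only the tweet's numbers:

```python
# Sanity-check the forward multiples quoted above: valuation / EOY25 ARR,
# all figures in $B taken from the tweet.
companies = {
    "OpenAI":    {"valuation_b": 300, "arr_eoy_b": 20},
    "Anthropic": {"valuation_b": 170, "arr_eoy_b": 9},
}

for name, c in companies.items():
    multiple = c["valuation_b"] / c["arr_eoy_b"]
    print(f"{name}: {multiple:.0f}x EOY25 ARR")
# OpenAI: 15x EOY25 ARR
# Anthropic: 19x EOY25 ARR
```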

will brown (@willccbb)

i'm increasingly convinced that "transformative ai" is going to look like an abundance of specialized models for everything from drug design to weather sims to robotics to supply chains, not one agent to rule them all. we're going to need a lot more ai researchers

Susan Zhang (@suchenzang)

[episode 120 of frontier lab gossip: OH in the mission]

> **** folks recently bragged about a 100k H100 training run
> wrote a post, got all the likes and shares internally
> then some of them ran the exact same job on 20k H100s instead of 100k, and ended up with the exact same

@levelsio (@levelsio)

People are so confused, it's not about fulfillment

No sane rich person spends all their money

It's about taking out 3% (Safe Withdrawal Rate) * $5M / 12 months of your investment

= receive $12,500/mo forever

This gives you the statistical guarantee you'll have enough money to
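
The withdrawal arithmetic in the tweet, as a minimal Python sketch (the $5M portfolio and 3% rate are the tweet's own figures, not a recommendation):

```python
# Safe-withdrawal arithmetic from the tweet: 3% of a $5M portfolio, paid monthly.
portfolio = 5_000_000   # invested principal, USD (figure from the tweet)
swr = 0.03              # safe withdrawal rate, 3% per year (figure from the tweet)

monthly_income = portfolio * swr / 12
print(f"${monthly_income:,.0f}/mo")   # -> $12,500/mo
```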
Emmanuel Ameisen (@mlpowered)

Friday we published our no-jargon explainer of how LLMs "think". It's been great to see the response - already at 80k views!

We covered why models hallucinate, why they flatter users, and if they are just glorified autocomplete!

If you are curious about LLMs, check it out! 👇
Richard Sutton (@richardssutton)

I was happy to give a more technical talk on how we might create an AI at RLC-2025 and AGI-2025 (video below).

The Oak Architecture: A Vision of Super-Intelligence from Experience

As AI has become a huge industry, to an extent it has lost its way. What is needed to get us back on

Ara (@arafatkatze)

In building AI agents Cline, we've identified three mind viruses. Mind viruses are seductive ideas that sound smart but don't work in practice.
1. Multi-Agent Orchestration
2. RAG (Retrieval Augmented Generation)
3. More Instructions = Better Results
Let's explore why!
tensorqt (@tensorqt)

attention sinks may be a bias in causal transformers. 

as some of you know, i've been writing a long blogpost on attention and its properties as a message-passing operation on graphs. while doing so, i figured i might have found an explanation for why attention sinks may be an
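
Not from the thread, but a minimal sketch of how one might observe the sink empirically: load a small causal LM with attentions enabled and measure how much attention mass later query positions put on token 0. The model choice (gpt2) and the example sentence are assumptions for illustration.

```python
# Minimal sketch: measure the "attention sink" signature, i.e. how much attention
# mass a causal LM's later queries place on the very first token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)
model.eval()

text = "attention sinks may be a bias in causal transformers."
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

# out.attentions: one tensor per layer, shaped [batch, heads, query_pos, key_pos].
for layer_idx, attn in enumerate(out.attentions):
    mass_on_first = attn[0, :, 1:, 0].mean().item()  # attention from queries 1..n to key 0
    print(f"layer {layer_idx:2d}: mean attention to token 0 = {mass_on_first:.2f}")
```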