Christian Szegedy (@chrszegedy)'s Twitter Profile
Christian Szegedy

@chrszegedy

#deeplearning, #ai research scientist. Opinions are mine.

ID: 3237721718

Joined: 06-06-2015 09:14:20

7.7K Tweets

37.37K Followers

2.2K Following

Benjamin Todd (@ben_j_todd)


Why can AIs code for 1h but not 10h?

A simple explanation: if there's a 10% chance of error per 10min step (say), the success rate is:

1h: 53%
4h: 8%
10h: 0.2%

<a href="/tobyordoxford/">Toby Ord</a> has tested this 'constant error rate' theory and shown it's a good fit for the data

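A quick sketch (my addition, not part of the tweet) of the arithmetic behind this 'constant error rate' model: with a fixed failure probability per 10-minute step, the chance of an error-free run decays geometrically with task length. The step length and 10% error rate below are the tweet's assumptions.

```python
# Constant-error-rate model sketch: success probability over a long task
# decays geometrically with the number of fixed-length steps.

STEP_MINUTES = 10        # one "step" of work, per the tweet
P_ERROR_PER_STEP = 0.10  # assumed 10% chance of error per step

def success_rate(hours: float) -> float:
    """Probability of completing `hours` of work with no error in any step."""
    steps = hours * 60 / STEP_MINUTES
    return (1 - P_ERROR_PER_STEP) ** steps

for h in (1, 4, 10):
    print(f"{h:>2}h: {success_rate(h):.2%}")
# Approximate output:
#  1h: 53.14%
#  4h: 7.98%
# 10h: 0.18%
```
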
Google Research (@googleresearch)

Congratulations to the authors of "Going Deeper with Convolutions", recipient of the 2025 Longuet-Higgins Prize, which celebrates a paper released 10 years ago that has had a significant impact in the field of computer vision! arxiv.org/abs/1409.4842

Sergey Levine (@svlevine)

Self-supervised representation learning looks a bit like RL. What if we literally use RL as an SSL method for visual representations? Turns out that it works quite well. In new work by Dibya Ghosh, we show how this can be done: dibyaghosh.com/annotation_boo…

Alex Kontorovich (@alexkontorovich)

My lecture at the "Big Proof" Conference at the Newton Institute (Cambridge, UK -- this is where Mathlib was founded, in 2017!) is posted: youtube.com/live/cxgcgceRL… The lecture was on a vision for mathematics, interaction with LLMs/AI, and formalization.

Greg Kamradt (@gregkamradt)

We got a call from xAI 24 hours ago: “We want to test Grok 4 on ARC-AGI.”

We heard the rumors. We knew it would be good. We didn’t know it would become the #1 public model on ARC-AGI.

Here’s the testing story and what the results mean: Yesterday, we chatted with Jimmy from the

Chubby♨️ (@kimmonismus)


Grok 4 – the facts

- Musk wasn't lying, Grok 4 is the best AI model out there
- Grok 4 is smarter than all graduate students out there
- 100x more training than Grok 2, 10x more RL than any other company for their model
- “Post-graduate level at everything”
- 44.4% in Humanity's Last Exam
Christian Szegedy (@chrszegedy)

Tried Grok 4 on a dozen non-trivial (under)grad-level math problems. So far, it has not failed me even once. Congrats to Yuhuai (Tony) Wu, Eric Zelikman, and the whole xAI reasoning team; their progress has exceeded all my expectations!

Noam Y (@noam_yy)

Par, an expressive, concurrent, total* language with linear types and full duality. Based on Linear Logic and Session Types, Par has both functional and imperative features that integrate seamlessly.

ℏεsam (@hesamation)

a guy created a dataset of 50 books from London 1800-1850 for LLM training. no modern bias. it’s actually super cool to see what can be trained on it!

Miles Turpin (@milesaturpin)


New @Scale_AI paper! 🌟

LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce verbalization fine-tuning (VFT)—teaching models to say when they're reward hacking—dramatically reducing the rate of undetected hacks (6% vs. baseline of 88%).
Richard Suwandi @ICLR2025 (@richardcsuwandi)


BatchNorm wins the Test-of-Time Award at #ICML2025! 🎉

BatchNorm revolutionized deep learning by addressing internal covariate shift, which can slow down learning, limit learning rates, and make it difficult to train deep networks.

By normalizing inputs within each mini-batch, BatchNorm keeps layer activations at a stable scale throughout training.
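
A minimal sketch (my addition, not from the tweet or the original paper's code) of what normalizing inputs within each mini-batch looks like in the forward pass; the NumPy array shapes and the `gamma`/`beta` parameter names are illustrative assumptions.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization for a mini-batch x of shape (N, D).

    Each feature is normalized to zero mean and unit variance using
    statistics computed over the mini-batch, then rescaled by the
    learned parameters gamma (scale) and beta (shift).
    """
    mu = x.mean(axis=0)                  # per-feature batch mean
    var = x.var(axis=0)                  # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Tiny usage example: badly scaled activations come out ~zero-mean, unit-variance.
x = np.random.randn(32, 4) * 5 + 3
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
```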