Jaime Sevilla (@jsevillamol) Twitter Tweets • TwiCopy

Shameless moment of appreciation for Epoch: we put out four research pieces today alone, and we’ve been averaging about one every other day lately. In my previous job, I worked on the same research for 1.5 years, and it is still not published another 1.5 years later.

77 likes · 1 reply · 7 retweets

Jaime Sevilla

@jsevillamol

2 months ago

I believe we are on track towards AI that can do some really impressive things in the next 5 years, no acceleration needed. Solving open math problems, programming complex features on request, doing PA work reliably and generating long form videos all look within reach of the

51 likes · 1 reply · 3 retweets

Carlos Fenollosa

@cfenollosa

2 months ago

This interview with Jaime Sevilla struck me as super interesting. We need more pragmatic voices, far from both the catastrophist extreme and the "it's no big deal" camp. elmundo.es/papel/lideres/…

4 likes · 1 reply · 2 retweets

Peter Wildeford 🇺🇸🚀

@peterwildeford

2 months ago

Latest DeepSeek 4-11 months behind US:
* ~5 months behind US SOTA on GPQA Diamond
* ~4 months behind on MATH lvl 5
* ~11 months behind on SWE-Bench-Verified

We need more good evals to benchmark the US-China gap. Kudos to Epoch AI for doing some of this work.

108 likes · 3 replies · 15 retweets

Jaime Sevilla

@jsevillamol

2 months ago

Whether compute and labour in AI research behave more like complements or substitutes is one of the most important questions of our time. Glad to see more work into it!

46 likes · 0 replies · 4 retweets

Jaime Sevilla

@jsevillamol

2 months ago

Anthropic models are pretty good at coding, according to benchmarks.

10 likes · 0 replies · 0 retweets

Jiaxin Wen @ICLR2025

@jiaxinwen22

2 months ago

We then compare our system against 25 AI PhDs or postdocs with an average h-index of 8.9. Our participants take this task seriously: they spend on average 9.1 minutes on evaluating each pair of ideas. On five popular NLP topics, our system beats the majority voting of human

18 likes · 1 reply · 1 retweet
