Jaime Sevilla (@jsevillamol)'s Twitter Profile
Jaime Sevilla

@jsevillamol

Director of @EpochAIResearch. Trying to glimpse the future of AI.

ID: 3216158848

https://jaimesevilla.me/ · 28-04-2015 07:25:38

8.8K Tweets

3.3K Followers

412 Following

Luke Frymire (@lukefrymire)

Shameless moment of appreciation for Epoch: we put out four research pieces today alone, and we've been averaging about one every other day lately. In my previous job, I worked on the same research for 1.5 years, and it is still unpublished another 1.5 years later.

Jaime Sevilla (@jsevillamol)

I believe we are on track towards AI that can do some really impressive things in the next 5 years, no acceleration needed. Solving open math problems, programming complex features on request, doing PA work reliably and generating long form videos all look within reach of the

Carlos Fenollosa (@cfenollosa)

I found this interview with Jaime Sevilla super interesting. We need more pragmatic voices, far removed from both the doom-mongering extreme and the "it's not a big deal" camp. elmundo.es/papel/lideres/…

Peter Wildeford 🇺🇸🚀 (@peterwildeford)

Latest DeepSeek is 4-11 months behind the US:
* ~5 months behind US SOTA on GPQA Diamond
* ~4 months behind on MATH Level 5
* ~11 months behind on SWE-Bench-Verified

We need more good evals to benchmark the US-China gap. Kudos to Epoch AI for doing some of this work.

Jaime Sevilla (@jsevillamol)

Whether compute and labour in AI research behave more like complements or substitutes is one of the most important questions of our time. Glad to see more work on it!

Jiaxin Wen @ICLR2025 (@jiaxinwen22)

We then compare our system against 25 AI PhDs or postdocs with an average h-index of 8.9. Our participants take this task seriously: they spend on average 9.1 minutes on evaluating each pair of ideas. On five popular NLP topics, our system beats the majority voting of human