Álvaro Barbero Jiménez (@albarjip)'s Twitter Profile
Álvaro Barbero Jiménez

@albarjip

Chief Data Scientist at @IIConocimiento, crazy scientist, artificial artist. Machine Learning, AI, optimization, programming, 日本文化, geek stuff

ID: 749039234

Link: https://albarji.substack.com/

Joined: 10-08-2012 09:43:08

3.3K Tweets

913 Followers

673 Following

John Carmack (@id_aa_carmack)

I have never seen it expressed exactly like that, but I wholeheartedly endorse it: Feedback beats planning. My plea at Meta was “No grand plans, follow the gradient of user value”.

ARC Prize (@arcprize)

Clarifying o3’s ARC-AGI Performance

OpenAI has confirmed:
* The released o3 is a different model from what we tested in December 2024
* All released o3 compute tiers are smaller than the version we tested
* The released o3 was not trained on ARC-AGI data, not even the train

Andrej Karpathy (@karpathy)

There's a new paper circulating looking in detail at LMArena leaderboard: "The Leaderboard Illusion" arxiv.org/abs/2504.20879 I first became a bit suspicious when at one point a while back, a Gemini model scored #1 way above the second best, but when I tried to switch for a few

David Rozado (@davidrozado)

1/ Do AI systems discriminate based on gender when choosing the most qualified candidate for a job? I ran an experiment with several leading LLMs to find out. Here's what I discovered:👇

Carlos Santana (@dotcsv)

If we really lived in a simulation and I were the scriptwriter, this is the kind of video I would use to start dropping hints that set the season finale's events in motion, eventually revealing that we do, in fact, live in a simulation.

ARC Prize (@arcprize)

Claude Sonnet 4 on ARC-AGI Semi Private Eval

Base
* ARC-AGI-1: 23%, $0.08/task
* ARC-AGI-2: 1.2%, $0.12/task

Thinking 16K
* ARC-AGI-1: 40%, $0.36/task
* ARC-AGI-2: 5.9%, $0.48/task

Sonnet 4 sets new SOTA (5.9%) on ARC-AGI-2
Álvaro Barbero Jiménez (@albarjip)

A very necessary paper, showing once again that LLMs, even "reasoning" LLMs, do not actually reason but collapse when far from their training distribution. AGI or PhD-level AIs are not around the corner. But they are still extremely useful when used correctly.

Rohan Paul (@rohanpaul_ai)

It’s a hefty 206-page research paper, and the findings are concerning.

"LLM users consistently underperformed at neural, linguistic, and behavioral levels"

This study finds LLM dependence weakens the writer’s own neural and linguistic fingerprints. 🤔🤔

Relying only on EEG,
Escaños en Blanco para dejar Escaños Vacíos (@escanosenblanco)

Since last week, this account has been growing organically. That is, it is growing without responding to any specific post or action. The tool that Escaños en Blanco offers, the idea of leaving seats empty, is starting to move on its own. Citizens. No parties.

UAM Autónoma Madrid (@uam_madrid)

🏆 The UAM 2024 Young Researcher Awards recognize significant contributions to the development of research activity at the #UAM.

Congratulations to all of this edition's award winners!👏🏻

Alfonso Santos López, Anne-Marie Reynaers, Fátima
PyTorch (@pytorch)

Discover how #verl simplifies #ReinforcementLearning for advanced #LLM reasoning and tool use in our Aug 6 Expert Exchange with Haibin Lin (ByteDance). Supports PPO/GRPO/DAPO, async rollout, expert parallelism for MoE, and more. #PyTorch #OpenSourceAI
🔗 hubs.la/Q03xkQW-0
ARC Prize (@arcprize)

Today, we're announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI

We’re releasing:
* 3 games (environments)
* $10K agent contest
* AI agents API

Starting scores - Frontier AI: 0%, Humans: 100%