Michael Hodel (@bayesilicon)'s Twitter Profile
Michael Hodel

@bayesilicon

writer (of programs) | AI researcher @tufalabs

ID: 1460008017521434628

Link: http://tufalabs.ai | Joined: 14-11-2021 22:13:39

97 Tweets

927 Followers

651 Following

Wenhao Li (@wenhaoli29)'s Twitter Profile Photo

We trained a Vision Transformer to solve ONE single task from François Chollet and Mike Knoop’s ARC Prize. Unexpectedly, it failed to produce the test output, even when using 1 MILLION examples! Why is this the case? 🤔

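A minimal sketch of what such a single-task setup might look like; the fixed 10×10 grid, the tiny model, and the one-token-per-cell framing are assumptions for illustration, not the thread's actual code:

```python
import torch
import torch.nn as nn

H = W = 10         # assumed fixed grid size for the single task
NUM_COLORS = 10    # ARC grids use 10 cell colors

class TinyGridTransformer(nn.Module):
    """Maps an input grid to an output grid, one token per cell."""
    def __init__(self, dim=64, heads=4, layers=4):
        super().__init__()
        self.embed = nn.Embedding(NUM_COLORS, dim)
        self.pos = nn.Parameter(torch.randn(1, H * W, dim) * 0.02)
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(dim, NUM_COLORS)

    def forward(self, grid):                 # grid: (B, H*W) integer tokens
        x = self.embed(grid) + self.pos
        return self.head(self.encoder(x))    # per-cell color logits

model = TinyGridTransformer()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(inp, out):                    # both (B, H*W) long tensors
    logits = model(inp)
    loss = loss_fn(logits.reshape(-1, NUM_COLORS), out.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```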
Kevin Ellis (@ellisk_kellis)'s Twitter Profile Photo


New ARC-AGI paper
ARC Prize w/ fantastic collaborators Wen-Ding Li @ ICLR'25, Keya Hu, Zenna Tavares, evanthebouncy, Basis
For few-shot learning: is it better to construct a symbolic hypothesis/program, or to have a neural net do it all, à la in-context learning?
cs.cornell.edu/~ellisk/docume…
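A minimal sketch of the symbolic-hypothesis side of that question: sample candidate programs, keep one consistent with every training pair, and apply it to the test input. `propose_programs` (an LLM call in practice) is a hypothetical stand-in, not the paper's API:

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]

def induce(train: List[Tuple[Grid, Grid]],
           test_input: Grid,
           propose_programs: Callable[[], List[Callable[[Grid], Grid]]]) -> Optional[Grid]:
    for program in propose_programs():       # candidates, e.g. LLM-sampled
        try:
            # keep only hypotheses consistent with every training pair
            if all(program(x) == y for x, y in train):
                return program(test_input)
        except Exception:
            continue                         # malformed candidates just fail
    return None                              # no consistent program found
```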
Mohamed Osman (@mohamedosmanml)'s Twitter Profile Photo


We got up to 55.5% on the ARC Prize leaderboard today!
Progress towards the 60.2% milestone of median human performance reported by arxiv.org/pdf/2409.01374 is not slowing down.
Jack Cole Michael Hodel
Machine Learning Street Talk (@mlstreettalk)'s Twitter Profile Photo

I finally got to meet François Chollet in person recently to interview him about the ARC Prize, intelligence vs. memorization, human cognitive development, learning abstractions, the limits of pattern recognition, and consciousness development. These are the best bits. Full show released tomorrow.

Andreas Köpf (@neurosp1ke)'s Twitter Profile Photo

Have been working on my 2nd synthetic ARC riddle generator (agent: ideation -> prog generation). Got >1k diverse generator+solver pairs as PoC so far. Some nice examples:

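One way to picture a generator+solver pair (an illustrative pair, not one of the actual >1k): the generator samples riddle instances under a rule, the solver implements the same rule, and the pair is accepted only if the solver reproduces every generated output:

```python
import random

def generate(rng: random.Random, h=5, w=5):
    """Sample one riddle instance: (input grid, expected output grid)."""
    grid = [[rng.randrange(10) for _ in range(w)] for _ in range(h)]
    return grid, [row[::-1] for row in grid]   # rule: mirror horizontally

def solve(grid):
    """The matching solver: mirror each row."""
    return [row[::-1] for row in grid]

# A generator+solver pair is accepted only if they agree on every instance:
rng = random.Random(0)
assert all(solve(i) == o for i, o in (generate(rng) for _ in range(100)))
```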
Andreas Köpf (@neurosp1ke)'s Twitter Profile Photo


ARC Prize 2024 🥈 place paper by the ARChitects, who scored 53.5 (56.5): github.com/da-fr/arc-priz…
- Transformers/LLMs are for ARC what ConvNets were for ImageNet
- strong base model, TTT, specialized datasets (e.g. Michael Hodel's re-arc) + novel: DFS sampling with LLM critique
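A rough sketch of how "DFS sampling with LLM critique" could work, as described in that summary: depth-first search over high-probability continuations with a cumulative-probability cutoff, then a critique score to pick among complete candidates. `step_logprobs` and `critique_score` are hypothetical stand-ins, not the ARChitects' code:

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"

def dfs_sample(prefix: List[str],
               step_logprobs: Callable[[List[str]], Dict[str, float]],
               logp: float = 0.0,
               cutoff: float = math.log(0.05),
               max_len: int = 64) -> List[Tuple[List[str], float]]:
    """Enumerate all completions whose cumulative log-prob stays above cutoff."""
    if (prefix and prefix[-1] == EOS) or len(prefix) >= max_len:
        return [(prefix, logp)]
    results = []
    for token, lp in sorted(step_logprobs(prefix).items(), key=lambda kv: -kv[1]):
        if logp + lp < cutoff:
            break                            # tokens are sorted, so prune the rest
        results += dfs_sample(prefix + [token], step_logprobs,
                              logp + lp, cutoff, max_len)
    return results

def pick_best(candidates, critique_score: Callable[[List[str]], float]):
    return max(candidates, key=lambda c: critique_score(c[0]))
```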
François Chollet (@fchollet)'s Twitter Profile Photo

Consulting my heart... Ok, looks like you haven't. But whenever you have a SotA (or close) solution built on top of the OpenAI API we're more than happy to verify it and add it to the public ARC Prize leaderboard. Anything using less than $10k worth of API calls is eligible.

Akira Yoshiyama ⁂ (@yoshiyama_akira)'s Twitter Profile Photo

Happy to announce we outperformed OpenAI o1 with a 7B model :) We released two self-improvement methods for verifiable domains in our preliminary paper -->

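One common self-improvement recipe for verifiable domains, sketched below (rejection sampling on verified solutions, STaR-style); this is a generic illustration, not necessarily either of the paper's two methods. `model.sample`, `verify`, and `finetune` are hypothetical stand-ins:

```python
def self_improve(model, problems, verify, finetune, rounds=3, k=8):
    for _ in range(rounds):
        keep = []
        for prob in problems:
            for sol in model.sample(prob, n=k):   # draw k attempts per problem
                if verify(prob, sol):             # keep only verified solutions
                    keep.append((prob, sol))
        model = finetune(model, keep)             # train on its own successes
    return model
```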
Dimitri von Rütte (@dvruette)'s Twitter Profile Photo

🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
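The self-correction idea, sketched generically (not GIDD's actual mechanism): during iterative denoising, resample the positions the model is least confident about, so earlier mistakes can be revisited. `denoise` is a hypothetical stand-in:

```python
def self_correct(tokens, denoise, steps=10, frac=0.1):
    # denoise(tokens) -> (confidence per position, proposed token per position);
    # a hypothetical stand-in for a denoising model's forward pass.
    for _ in range(steps):
        conf, proposal = denoise(tokens)
        n = max(1, int(frac * len(tokens)))
        for i in sorted(range(len(tokens)), key=lambda i: conf[i])[:n]:
            tokens[i] = proposal[i]               # revisit the shakiest positions
    return tokens
```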

Toby Simonds (@tobyrsimonds)'s Twitter Profile Photo


📝 New research: AlphaWrite applies evolutionary algorithms to creative writing.

Inspired by AlphaEvolve, we use iterative generation + Elo ranking to systematically improve story quality through inference-time compute scaling.

Results: 72% preference over baseline generation
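The Elo-ranking step can use the standard Elo update; the tweet names Elo but not its parameters, so the K=32 factor and 1000-point start below are assumptions:

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """Standard Elo update after one pairwise preference judgment."""
    expected_w = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    return r_winner + k * (1.0 - expected_w), r_loser - k * (1.0 - expected_w)

# Rank stories by streaming judged pairs through the update:
ratings = {"story_a": 1000.0, "story_b": 1000.0}
ratings["story_a"], ratings["story_b"] = elo_update(ratings["story_a"], ratings["story_b"])
```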