Hrant Khachatrian (@hrant25)'s Twitter Profile
Hrant Khachatrian

@hrant25

Training neural nets at @YerevaNN

ID: 212523120

https://yerevann.com
06-11-2010 09:37:56

384 Tweets

415 Followers

233 Following

Andrew White 🐦‍⬛ (@andrewwhite01):

The llama3 paper is nuts - so many interesting details. Like a table of why training was interrupted on the 24k GPUs. Or predicting benchmark performance prior to training. And data mix.

Hrant Khachatrian (@hrant25):

If you are at ICML and you are interested in AI for molecules and drug discovery, please check out this poster at the #ML4LMS workshop today!

Hrant Khachatrian (@hrant25):

The result of 1.5 years of work is out! We've built a corpus covering 100M+ molecules, learned to train 1-2B parameter language models, wrapped LMs into an optimization algorithm, and beaten all benchmarks we've tried. Everything is open-sourced! Please spread the word!

Hrant Khachatrian (@hrant25):

I didn't watch the video, but I agree. That's exactly what we are trying to do at YerevaNN. We showed that LMs by themselves do not generate good molecules from the first prompt. But if there is another module that can score the generated molecules, the system becomes interesting!

Hrant Khachatrian (@hrant25):

Agree. A lot of weird problems can be modeled by next token prediction. And we know how to scale transformers to solve them.

Hrant Khachatrian (@hrant25):

Excited to work with Hrayr and the YerevaNN team on fighting spurious correlations. This time inside the context of Transformers :)

Hrant Khachatrian (@hrant25):

What a great email with a slightly spicy endgame: "Good R&D endeavors can do more for progress in fundamental technologies than all the fancy theorizing that we often consider the "real" AI research."

Hrant Khachatrian (@hrant25):

So the gap between a frontier model by a major ClosedAI player and an equally powerful open-source model is less than 5 months now...