Armen Aghajanyan (@armenagha) 's Twitter Profile
Armen Aghajanyan

@armenagha

Co-founder & CEO @perceptroninc; ex-RS FAIR/MSFT

ID: 1515424688

Joined: 14-06-2013 05:43:07

648 Tweets

15.15K Followers

271 Following

Susan Zhang (@suchenzang) 's Twitter Profile Photo

arxiv.org/abs/2411.03923 "From Figure 3(a), it is apparent that many of the benchmarks we considered are substantially contaminated in the Llama 1 pre-training corpus as well as in the Pile. For 8 of the 13 datasets that we considered, on average more than 50% of the samples are

arxiv.org/abs/2411.03923

"From Figure 3(a), it is apparent that many of the benchmarks we considered are substantially contaminated in the Llama 1 pre-training corpus as well as in the Pile. For 8 of the 13 datasets that we considered, on average more than 50% of the samples are
Susan Zhang (@suchenzang) 's Twitter Profile Photo

if you're bragging about internal cultural dysfunction in your own lab, where ICs are pitted against each other in the name of some regional glory... it's no wonder your models are utterly irrelevant in 2025

Armen Aghajanyan (@armenagha) 's Twitter Profile Photo

Only one of the teams between Zetta/LLaMa had an open-source pre-training codebase, shared datasets and experiments internally, used standardized evaluation sets, published internal notes and did things in the original spirit of FAIR. And the other team built in private, share

Jeremy Dohmann (@jecdohmann) 's Twitter Profile Photo

My first and only Twitter beef in my life was with that certain someone who hacked his prompts to top the benchmarks and then wouldn’t cooperate with open source researchers to reproduce them haha 🤔🤔

Soumith Chintala (@soumithchintala) 's Twitter Profile Photo

Yann LeCun Raw Daron Acemoglu you were/are the Chief Scientist of Meta, and a FAIR Lead -- where both Zetta and Llama were located; I think characterizing any team within your direct influence in a bad light in public is not nice. yea the Llama folks were great. praise them. What if Zetta was allowed to run

xjdr (@_xjdr) 's Twitter Profile Photo

man i forgot how good the metaseq repo was and how often i pulled stuff from there. also im team susan and armen if anyone was wondering

Armand Joulin (@armandjoulin) 's Twitter Profile Photo

Susan Zhang Armen Aghajanyan Susan Zhang and Stephen Roller should have been authors. There was a lot of discussion about it and imo, not including them was a mistake made by people under a lot of pressure and frustration.

Armenian Patriarchate Of Jerusalem (@armenianquarter) 's Twitter Profile Photo

We desperately urge you to share yet another Urgent Communique. Targeted and crippling taxes against the religious entity of the Armenian Quarter of Jerusalem are being unjustly levied. This crushing threat can become a precedent set against ALL Christian Communities of Jerusalem

Maciej Kilian (@kilian_maciej) 's Twitter Profile Photo

fun debugging journey w/ Akshat Shrivastava: be careful around FP8 w. activation checkpointing

activation checkpointing works under the assumption that different calls of forward give similar results, which we move away from the more we quantize. when you re-quantize in activation
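
A minimal sketch of the failure mode being described, under assumed semantics; the class and names here are ours for illustration, not any real FP8 library's API. A stateful quantizer whose scale depends on previously seen activations produces different outputs when checkpointing re-runs the forward:

import torch
from torch.utils.checkpoint import checkpoint

class DelayedScalingQuant:
    # Toy stand-in for FP8 delayed scaling: the scale used on each call
    # comes from activation statistics recorded on the previous call,
    # so the quantizer is stateful.
    def __init__(self):
        self.scale = 1.0

    def __call__(self, x):
        q = torch.round(x / self.scale).clamp(-127, 127) * self.scale
        out = x + (q - x).detach()  # straight-through estimator
        # Updating state here is what makes forward non-reproducible:
        # the checkpointed re-forward will see a different scale.
        self.scale = float(x.abs().amax()) / 127.0 + 1e-8
        return out

quant = DelayedScalingQuant()
w = torch.randn(16, 16, requires_grad=True)

def block(x):
    return quant(torch.relu(x @ w))

x = torch.randn(4, 16)
# Activation checkpointing discards activations and re-runs `block`
# during backward. Because `quant.scale` changed after the first
# forward, the recomputed activations no longer match the ones the
# loss was computed from, silently skewing gradients.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
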
Maciej Kilian (@kilian_maciej) 's Twitter Profile Photo

Maksymilian Wojnar and I have been playing around with tensor alignments in neural networks. here’s a summary of our exploration. we go into neural net parameterizations, measuring tensor alignments, and we develop a dynamic maximal learning rate scheduler which factors in alignment
Akshat Shrivastava (@akshats07) 's Twitter Profile Photo

Excited to see further studies into early fusion vs late fusion models, in particular a great analysis of multimodal MoEs aligned with our findings in MoMa on designing parameter specialization in multimodal LLMs. A few key things that helped us on top of the results presented

Piotr Nawrot (@p_nawrot) 's Twitter Profile Photo

Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in LLMs.

We performed the most comprehensive study on training-free sparse attention to date.

Here is what we found:
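
As a concrete reference point, one of the simplest training-free sparse attention patterns is per-query top-k masking at inference time. The sketch below is illustrative and not one of the specific methods benchmarked in the paper:

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep=64):
    # For each query, attend only to the `keep` highest-scoring keys
    # and mask out the rest; no retraining required.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (..., Tq, Tk)
    keep = min(keep, scores.shape[-1])
    thresh = scores.topk(keep, dim=-1).values[..., -1:]    # k-th largest score
    scores = scores.masked_fill(scores < thresh, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(8, 128, 64)
out = topk_sparse_attention(q, k, v, keep=16)
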
Armen Aghajanyan (@armenagha) 's Twitter Profile Photo

This whole part of the interview was pure cope. DeepSeek has the proper talent and alignment to do both proper engineering and science. Saying that Meta would rather focus on multi-modal over building out the proper foundations feels like something told to Zuck to justify poor