Armen Aghajanyan (@armenagha) 's Twitter Profile
Armen Aghajanyan

@armenagha

Co-founder & CEO @perceptroninc; ex-RS FAIR/MSFT

ID: 1515424688

Joined: 14-06-2013 05:43:07

648 Tweets

15.15K Followers

271 Following

Susan Zhang (@suchenzang) 's Twitter Profile Photo

arxiv.org/abs/2411.03923 "From Figure 3(a), it is apparent that many of the benchmarks we considered are substantially contaminated in the Llama 1 pre-training corpus as well as in the Pile. For 8 of the 13 datasets that we considered, on average more than 50% of the samples are

arxiv.org/abs/2411.03923

"From Figure 3(a), it is apparent that many of the benchmarks we considered are substantially contaminated in the Llama 1 pre-training corpus as well as in the Pile. For 8 of the 13 datasets that we considered, on average more than 50% of the samples are
Susan Zhang (@suchenzang) 's Twitter Profile Photo

if you're bragging about internal cultural dysfunction in your own lab, where ICs are pitted against each other in the name of some regional glory... it's no wonder your models are utterly irrelevant in 2025

Armen Aghajanyan (@armenagha) 's Twitter Profile Photo

Only one of the teams between Zetta/LLaMa had an open-source pre-training codebase, shared datasets and experiments internally, used standardized evaluation sets, published internal notes and did things in the original spirit of FAIR. And the other team built in private, share

Jeremy Dohmann (@jecdohmann) 's Twitter Profile Photo

My first and only Twitter beef in my life was with that certain someone who hacked his prompts to top the benchmarks and then wouldn’t cooperate with open source researchers to reproduce them haha 🤔🤔

Soumith Chintala (@soumithchintala) 's Twitter Profile Photo

Yann LeCun Raw Daron Acemoglu you were/are the Chief Scientist of Meta, and a FAIR Lead -- where both Zetta and Llama were located; I think characterizing any team within your direct influence in a bad light in public is not nice. yea the Llama folks were great. praise them. What if Zetta was allowed to run

xjdr (@_xjdr) 's Twitter Profile Photo

man i forgot how good the metaseq repo was and how often i pulled stuff from there. also im team susan and armen if anyone was wondering

Armand Joulin (@armandjoulin) 's Twitter Profile Photo

Susan Zhang Armen Aghajanyan Susan Zhang and Stephen Roller should have been authors. There was a lot of discussion about it and imo, not including them was a mistake made by people under a lot of pressure and frustration.

Armenian Patriarchate Of Jerusalem (@armenianquarter) 's Twitter Profile Photo

We desperately urge you to share yet another Urgent Communique. Targeted and crippling taxes against the religious entity of the Armenian Quarter of Jerusalem are being unjustly levied. This crushing threat can become a precedent set against ALL Christian Communities of Jerusalem

Maciej Kilian (@kilian_maciej) 's Twitter Profile Photo

fun debugging journey w/ Akshat Shrivastava: be careful around FP8 w. activation checkpointing

activation checkpointing works under the assumption that different calls of forward give similar results, which we move away from the more we quantize. when you re-quantize in activation
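
A minimal sketch of the failure mode being described, under assumed semantics; the class and names here are ours for illustration, not any real FP8 library's API. A stateful quantizer whose scale depends on previously seen activations produces different outputs when checkpointing re-runs the forward:

import torch
from torch.utils.checkpoint import checkpoint

class DelayedScalingQuant:
    # Toy stand-in for FP8 delayed scaling: the scale used on each call
    # comes from activation statistics recorded on the previous call,
    # so the quantizer is stateful.
    def __init__(self):
        self.scale = 1.0

    def __call__(self, x):
        q = torch.round(x / self.scale).clamp(-127, 127) * self.scale
        out = x + (q - x).detach()  # straight-through estimator
        # Updating state here is what makes forward non-reproducible:
        # the checkpointed re-forward will see a different scale.
        self.scale = float(x.abs().amax()) / 127.0 + 1e-8
        return out

quant = DelayedScalingQuant()
w = torch.randn(16, 16, requires_grad=True)

def block(x):
    return quant(torch.relu(x @ w))

x = torch.randn(4, 16)
# Activation checkpointing discards activations and re-runs `block`
# during backward. Because `quant.scale` changed after the first
# forward, the recomputed activations no longer match the ones the
# loss was computed from, silently skewing gradients.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
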
Maciej Kilian (@kilian_maciej) 's Twitter Profile Photo

Maksymilian Wojnar and I have been playing around with tensor alignments in neural networks. here’s a summary of our exploration. we go into neural net parameterizations, measuring tensor alignments, and we develop a dynamic maximal learning rate scheduler which factors in alignment
Akshat Shrivastava (@akshats07) 's Twitter Profile Photo

Excited to see further studies into early fusion vs late fusion models, in particular a great analysis of multimodal MoEs aligned with our findings in MoMa on designing parameter specialization in multimodal LLMs. A few key things that helped us on top of the results presented

Piotr Nawrot (@p_nawrot) 's Twitter Profile Photo

Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in LLMs.

We performed the most comprehensive study on training-free sparse attention to date.

Here is what we found:
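
As a concrete reference point, one of the simplest training-free sparse attention patterns is per-query top-k masking at inference time. The sketch below is illustrative and not one of the specific methods benchmarked in the paper:

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep=64):
    # For each query, attend only to the `keep` highest-scoring keys
    # and mask out the rest; no retraining required.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (..., Tq, Tk)
    keep = min(keep, scores.shape[-1])
    thresh = scores.topk(keep, dim=-1).values[..., -1:]    # k-th largest score
    scores = scores.masked_fill(scores < thresh, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(8, 128, 64)
out = topk_sparse_attention(q, k, v, keep=16)
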
Armen Aghajanyan (@armenagha) 's Twitter Profile Photo

This whole part of the interview was pure cope. DeepSeek has the proper talent and alignment to do both proper engineering and science. Saying that Meta would rather focus on multi-modal over building out the proper foundations feels like something told to Zuck to justify poor