
Xinpeng Wang
@xinpengwang_
Visitor @NYUDataScience | PhD student @LMU_Muenchen | Eval & Safety Alignment | Previously @TU_Muenchen
ID: 1225082179476246528
http://xinpeng-wang.github.io 05-02-2020 15:42:16
43 Tweets
184 Followers
354 Following

Sebastian Raschka Thank you for sharing this fascinating work! Our study (arxiv.org/abs/2405.18218), released a month prior to this work, had already revealed the redundancy of attention layers. In our research, we applied iterative pruning to both attention and feed-forward layers, and our experiments
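
For readers curious what iterative pruning of attention and feed-forward sub-layers can look like in code, below is a minimal PyTorch sketch: each round it scores every remaining sub-layer by how much a toy model's output drifts when that sub-layer is bypassed, then permanently removes the least important one. The toy model, the drift-based importance proxy, and the random calibration input are illustrative assumptions, not the procedure or metric from arxiv.org/abs/2405.18218.

# Minimal sketch of iterative attention/FFN sub-layer pruning (illustrative only;
# the importance metric and model are placeholders, not the paper's method).
import torch
import torch.nn as nn


class Block(nn.Module):
    """Pre-norm transformer block whose attention / FFN sub-layers can be skipped."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.use_attn = True   # flags flipped by the pruning loop
        self.use_ffn = True

    def forward(self, x):
        if self.use_attn:
            h = self.ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
        if self.use_ffn:
            x = x + self.ffn(self.ln2(x))
        return x


def output_drift(blocks, x, owner, flag):
    """Importance proxy: L2 distance between outputs with and without one sub-layer."""
    def run():
        h = x
        for b in blocks:
            h = b(h)
        return h
    baseline = run()
    setattr(owner, flag, False)        # temporarily bypass the sub-layer
    drifted = run()
    setattr(owner, flag, True)         # restore it
    return torch.norm(baseline - drifted).item()


torch.manual_seed(0)
blocks = nn.ModuleList([Block() for _ in range(6)])
x = torch.randn(2, 16, 64)             # (batch, seq, d_model) calibration input

for _ in range(4):                      # prune one sub-layer per round
    candidates = [(i, name) for i, b in enumerate(blocks)
                  for name in ("use_attn", "use_ffn") if getattr(b, name)]
    with torch.no_grad():
        scores = {(i, name): output_drift(blocks, x, blocks[i], name)
                  for i, name in candidates}
    idx, name = min(scores, key=scores.get)   # least important sub-layer
    setattr(blocks[idx], name, False)
    kind = "attention" if name == "use_attn" else "FFN"
    print(f"pruned block {idx} {kind} (drift {scores[(idx, name)]:.3f})")

In practice the importance score would be computed on real calibration data (e.g. perplexity or downstream-task degradation) rather than raw output drift on random tensors, but the round structure of "score all remaining sub-layers, drop the cheapest" is the same.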




