Umberto Cappellazzo (@umberto_senpai) 's Twitter Profile
Umberto Cappellazzo

@umberto_senpai

Research Associate @ Imperial College London. Interests: efficient scaling of audio-visual LLMs, Mixture of Experts.

ID: 448124299

Link: https://umbertocappellazzo.github.io/ · Joined: 27-12-2011 17:02:49

12.12K Tweets

433 Followers

177 Following

arXiv Sound (@arxivsound) 's Twitter Profile Photo

"Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs," Umberto Cappellazzo, Minsu Kim, Stavros Petridis, ift.tt/yq2ArSR

Umberto Cappellazzo (@umberto_senpai) 's Twitter Profile Photo

Stoked to present Llama-Matryoshka🪆, a versatile audio-visual MLLM capable of elastic inference across multiple tasks and computational resources. Multi-granularity audio-visual representations + (new) LoRA Matryoshka modules!⚡

Paper: arxiv.org/abs/2503.06362
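To illustrate the idea of multi-granularity token sequences, here is a minimal PyTorch sketch (not the paper's exact method; the pooling rates and dimensions are illustrative assumptions) that produces nested audio-visual token sequences at several granularities by average-pooling along time:

```python
import torch
import torch.nn as nn

def matryoshka_pool(tokens: torch.Tensor, rates=(1, 2, 4)) -> list:
    """Toy sketch of matryoshka-style multi-granularity sequences:
    each rate r average-pools the time axis by a factor of r, so one
    forward pass can serve coarser (cheaper) or finer (richer) tokens."""
    out = []
    for r in rates:
        if r == 1:
            out.append(tokens)
        else:
            # avg_pool1d expects (batch, channels, time)
            pooled = nn.functional.avg_pool1d(
                tokens.transpose(1, 2), kernel_size=r, stride=r
            ).transpose(1, 2)
            out.append(pooled)
    return out

# toy usage: batch of 2, 16 time steps, 8-dim features
x = torch.randn(2, 16, 8)
seqs = matryoshka_pool(x)
print([s.shape[1] for s in seqs])  # [16, 8, 4]
```

At inference one simply picks the granularity matching the available compute budget, since all scales were trained jointly.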
Nathan Lambert (@natolambert) 's Twitter Profile Photo

A very exciting day for open-source AI! We're releasing our biggest open source model yet -- OLMo 2 32B -- and it beats the latest GPT 3.5, GPT 4o mini, and leading open weight models like Qwen and Mistral. As usual, all data, weights, code, etc. are available.

For a long time,
Umberto Cappellazzo (@umberto_senpai) 's Twitter Profile Photo

In the last few weeks, I noticed multiple surveys on LLM post-training and reasoning have been released (with great timeline diagrams!). Below the links to all of them!

1) arxiv.org/pdf/2502.17419
2) arxiv.org/pdf/2502.21321
3) arxiv.org/pdf/2503.06072
4) arxiv.org/pdf/2503.09567
Umberto Cappellazzo (@umberto_senpai) 's Twitter Profile Photo

If you're attending IEEE ICASSP and are curious about how to efficiently leverage LLMs for audio-visual speech recognition, don't miss our paper! Stop by Poster 2F area at 5pm local time, Minsu will be there to dive into the details with you! 🔍
Umberto Cappellazzo (@umberto_senpai) 's Twitter Profile Photo

Nice to see the Matryoshka principle adopted in the new Gemma models! Matryoshka can also be applied at the token sequence level, without modifying the model, to enable elastic inference. It pairs well with LoRA for efficient fine-tuning too. More info: arxiv.org/abs/2503.06362

Umberto Cappellazzo (@umberto_senpai) 's Twitter Profile Photo

"Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach" has been accepted to Interspeech 2025!🇳🇱 Full paper: arxiv.org/pdf/2505.14336
We can effectively apply MoE to projector layers for AVSR (i.e., Llama-SMoP). Simple, efficient, and model-agnostic! Yay!
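As a rough sketch of what a sparse mixture-of-projectors layer looks like (names, sizes, and routing details here are illustrative assumptions, not the Llama-SMoP configuration): a lightweight router scores a small set of linear projectors and mixes the top-k per token when mapping encoder features into the LLM embedding space.

```python
import torch
import torch.nn as nn

class SparseMoPProjector(nn.Module):
    """Hedged sketch of a sparse mixture-of-projectors layer: a router
    selects the top-k linear experts per token and combines their outputs
    with softmax-normalized gate weights. All dims are toy values."""
    def __init__(self, in_dim=256, out_dim=512, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(in_dim, out_dim) for _ in range(num_experts)
        )
        self.router = nn.Linear(in_dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                                  # x: (B, T, in_dim)
        logits = self.router(x)                            # (B, T, E)
        weights, idx = logits.topk(self.top_k, dim=-1)     # top-k experts/token
        weights = weights.softmax(dim=-1)
        out = torch.zeros(*x.shape[:-1], self.experts[0].out_features,
                          device=x.device)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)    # tokens sent to e
                out = out + mask * weights[..., k:k+1] * expert(x)
        return out

proj = SparseMoPProjector()
y = proj(torch.randn(2, 10, 256))
print(y.shape)  # torch.Size([2, 10, 512])
```

Because only the projector layers are replaced, this kind of design stays agnostic to the choice of audio/visual encoders and LLM backbone.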
Umberto Cappellazzo (@umberto_senpai) 's Twitter Profile Photo

GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning  

GraLoRA partitions weight matrices into sub-blocks, each with its own low-rank adapter to avoid subpar performance of LoRA at high ranks (i.e., > 64).

Interesting paper! 

arxiv.org/pdf/2505.20355
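A minimal sketch of the block-partitioned idea, based only on the abstract (not the authors' code; block count, rank, and initialization are illustrative assumptions): the frozen weight's input and output dimensions are split into a grid of sub-blocks, and each cell gets its own low-rank (A, B) adapter pair.

```python
import torch
import torch.nn as nn

class GraLoRALinear(nn.Module):
    """Illustrative granular-LoRA linear layer: a frozen base weight plus
    a (blocks x blocks) grid of independent low-rank adapters, one per
    (output-block, input-block) cell. B starts at zero, so the layer
    initially matches the frozen base exactly."""
    def __init__(self, in_dim=64, out_dim=64, blocks=2, rank=4):
        super().__init__()
        assert in_dim % blocks == 0 and out_dim % blocks == 0
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)       # frozen pretrained weight
        self.blocks = blocks
        self.bi, self.bo = in_dim // blocks, out_dim // blocks
        # one (A, B) low-rank pair per grid cell
        self.A = nn.Parameter(torch.randn(blocks, blocks, rank, self.bi) * 0.01)
        self.B = nn.Parameter(torch.zeros(blocks, blocks, self.bo, rank))

    def forward(self, x):                            # x: (..., in_dim)
        y = self.base(x)
        xs = x.split(self.bi, dim=-1)                # input column blocks
        deltas = [
            sum(xs[j] @ (self.B[i, j] @ self.A[i, j]).T
                for j in range(self.blocks))         # sum over input blocks
            for i in range(self.blocks)              # one delta per output block
        ]
        return y + torch.cat(deltas, dim=-1)

layer = GraLoRALinear()
out = layer(torch.randn(3, 64))
print(out.shape)  # torch.Size([3, 64])
```

The intuition is that many small independent adapters give more expressive capacity at a given total rank budget than one global low-rank update.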
Ludwig Schmidt (@lschmidt3) 's Twitter Profile Photo

Very excited to finally release our paper for OpenThoughts!

After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Nick Jiang @ ICLR (@nickhjiang) 's Twitter Profile Photo

Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
Sukjun (June) Hwang (@sukjun_hwang) 's Twitter Profile Photo

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data

Umberto Cappellazzo (@umberto_senpai) 's Twitter Profile Photo

Our paper on Llama-Matryoshka🪆 has been accepted to IEEE ASRU 2025! One model, elastic inference, and strong performance across multiple tasks via matryoshka representation learning. 📄 Camera-ready: arxiv.org/abs/2503.06362 🌺 See you this December in Honolulu, Hawaii!