WAVLab | @CarnegieMellon (@wavlab) 's Twitter Profile
WAVLab | @CarnegieMellon

@wavlab

Shinji Watanabe's Audio and Voice Lab | WAVLab @LTIatCMU @SCSatCMU | Speech Recognition, Speech Enhancement, Spoken Language Understanding, and more.

ID: 1425312010020003840

Link: https://shinjiwlab.github.io/ | Joined: 11-08-2021 04:26:22

270 Tweets

2.2K Followers

130 Following

Shinji Watanabe (@shinjiw_at_cmu) 's Twitter Profile Photo

Happy New Year! Last year, our group published over 60 papers (ICASSPx22, Interspeechx24, SLTx10, etc.)! I'm very happy to work with my great colleagues. Thanks, everyone! (Note that I do not include some technical reports, arXiv, and workshop/challenge papers.)

Siddhant Arora (@sid_arora_18) 's Twitter Profile Photo

🚀 New #ICLR2025 Paper Alert! 🚀 Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊 We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇 📜: arxiv.org/abs/2503.01174

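One way to quantify turn-taking, shown here as a toy sketch (hypothetical names, not necessarily the benchmark's exact protocol), is the floor-transfer offset: the gap (positive) or overlap (negative) between one speaker's turn ending and the other's beginning.

```python
# Toy sketch: floor-transfer offsets (FTOs) between consecutive turns.
def floor_transfer_offsets(turns):
    """turns: list of (speaker, start_sec, end_sec), sorted by start time.
    Returns next.start - prev.end at every speaker change:
    positive = silent gap, negative = overlap (barge-in)."""
    ftos = []
    for prev, nxt in zip(turns, turns[1:]):
        if prev[0] != nxt[0]:  # score only actual speaker changes
            ftos.append(nxt[1] - prev[2])
    return ftos

dialogue = [
    ("user",  0.0, 1.8),
    ("agent", 2.1, 4.0),   # starts 0.3 s after the user stops
    ("user",  3.8, 5.5),   # barges in 0.2 s before the agent finishes
]
print([round(x, 2) for x in floor_transfer_offsets(dialogue)])  # [0.3, -0.2]
```

Human turn transitions tend to cluster around a few hundred milliseconds, which is part of what makes natural turn-taking a demanding target for audio foundation models.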
Siddhant Arora (@sid_arora_18) 's Twitter Profile Photo

New #NAACL2025 demo! Excited to introduce ESPnet-SDS, a new open-source toolkit for building unified web interfaces for both cascaded & end-to-end spoken dialogue systems, providing real-time evaluation, and more! 📜: arxiv.org/abs/2503.08533 Live Demo: huggingface.co/spaces/Siddhan…

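The cascaded side of such a system can be sketched roughly as an ASR → dialogue → TTS pipeline with per-stage latency measurement; the code below is a hypothetical illustration with stub components, not ESPnet-SDS's actual API.

```python
# Hypothetical sketch of a cascaded spoken-dialogue turn: ASR -> dialogue
# model -> TTS, logging per-stage latency for real-time evaluation.
import time

def cascaded_turn(audio, asr, dialogue, tts):
    stages, out = {}, audio
    for name, fn in [("asr", asr), ("dialogue", dialogue), ("tts", tts)]:
        t0 = time.perf_counter()
        out = fn(out)                       # output of one stage feeds the next
        stages[name] = time.perf_counter() - t0
    return out, stages

# Stub components standing in for real models.
reply, latency = cascaded_turn(
    b"\x00\x01",                            # fake audio bytes
    asr=lambda a: "what time is it",
    dialogue=lambda t: "it is noon",
    tts=lambda t: f"<audio:{t}>",
)
print(reply)  # <audio:it is noon>
```

Keeping every stage behind the same callable interface is what lets one web front end drive either a cascaded or an end-to-end system.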
Brian Yan (@brianyan918) 's Twitter Profile Photo

Multilingual speech recognition systems (e.g. Whisper) are not as good as you may think! Performance in the lab, where language ID is known, is inflated compared to in the wild, where language ID is predicted - it's an error propagation issue. Paper: arxiv.org/pdf/2409.18428 1/N

Brian Yan (@brianyan918) 's Twitter Profile Photo

Using simple N-best re-ranking, we improved Whisper and MMS ASR performance in the wild by 2-4%. These improvements are entirely driven by fixing lang ID errors which disproportionately impact tail langs. Please check our paper and #ICASSP2025 talk on April 10th for more! 5/5

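As a rough illustration of the general idea (hypothetical names and scores, not the paper's exact recipe), N-best re-ranking can combine the decoder's score with an external language-ID score so that a wrong-language 1-best gets overturned:

```python
# Hedged sketch: re-rank an N-best list of (language, transcript, asr_log_prob)
# hypotheses with an external language-ID log-probability.
def rerank_nbest(hypotheses, lid_log_probs, lid_weight=1.0):
    """Pick the hypothesis maximizing asr_score + lid_weight * lid_score."""
    def combined(h):
        lang, _, asr_score = h
        return asr_score + lid_weight * lid_log_probs.get(lang, float("-inf"))
    return max(hypotheses, key=combined)

nbest = [
    ("nld", "goedemorgen allemaal", -4.1),  # decoder's (wrong-language) 1-best
    ("afr", "goeiemore almal",      -4.3),
]
lid = {"afr": -0.2, "nld": -2.5}  # external LID strongly prefers Afrikaans
print(rerank_nbest(nbest, lid)[0])  # afr
```

Because the fix acts on the language label rather than the acoustics, its gains concentrate exactly where LID confusions do, i.e. on tail languages.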
Siddhant Arora (@sid_arora_18) 's Twitter Profile Photo

Excited to share our new survey on Spoken Language Models! We present a comprehensive taxonomy and analysis of Spoken Language Models as the field moves toward universal speech processing systems. Covers architectures, training strategies, evaluation metrics and key challenges!

Masao (@mmiagshatoy) 's Twitter Profile Photo

Happy to share our #ICLR2025 paper: "Context-Aware Dynamic Pruning for Speech Foundation Models" 🎉 💡 We introduce context-aware inference-time pruning. 🎯 On Speech Translation (ST), it cuts inference time by 34% (relative) with no drop in BLEU. 📄 openreview.net/forum?id=u2QdC…

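The general idea of context-aware inference-time pruning can be sketched as a gate that scores each module against a context vector and skips low-scoring ones; the names and gating rule below are a hypothetical toy, not the paper's architecture.

```python
# Toy sketch of context-aware dynamic pruning at inference time.
def gate_scores(context, layer_keys):
    """Dot product of the context vector with each layer's 'learned' key."""
    return [sum(c * k for c, k in zip(context, key)) for key in layer_keys]

def run_pruned(x, layers, context, layer_keys, threshold=0.0):
    """Apply only the layers whose gate score clears the threshold."""
    kept = 0
    for layer, score in zip(layers, gate_scores(context, layer_keys)):
        if score > threshold:
            x = layer(x)
            kept += 1
    return x, kept

layers = [lambda x: x + 1, lambda x: x * 2]
keys = [(1.0, 0.0), (-1.0, 0.0)]          # toy per-layer keys
out, kept = run_pruned(10, layers, (1.0, 0.0), keys)
print(out, kept)  # 11 1  (second layer pruned for this context)
```

Since the gate decides per input, the compute saved varies with context rather than being fixed at training time, which is how latency can drop without a quality hit.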
Shinji Watanabe (@shinjiw_at_cmu) 's Twitter Profile Photo

Just wrapped up the 3rd year of my Speech Technology for Conversational AI course! Students built speech-to-speech interfaces for their final projects—15 amazing presentations this year! Good job!

WAVLab | @CarnegieMellon (@wavlab) 's Twitter Profile Photo

📢 Introducing VERSA: our open-source toolkit for speech & audio evaluation! 🧩 80+ metrics in one interface 🔧 Built with software-level excellence 🤝 Designed to be community-driven 🔍 Expanding to audio profiling & meta-info Have metrics you want added? Let us know!
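The "80+ metrics in one interface" idea can be sketched as a metric registry; this is a hypothetical illustration of the pattern, not VERSA's actual API.

```python
# Hypothetical metric registry: many metrics, one evaluate() entry point.
import math

METRICS = {}

def register(name):
    """Decorator that adds a metric function to the shared registry."""
    def deco(fn):
        METRICS[name] = fn
        return fn
    return deco

@register("snr_db")
def snr_db(ref, hyp):
    """Signal-to-noise ratio in dB between reference and degraded signal."""
    noise = [r - h for r, h in zip(ref, hyp)]
    num = sum(r * r for r in ref)
    den = sum(n * n for n in noise) or 1e-12  # guard exact matches
    return 10 * math.log10(num / den)

def evaluate(ref, hyp, names):
    """Run any subset of registered metrics through one interface."""
    return {n: METRICS[n](ref, hyp) for n in names}

ref = [1.0, 0.0, -1.0, 0.0]
hyp = [0.9, 0.1, -0.9, 0.1]
print(evaluate(ref, hyp, ["snr_db"]))
```

A single registry keyed by name is what makes it cheap for a community to keep contributing new metrics behind an unchanged interface.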

Kwanghee Choi (@juice500ml) 's Twitter Profile Photo

Can self-supervised models 🤖 understand allophony 🗣? Excited to share my new #NAACL2025 paper: Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment arxiv.org/abs/2502.07029 (1/n)

Kwanghee Choi (@juice500ml) 's Twitter Profile Photo

Check out my presentation and poster for more details. I'll see you at NAACL, 4/30 14:00-15:30 Poster Session C! youtu.be/ZRF4u1eThJM (9/9)

jiatongshi (@jiatongshi) 's Twitter Profile Photo

I’m excited to be presenting my poster on VERSA at #NAACL2025! 📍 Hall 3, Session D 📅 Wed, Apr 30 | 16:00–17:30 I’ll walk you through how VERSA brings 80+ unified speech/audio evaluation metrics into one toolkit. Stop by for a live demo and chat about benchmarking your models.

William Chen (@chenwanch1) 's Twitter Profile Photo

What happens if you scale Whisper to billions of parameters? Our #ICML2025 paper develops scaling laws for ASR/ST models, training models with up to 18B params and 360K hours of data, and 100+ languages Joint work b/w Language Technologies Institute | @CarnegieMellon and NVIDIA arxiv.org/abs/2502.10373

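Scaling laws of this kind are typically power laws in parameter count; as a toy illustration (synthetic constants, not the paper's fitted values), the exponent can be recovered by linear regression in log-log space:

```python
# Toy scaling-law fit: loss ≈ a * N**-b, estimated by log-log regression.
import math

def fit_power_law(params, losses):
    xs = [math.log(n) for n in params]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope  # (a, b)

# Synthetic points generated from loss = 3.0 * N**-0.3 (made-up constants).
ns = [1e8, 1e9, 1e10]
ls = [3.0 * n ** -0.3 for n in ns]
a, b = fit_power_law(ns, ls)
print(round(a, 2), round(b, 2))  # 3.0 0.3
```

Fitting on a handful of smaller runs is what lets one extrapolate whether an 18B-parameter model is worth training before paying for it.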
William Chen (@chenwanch1) 's Twitter Profile Photo

7/7 papers accepted to #Interspeech2025 🎉 Lots of interesting work from my fantastic co-authors on long-form processing, multilingualism, and multi-modal foundation models. See y’all in Rotterdam 🇳🇱

jiatongshi (@jiatongshi) 's Twitter Profile Photo

7/7 papers accepted to #Interspeech. Topics range across speech evaluation, data annotation/collection, SVC, multilinguality, and speech LMs. Looking forward to sharing all of them soon!

Shinji Watanabe (@shinjiw_at_cmu) 's Twitter Profile Photo

🚀 New at ASRU’25: The Demo/System/Data Track is now revamped! Accepted papers will be published in the official proceedings & IEEE Xplore. Great chance for industry & applied researchers to share real-world ASR/SLU work 🗓️ Deadline: June 25 🔗 2025.ieeeasru.org/calls/call-for… IEEE ASRU

Masao (@mmiagshatoy) 's Twitter Profile Photo

🚀 Happy to share our #INTERSPEECH2025 paper: Using speaker & acoustic context, we dynamically adjust model paths, resulting in a 25.7% relative BLEU improvement in speech translation. We also analyze how context influences model behavior. 📜 Paper: arxiv.org/abs/2505.18860

Masao (@mmiagshatoy) 's Twitter Profile Photo

3/3 of my first-time Interspeech submissions got accepted ⸜( ' ᵕ ' )⸝ Yay! 1 as first author, 1 with shared first authorship with a colleague, and 1 as co-author. See you in Rotterdam!

jiatongshi (@jiatongshi) 's Twitter Profile Photo

🔊 New release: #ARECHO -> Autoregressive Evaluation via Chain-based Hypothesis Optimization. • 87-metric coverage in one model 🧮 • Dynamic classifier chain 🤝 • Unified tokenization 🧩 • Confidence-aware decoding 🛡️ Built on #UniVERSA, heading to #VERSA. More ↓
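A classifier chain, the general technique behind the "dynamic classifier chain" bullet, feeds each prediction to the predictors that follow it; the toy below is a hypothetical sketch of that pattern, not ARECHO's model.

```python
# Toy classifier chain: each predictor sees the input features plus all
# previously predicted metrics, so later outputs condition on earlier ones.
def chain_predict(features, predictors):
    """predictors: ordered list of (name, fn(features, preds_so_far))."""
    preds = {}
    for name, fn in predictors:
        preds[name] = fn(features, preds)
    return preds

chain = [
    ("snr_ok",  lambda f, p: f["snr_db"] > 10),
    # second metric conditions on the first prediction in the chain
    ("quality", lambda f, p: "good" if p["snr_ok"] and f["mos"] > 3.5 else "poor"),
]
print(chain_predict({"snr_db": 18.0, "mos": 4.1}, chain))
```

Making the chain order dynamic (rather than fixed as here) lets the model decide at inference which metric to predict next, which is where the "hypothesis optimization" in the name comes in.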