WAVLab | @CarnegieMellon (@wavlab) 's Twitter Profile
WAVLab | @CarnegieMellon

@wavlab

Shinji Watanabe's Audio and Voice Lab | WAVLab @LTIatCMU @SCSatCMU | Speech Recognition, Speech Enhancement, Spoken Language Understanding, and more.

ID: 1425312010020003840

Link: https://shinjiwlab.github.io/ | Joined: 11-08-2021 04:26:22

270 Tweets

2.2K Followers

130 Following

Shinji Watanabe (@shinjiw_at_cmu) 's Twitter Profile Photo

Happy New Year! Last year, our group published over 60 papers (ICASSPx22, Interspeechx24, SLTx10, etc.)! I'm very happy to work with my great colleagues. Thanks, everyone! (Note that I do not include some technical reports, arXiv, and workshop/challenge papers.)

Siddhant Arora (@sid_arora_18) 's Twitter Profile Photo

🚀 New #ICLR2025 Paper Alert! 🚀 Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊 We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇 📜: arxiv.org/abs/2503.01174

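One way to quantify turn-taking, shown here as a toy sketch (hypothetical names, not necessarily the benchmark's exact protocol), is the floor-transfer offset: the gap (positive) or overlap (negative) between one speaker's turn ending and the other's beginning.

```python
# Toy sketch: floor-transfer offsets (FTOs) between consecutive turns.
def floor_transfer_offsets(turns):
    """turns: list of (speaker, start_sec, end_sec), sorted by start time.
    Returns next.start - prev.end at every speaker change:
    positive = silent gap, negative = overlap (barge-in)."""
    ftos = []
    for prev, nxt in zip(turns, turns[1:]):
        if prev[0] != nxt[0]:  # score only actual speaker changes
            ftos.append(nxt[1] - prev[2])
    return ftos

dialogue = [
    ("user",  0.0, 1.8),
    ("agent", 2.1, 4.0),   # starts 0.3 s after the user stops
    ("user",  3.8, 5.5),   # barges in 0.2 s before the agent finishes
]
print([round(x, 2) for x in floor_transfer_offsets(dialogue)])  # [0.3, -0.2]
```

Human turn transitions tend to cluster around a few hundred milliseconds, which is part of what makes natural turn-taking a demanding target for audio foundation models.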
Siddhant Arora (@sid_arora_18) 's Twitter Profile Photo

New #NAACL2025 demo! Excited to introduce ESPnet-SDS, a new open-source toolkit for building unified web interfaces for both cascaded & end-to-end spoken dialogue systems, providing real-time evaluation, and more! 📜: arxiv.org/abs/2503.08533 Live Demo: huggingface.co/spaces/Siddhan…

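The cascaded side of such a system can be sketched roughly as an ASR → dialogue → TTS pipeline with per-stage latency measurement; the code below is a hypothetical illustration with stub components, not ESPnet-SDS's actual API.

```python
# Hypothetical sketch of a cascaded spoken-dialogue turn: ASR -> dialogue
# model -> TTS, logging per-stage latency for real-time evaluation.
import time

def cascaded_turn(audio, asr, dialogue, tts):
    stages, out = {}, audio
    for name, fn in [("asr", asr), ("dialogue", dialogue), ("tts", tts)]:
        t0 = time.perf_counter()
        out = fn(out)                       # output of one stage feeds the next
        stages[name] = time.perf_counter() - t0
    return out, stages

# Stub components standing in for real models.
reply, latency = cascaded_turn(
    b"\x00\x01",                            # fake audio bytes
    asr=lambda a: "what time is it",
    dialogue=lambda t: "it is noon",
    tts=lambda t: f"<audio:{t}>",
)
print(reply)  # <audio:it is noon>
```

Keeping every stage behind the same callable interface is what lets one web front end drive either a cascaded or an end-to-end system.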
Brian Yan (@brianyan918) 's Twitter Profile Photo

Multilingual speech recognition systems (e.g. Whisper) are not as good as you may think! Performance in the lab, where language ID is known, is inflated compared to in the wild, where language ID is predicted - it's an error propagation issue. Paper: arxiv.org/pdf/2409.18428 1/N

Brian Yan (@brianyan918) 's Twitter Profile Photo

Using simple N-best re-ranking, we improved Whisper and MMS ASR performance in the wild by 2-4%. These improvements are entirely driven by fixing lang ID errors which disproportionately impact tail langs. Please check our paper and #ICASSP2025 talk on April 10th for more! 5/5

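As a rough illustration of the general idea (hypothetical names and scores, not the paper's exact recipe), N-best re-ranking can combine the decoder's score with an external language-ID score so that a wrong-language 1-best gets overturned:

```python
# Hedged sketch: re-rank an N-best list of (language, transcript, asr_log_prob)
# hypotheses with an external language-ID log-probability.
def rerank_nbest(hypotheses, lid_log_probs, lid_weight=1.0):
    """Pick the hypothesis maximizing asr_score + lid_weight * lid_score."""
    def combined(h):
        lang, _, asr_score = h
        return asr_score + lid_weight * lid_log_probs.get(lang, float("-inf"))
    return max(hypotheses, key=combined)

nbest = [
    ("nld", "goedemorgen allemaal", -4.1),  # decoder's (wrong-language) 1-best
    ("afr", "goeiemore almal",      -4.3),
]
lid = {"afr": -0.2, "nld": -2.5}  # external LID strongly prefers Afrikaans
print(rerank_nbest(nbest, lid)[0])  # afr
```

Because the fix acts on the language label rather than the acoustics, its gains concentrate exactly where LID confusions do, i.e. on tail languages.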
Siddhant Arora (@sid_arora_18) 's Twitter Profile Photo

Excited to share our new survey on Spoken Language Models! We present a comprehensive taxonomy and analysis of Spoken Language Models as the field moves toward universal speech processing systems. Covers architectures, training strategies, evaluation metrics and key challenges!

Masao (@mmiagshatoy) 's Twitter Profile Photo

Happy to share our #ICLR2025 paper: "Context-Aware Dynamic Pruning for Speech Foundation Models" 🎉 💡 We introduce context-aware inference-time pruning. 🎯 On Speech Translation (ST), it cuts inference time by 34% (relative) with no drop in BLEU. 📄 openreview.net/forum?id=u2QdC…

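The general idea of context-aware inference-time pruning can be sketched as a gate that scores each module against a context vector and skips low-scoring ones; the names and gating rule below are a hypothetical toy, not the paper's architecture.

```python
# Toy sketch of context-aware dynamic pruning at inference time.
def gate_scores(context, layer_keys):
    """Dot product of the context vector with each layer's 'learned' key."""
    return [sum(c * k for c, k in zip(context, key)) for key in layer_keys]

def run_pruned(x, layers, context, layer_keys, threshold=0.0):
    """Apply only the layers whose gate score clears the threshold."""
    kept = 0
    for layer, score in zip(layers, gate_scores(context, layer_keys)):
        if score > threshold:
            x = layer(x)
            kept += 1
    return x, kept

layers = [lambda x: x + 1, lambda x: x * 2]
keys = [(1.0, 0.0), (-1.0, 0.0)]          # toy per-layer keys
out, kept = run_pruned(10, layers, (1.0, 0.0), keys)
print(out, kept)  # 11 1  (second layer pruned for this context)
```

Since the gate decides per input, the compute saved varies with context rather than being fixed at training time, which is how latency can drop without a quality hit.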
Shinji Watanabe (@shinjiw_at_cmu) 's Twitter Profile Photo

Just wrapped up the 3rd year of my Speech Technology for Conversational AI course! Students built speech-to-speech interfaces for their final projects—15 amazing presentations this year! Good job!

WAVLab | @CarnegieMellon (@wavlab) 's Twitter Profile Photo

📢 Introducing VERSA: our open-source toolkit for speech & audio evaluation! 🧩 80+ metrics in one interface 🔧 Built with software-level excellence 🤝 Designed to be community-driven 🔍 Expanding to audio profiling & meta-info Have metrics you want added? Let us know!
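The "80+ metrics in one interface" idea can be sketched as a metric registry; this is a hypothetical illustration of the pattern, not VERSA's actual API.

```python
# Hypothetical metric registry: many metrics, one evaluate() entry point.
import math

METRICS = {}

def register(name):
    """Decorator that adds a metric function to the shared registry."""
    def deco(fn):
        METRICS[name] = fn
        return fn
    return deco

@register("snr_db")
def snr_db(ref, hyp):
    """Signal-to-noise ratio in dB between reference and degraded signal."""
    noise = [r - h for r, h in zip(ref, hyp)]
    num = sum(r * r for r in ref)
    den = sum(n * n for n in noise) or 1e-12  # guard exact matches
    return 10 * math.log10(num / den)

def evaluate(ref, hyp, names):
    """Run any subset of registered metrics through one interface."""
    return {n: METRICS[n](ref, hyp) for n in names}

ref = [1.0, 0.0, -1.0, 0.0]
hyp = [0.9, 0.1, -0.9, 0.1]
print(evaluate(ref, hyp, ["snr_db"]))
```

A single registry keyed by name is what makes it cheap for a community to keep contributing new metrics behind an unchanged interface.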

Kwanghee Choi (@juice500ml) 's Twitter Profile Photo

Can self-supervised models 🤖 understand allophony 🗣? Excited to share my new #NAACL2025 paper: Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment arxiv.org/abs/2502.07029 (1/n)

Kwanghee Choi (@juice500ml) 's Twitter Profile Photo

Check out my presentation and poster for more details. I'll see you at NAACL, 4/30 14:00-15:30 Poster Session C! youtu.be/ZRF4u1eThJM (9/9)

jiatongshi (@jiatongshi) 's Twitter Profile Photo

I’m excited to be presenting my poster on VERSA at #NAACL2025! 📍 Hall 3, Session D 📅 Wed, Apr 30 | 16:00–17:30 I’ll walk you through how VERSA brings 80+ unified speech/audio evaluation metrics into one toolkit. Stop by for a live demo and chat about benchmarking your models.

William Chen (@chenwanch1) 's Twitter Profile Photo

What happens if you scale Whisper to billions of parameters? Our #ICML2025 paper develops scaling laws for ASR/ST models, training models with up to 18B params and 360K hours of data, and 100+ languages Joint work b/w Language Technologies Institute | @CarnegieMellon and NVIDIA arxiv.org/abs/2502.10373

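Scaling laws of this kind are typically power laws in parameter count; as a toy illustration (synthetic constants, not the paper's fitted values), the exponent can be recovered by linear regression in log-log space:

```python
# Toy scaling-law fit: loss ≈ a * N**-b, estimated by log-log regression.
import math

def fit_power_law(params, losses):
    xs = [math.log(n) for n in params]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope  # (a, b)

# Synthetic points generated from loss = 3.0 * N**-0.3 (made-up constants).
ns = [1e8, 1e9, 1e10]
ls = [3.0 * n ** -0.3 for n in ns]
a, b = fit_power_law(ns, ls)
print(round(a, 2), round(b, 2))  # 3.0 0.3
```

Fitting on a handful of smaller runs is what lets one extrapolate whether an 18B-parameter model is worth training before paying for it.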
William Chen (@chenwanch1) 's Twitter Profile Photo

7/7 papers accepted to #Interspeech2025 🎉 Lots of interesting work from my fantastic co-authors on long-form processing, multilingualism, and multi-modal foundation models. See y’all in Rotterdam 🇳🇱

jiatongshi (@jiatongshi) 's Twitter Profile Photo

7/7 papers accepted to #Interspeech. Topics range across speech evaluation, data annotation/collection, SVC, multilinguality, and speech LMs. Looking forward to sharing all of them soon!

Shinji Watanabe (@shinjiw_at_cmu) 's Twitter Profile Photo

🚀 New at ASRU’25: The Demo/System/Data Track is now revamped! Accepted papers will be published in the official proceedings & IEEE Xplore. Great chance for industry & applied researchers to share real-world ASR/SLU work 🗓️ Deadline: June 25 🔗 2025.ieeeasru.org/calls/call-for… IEEE ASRU

Masao (@mmiagshatoy) 's Twitter Profile Photo

🚀 Happy to share our #INTERSPEECH2025 paper: Using speaker & acoustic context, we dynamically adjust model paths, resulting in a 25.7% relative BLEU improvement in speech translation. We also analyze how context influences model behavior. 📜 Paper: arxiv.org/abs/2505.18860

Masao (@mmiagshatoy) 's Twitter Profile Photo

3/3 of my first-time Interspeech submissions got accepted ⸜( ' ᵕ ' )⸝ Yay! 1 as first author, 1 with shared first authorship with a colleague, and 1 as co-author. See you in Rotterdam!

jiatongshi (@jiatongshi) 's Twitter Profile Photo

🔊 New release: #ARECHO -> Autoregressive Evaluation via Chain-based Hypothesis Optimization. • 87-metric coverage in one model 🧮 • Dynamic classifier chain 🤝 • Unified tokenization 🧩 • Confidence-aware decoding 🛡️ Built on #UniVERSA, heading to #VERSA. More ↓
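A classifier chain, the general technique behind the "dynamic classifier chain" bullet, feeds each prediction to the predictors that follow it; the toy below is a hypothetical sketch of that pattern, not ARECHO's model.

```python
# Toy classifier chain: each predictor sees the input features plus all
# previously predicted metrics, so later outputs condition on earlier ones.
def chain_predict(features, predictors):
    """predictors: ordered list of (name, fn(features, preds_so_far))."""
    preds = {}
    for name, fn in predictors:
        preds[name] = fn(features, preds)
    return preds

chain = [
    ("snr_ok",  lambda f, p: f["snr_db"] > 10),
    # second metric conditions on the first prediction in the chain
    ("quality", lambda f, p: "good" if p["snr_ok"] and f["mos"] > 3.5 else "poor"),
]
print(chain_predict({"snr_db": 18.0, "mos": 4.1}, chain))
```

Making the chain order dynamic (rather than fixed as here) lets the model decide at inference which metric to predict next, which is where the "hypothesis optimization" in the name comes in.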