Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile
Zachary Novack @ICLR2025 🇸🇬

@zacknovack

efficient + controllable music generation | phd-ing @ucsd_cse | research intern @stabilityai | prev @adoberesearch @acmi_lab | teaching drums @pulsepercussion

ID: 1534894045805281280

Link: http://zacharynovack.github.io · Joined: 09-06-2022 13:44:14

228 Tweets

572 Followers

471 Following

arXiv Sound (@arxivsound)'s Twitter Profile Photo

"Fast Text-to-Audio Generation with Adversarial Post-Training," Zachary Novack, Zach Evans, Zack Zukowski, Josiah Taylor, CJ Carr, Julian Parker, Adnan Al-Sinan, Gian Marco Iodice, Julian McAuley, Taylor Berg-Kirkpatrick, Jordi Pons, ift.tt/jUIQPW5

Stability AI (@stabilityai)'s Twitter Profile Photo

Today we’re open-sourcing Stable Audio Open Small, a 341M-parameter text-to-audio model optimized to run entirely on Arm CPUs. This means 99% of smartphones can now generate music-production samples in seconds, right on-device with no internet required. Built for fast,

Niloofar (on faculty job market!) (@niloofar_mire)'s Twitter Profile Photo

We (w/ Zachary Novack, Jaechul Roh et al.) are working on #memorization in #audio models & are conducting a human study on generated #music similarity. Please help us out by taking our short listening test (available in English, Mandarin & Cantonese). You can do more than one! Link ⬇️

dadabots (@dadabots)'s Twitter Profile Photo

yup, just compiled it & tested. Stable Audio Open Small runs faster than realtime on a mac **CPU**

on a m1 chip you have three ways to process- a cpu, gpu, and a neural engine. for ai stuff, the CPU is the SLOW one. It’s realtime on THAT.

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

Presenting RUListening! We edit Music-QA benchmarks to *actually* assess audio perception, using text-only LLMs to generate unimodally-hard distractors. Been super excited about this one (led by the beast Yongyi Zang), check out the full thread below! And at ISMIR 2025! 🇰🇷

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

OSSL Dataset is out and accepted at #ISMIR2025 🇰🇷! High quality soundtrack+movie paired data, all public domain, perfect for your V2M tasks 📽️🎶 Led by the titan Haven Kim, check out the full thread below for more info!

arXiv Sound (@arxivsound)'s Twitter Profile Photo

"Video-Guided Text-to-Music Generation Using Public Domain Movie Collections," Haven Kim, Zachary Novack, Weihan Xu, Julian McAuley, Hao-Wen Dong, ift.tt/iOzRbwg

thecollabagepatch (@thepatch_kev)'s Twitter Profile Photo

live coding with stable audio open small? let the vibes begin lol... i love having a bunch of endpoints already functioning so this swift app can start becoming whatever tf it wants to be. next up, the instruments-only and style transfer endpoints

thecollabagepatch (@thepatch_kev)'s Twitter Profile Photo

stable audio open small is great for stacking multiple generations Zachary Novack lyra bubbles~ ♪❀. the ux speriments continue. changing instrument gen during playback can be pretty jarring tho, but methinks the style-transfer endpoint may come in handy. finetunes might make this glorious fun

Yusong Wu (@wuyusongwys)'s Twitter Profile Photo

It’s been a thrilling journey building FLAM! 🚀 Super proud of what we achieved: open-vocabulary audio event detection using calibrated frame-wise modeling. FLAM will be presented at ICML 2025, come check it out! 📄 Paper: arxiv.org/abs/2505.05393 🎧 Demo: flam-model.github.io

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

I always like those paper/author visualizations for other conferences, so I ~vibe coded~ up an interactive one for #ISMIR2025 ISMIR Conference! Go check it out at: zacharynovack.github.io/ismir2025.html Will hopefully add paper links and other metadata in the coming weeks :)

Hao-Wen (Herman) Dong 董皓文 (@hermanhwdong)'s Twitter Profile Photo

Happy to share that our paper led by Haven Kim has been accepted to #ISMIR2025! 🎉 🎬We presented the OSSL dataset of 736 public domain movie clips (36.5hr in total) with soundtracks. 🔍We explored video-guided text-to-music generation using OSSL. youtu.be/DjBVqhErShM

Morteza Mardani (@mardanimorteza)'s Twitter Profile Photo

📢📢 Elucidated Rolling Diffusion Models (ERDM)

How can we stably roll out diffusion models for sequence generation in data-scarce dynamical systems?

We elucidate the design of rolling diffusion, inspired by prob. flow ODEs and nonisotropic noise.

📄 arxiv.org/pdf/2506.20024

Yupeng Hou (@yupenghou97)'s Twitter Profile Photo

Did you know tokenization for generative recommendation today looks a lot like LLM tokenization did *10 years* ago?

Meet ActionPiece, our #ICML2025 Spotlight paper, the first context-aware action tokenizer.

1/5 🧵

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

Stable Audio Open Small is accepted at #WASPAA2025 (IEEE WASPAA 2025)! Can't wait to share the latest in blazingly fast, on-device text-to-audio in Lake Tahoe 🏞️

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

We're organizing the AI for Music workshop at the NeurIPS Conference in San Diego! We'll be accepting both papers + demos, w/ an initial deadline of August 22, well timed for early visibility on your ICASSP/ICLR drafts 👀 Check out the website for more: aiformusicworkshop.github.io

thecollabagepatch (@thepatch_kev)'s Twitter Profile Photo

made a Hugging Face space for custom sample generation using stable-audio-open-small. already had an api in my backend, so figured i should make a @gradio app for the looping stuff. combine drums+instruments, then transform w/ melodyflow. link 👇

Xun Huang (@xunhuang1995)'s Twitter Profile Photo

We should have called it "scaling up rollout", not RL. RL is a necessary evil for the discrete nature of language. My intuition tells me using RL for continuous data (images, videos, audio), where differentiable supervision is easily available, is a terrible idea.