Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile
Zachary Novack @ICLR2025 🇸🇬

@zacknovack

efficient + controllable music generation | phd-ing @ucsd_cse | research intern @stabilityai | prev @adoberesearch @acmi_lab | teaching drums @pulsepercussion

ID: 1534894045805281280

Link: http://zacharynovack.github.io · Joined: 09-06-2022 13:44:14

228 Tweets

572 Followers

471 Following

arXiv Sound (@arxivsound)'s Twitter Profile Photo

"Fast Text-to-Audio Generation with Adversarial Post-Training," Zachary Novack, Zach Evans, Zack Zukowski, Josiah Taylor, CJ Carr, Julian Parker, Adnan Al-Sinan, Gian Marco Iodice, Julian McAuley, Taylor Berg-Kirkpatrick, Jordi Pons, ift.tt/jUIQPW5

Stability AI (@stabilityai)'s Twitter Profile Photo

Today we’re open-sourcing Stable Audio Open Small, a 341M-parameter text-to-audio model optimized to run entirely on Arm CPUs. This means 99% of smartphones can now generate music-production samples in seconds, right on-device with no internet required. Built for fast,

Niloofar (on faculty job market!) (@niloofar_mire)'s Twitter Profile Photo

We (w/ Zachary Novack, Jaechul Roh et al.) are working on #memorization in #audio models & are conducting a human study on generated #music similarity. Please help us out by taking our short listening test (available in English, Mandarin & Cantonese). You can do more than one! Link ⬇️

dadabots (@dadabots)'s Twitter Profile Photo

yup, just compiled it & tested. Stable Audio Open Small runs faster than realtime on a mac **CPU**

on a m1 chip you have three ways to process- a cpu, gpu, and a neural engine. for ai stuff, the CPU is the SLOW one. It’s realtime on THAT.

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

Presenting RUListening! We edit Music-QA benchmarks to *actually* assess audio perception, using text-only LLMs to generate unimodally-hard distractors. Been super excited about this one (led by the beast Yongyi Zang), check out the full thread below! And at ISMIR 2025! 🇰🇷

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

OSSL Dataset is out and accepted at #ISMIR2025 🇰🇷! High quality soundtrack+movie paired data, all public domain, perfect for your V2M tasks 📽️🎶 Led by the titan Haven Kim, check out the full thread below for more info!

arXiv Sound (@arxivsound)'s Twitter Profile Photo

"Video-Guided Text-to-Music Generation Using Public Domain Movie Collections," Haven Kim, Zachary Novack, Weihan Xu, Julian McAuley, Hao-Wen Dong, ift.tt/iOzRbwg

thecollabagepatch (@thepatch_kev)'s Twitter Profile Photo

live coding with stable audio open small? let the vibes begin lol... i love having a bunch of endpoints already functioning so this swift app can start becoming whatever tf it wants to be. next up, the instruments-only and style transfer endpoints

thecollabagepatch (@thepatch_kev)'s Twitter Profile Photo

stable audio open small is great for stacking multiple generations Zachary Novack lyra bubbles~ ♪❀. the ux speriments continue. changing instrument gen during playback can be pretty jarring tho, but methinks the style-transfer endpoint may come in handy. finetunes might make this glorious fun

Yusong Wu (@wuyusongwys)'s Twitter Profile Photo

It’s been a thrilling journey building FLAM! 🚀 Super proud of what we achieved: open-vocabulary audio event detection using calibrated frame-wise modeling. FLAM will be presented at ICML 2025, come check it out! 📄 Paper: arxiv.org/abs/2505.05393 🎧 Demo: flam-model.github.io

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

I always like those paper/author visualizations for other conferences, so I ~vibe coded~ up an interactive one for #ISMIR2025 ISMIR Conference! Go check it out at: zacharynovack.github.io/ismir2025.html Will hopefully add paper links and other metadata in the coming weeks :)

Hao-Wen (Herman) Dong 董皓文 (@hermanhwdong)'s Twitter Profile Photo

Happy to share that our paper led by Haven Kim has been accepted to #ISMIR2025! 🎉 🎬We presented the OSSL dataset of 736 public domain movie clips (36.5hr in total) with soundtracks. 🔍We explored video-guided text-to-music generation using OSSL. youtu.be/DjBVqhErShM

Morteza Mardani (@mardanimorteza)'s Twitter Profile Photo

📢📢 Elucidated Rolling Diffusion Models (ERDM)

How can we stably roll out diffusion models for sequence generation in data-scarce dynamical systems?

We elucidate the design of rolling diffusion, inspired by prob. flow ODEs and nonisotropic noise.

📄 arxiv.org/pdf/2506.20024

Yupeng Hou (@yupenghou97)'s Twitter Profile Photo

Did you know tokenization for generative recommendation today looks a lot like LLM tokenization did *10 years* ago?

Meet ActionPiece, our #ICML2025 Spotlight paper, the first context-aware action tokenizer.

1/5 🧵

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

Stable Audio Open Small is accepted at #WASPAA2025 (IEEE WASPAA 2025)! Can't wait to share the latest in blazingly fast, on-device text-to-audio in Lake Tahoe 🏞️

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack)'s Twitter Profile Photo

We're organizing the AI for Music workshop at the NeurIPS Conference in San Diego! We'll be accepting both papers + demos, w/ an initial deadline of August 22, well timed for early visibility on your ICASSP/ICLR drafts 👀 Check out the website for more: aiformusicworkshop.github.io

thecollabagepatch (@thepatch_kev)'s Twitter Profile Photo

made a Hugging Face space for custom sample generation using stable-audio-open-small. already had an api in my backend, so figured i should make a @gradio app for the looping stuff. combine drums+instruments, then transform w/ melodyflow. link 👇

Xun Huang (@xunhuang1995)'s Twitter Profile Photo

We should have called it "scaling up rollout", not RL. RL is a necessary evil for the discrete nature of language. My intuition tells me using RL for continuous data (images, videos, audio), where differentiable supervision is easily available, is a terrible idea.