camenduru (@camenduru)'s Twitter Profile

building 🥪 @tost_ai ❤ open source github.com/camenduru

ID: 36723

Joined: 02-12-2006 08:14:37

2.2K Tweets

20.2K Followers

4.4K Following

Toby Kim (@_doyeob_):

Two undergrads. One still in the military. Zero funding. One ridiculous goal: build a TTS model that rivals NotebookLM Podcast, ElevenLabs Studio, and Sesame CSM. Somehow… we pulled it off. Here’s how 👇

camenduru (@camenduru):

🎧 Dia is a 1.6B-parameter text-to-speech model created by Nari Labs. (Apache 2.0) 🔊 Jupyter Notebook 🥳 Thanks to Toby Kim ❤ Nari Labs ❤ 🌐page: tally.so/r/meokbo 🧬code: github.com/nari-labs/dia 🍊jupyter: github.com/camenduru/dia-…

Ostris (@ostrisai):

Flex.2-preview is here with text-to-image, universal control (line, pose, depth), and inpainting all baked into one model. Fine-tunable with AI-Toolkit, Apache 2.0 license, 8B parameters. Link in 🧵

Yin Cui (@yincuicv):

Introducing the Describe Anything Model (DAM), a powerful Multimodal LLM that generates detailed descriptions for user-specified regions in images or videos using points, boxes, scribbles, or masks. Open-source code, models, demo, data, and benchmark at: describe-anything.github.io

PyTorch (@pytorch):

Update from the PyTorch maintainers: 2.7 is out now.
🔹 Support for NVIDIA Blackwell (CUDA 12.8)
🔹 Mega Cache
🔹 torch.compile for Function Modes
🔹 FlexAttention updates
🔹 Intel GPU perf boost
🔗 Blog: hubs.la/Q03jBPSL0
📄 Release notes: hubs.la/Q03jBPlW0
#PyTorch
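The `torch.compile` entry above can be sketched in a few lines. This is a hedged illustration, not from the release notes: the `gelu_mlp` toy function is an assumption of mine, and the debug `backend="eager"` is used so the sketch runs without a C++ toolchain (the default backend is `"inductor"`).

```python
# Minimal torch.compile sketch (assumes PyTorch >= 2.0 installed).
import torch

def gelu_mlp(x, w1, w2):
    # Toy two-layer MLP in plain eager PyTorch.
    return torch.nn.functional.gelu(x @ w1) @ w2

# Same call signature as the original function; first call triggers tracing.
compiled_mlp = torch.compile(gelu_mlp, backend="eager")

x = torch.randn(8, 16)
w1 = torch.randn(16, 32)
w2 = torch.randn(32, 16)

eager_out = gelu_mlp(x, w1, w2)
compiled_out = compiled_mlp(x, w1, w2)
```

With the default `"inductor"` backend the compiled call can fuse kernels for a real speedup; `"eager"` only exercises the tracing path, which is why it is handy for smoke tests.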

ACE Studio (@acestudio_en):

We’re excited to release ACE-Step / ACE-Step-v1-3.5B, a fast, versatile DiT-based foundation model for music generation that runs on consumer-grade GPUs. With its simple architecture and low hardware requirements, it’s easy to fine-tune for various music tasks, empowering, not…

Lightricks (@lightricks):

Meet LTX-Video 13B, our latest and most capable open-source video generation model.
- 13B parameters
- Multiscale rendering for sharper detail
- Smarter motion + scene understanding
- Keyframes, character + camera movement, multi-shot support
- Still fast – and runs locally

camenduru (@camenduru):

🎸 🎙 🎵 ACE-Step: A Step Towards Music Generation Foundation Model 📻 🎶 Jupyter Notebook 🥳 Thanks to Gong Junmin ❤ Sean Zhao ❤ Sen Wang ❤ Shengyuan Xu ❤ Joe Guo ❤ 🌐page: ace-step.github.io 🧬code: github.com/ace-step/ACE-S… 🍊jupyter: github.com/camenduru/ACE-…

Hunyuan (@tencenthunyuan):

🚀 Introducing HunyuanCustom: An open-source, multimodal-driven architecture for customized video generation, powered by HunyuanVideo-13B. Outperforming existing open-source models, it rivals top closed-source solutions! 🎥 Highlights: ✅Subject Consistency: Maintains identity…

Hunyuan (@tencenthunyuan):

🚀You can use HunyuanCustom on ComfyUI. Special thanks to Kijai Jukka Seppänen again! HunyuanCustom has been integrated into [ComfyUI-HunyuanVideoWrapper](github.com/kijai/ComfyUI-…) by [Kijai](github.com/kijai). To use it, you need to: 1️⃣ Download the `fp8_scaled` model…

Wan (@alibaba_wan):

✨ All in One, Wan for All ✨ We are excited to introduce our latest model to our talented community creators: Wan2.1-VACE, an All-in-One Video Creation and Editing model. Model size: 1.3B, 14B. License: Apache-2.0. 📌 Wan2.1-VACE provides solutions for various tasks, including…

Jake Steinerman (@jasteinerman):

We built an entire VR game... and open-sourced the whole thing. Introducing "North Star" – play it today on Quest, and download the entire project on GitHub!

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack):

Releasing Stable Audio Open Small! 75ms GPU latency! 7s *mobile* CPU latency! How? w/Adversarial Relativistic Contrastive (ARC) Post-Training! 📘:arxiv.org/abs/2505.08175 🥁:arc-text2audio.github.io/web/ 🤗:huggingface.co/stabilityai/st… Here’s how we made the fastest TTA out there🧵

Baku (@bk_sakurai):

*Video generation: I posted a note article on having an original song made with Suno sung by ComfyUI-FLOAT. #comfyui note.com/bakushu/n/n1f8…

Bin Lin (@linbin46984):

🚀UniWorld: a unified model that skips VAEs and uses semantic features from SigLIP! Using just 1% of BAGEL’s data, it outperforms on image editing and excels in understanding & generation. 🌟Now data, model, training & evaluation script are open-source! github.com/PKU-YuanGroup/…

Chenyang Si (@scy994):

⚡️DCM: High-Quality Video Generation Accelerator⚡️ 🚀DCM brings 10× faster inference to video diffusion models! 🚀From 1500s → 120s on models like HunyuanVideo13B. -Paper: huggingface.co/papers/2506.03… -Code: github.com/Vchitect/DCM -Project: vchitect.github.io/DCM/
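For scale, the wall-clock numbers quoted in the tweet pin down the speedup directly; this uses nothing beyond the announcement's own figures:

```python
# Numbers quoted above for HunyuanVideo13B: 1500 s baseline -> 120 s with DCM.
baseline_s = 1500
accelerated_s = 120
speedup = baseline_s / accelerated_s
print(f"{speedup:.1f}x")  # 12.5x, so the headline "10x faster" is conservative
```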

Matthias Niessner (@mattniessner):

📢Code Release of Pixel3DMM 📢 Looking for a robust and accurate face tracker? Our state-of-the-art tracker handles challenging in-the-wild settings, such as extreme lighting conditions, fast movements, and occlusions. 👨‍💻github.com/SimonGiebenhai… 🌍simongiebenhain.github.io/pixel3dmm/

Hao Zhang (@haozhang623):

🚨 New paper accepted to #ICCV2025! We introduce PhysRig – a differentiable physics-based rigging framework that brings realistic dynamics to characters 🔩🦖 💥 Soft tissues, tails, ears… now move like real flesh, not rigid plastic. #AI #Graphics #Animation #ComputerVision 👇