Stability AI Unveils Stable Audio 3: Faster, Open-Source AI Audio Generation

AI Models | 28 May 2026, 13:43 | 2 min read

Stability AI has released Stable Audio 3, a next-generation latent diffusion model for high-quality stereo audio generation. Built around a new autoencoder with 4096x compression and an optimized architecture, it generates variable-length audio at very high speed, and the weights for the small and medium models have been open-sourced.

Stability AI Launches Stable Audio 3

Stability AI has officially released Stable Audio 3, a next-generation latent diffusion model designed for high-fidelity audio generation and editing. Alongside the launch, the company has open-sourced the weights for its smaller and medium-sized variants, marking a significant step forward in accessible AI audio creation.

Architectural Breakthroughs and Compression

At the core of Stable Audio 3 is a novel architecture built around two primary components: the Semantic Acoustic Model Encoder (SAME) and a highly optimized diffusion transformer. The SAME autoencoder achieves a 4096x audio compression ratio, drastically reducing the length of latent sequences. Because a transformer's compute and memory cost grows with sequence length, these shorter sequences let the model run long-form audio generation on standard consumer-grade hardware, lowering the barrier to entry for independent creators and producers.
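To give a sense of scale, the following is a minimal sketch of how short the latent sequence becomes at 4096x compression. It rests on two assumptions not stated in the article: that the ratio is a temporal downsampling factor (samples per latent frame), and that the audio is sampled at 44.1 kHz. The function name `latent_frames` is hypothetical.

```python
import math

def latent_frames(duration_s: float,
                  sample_rate: int = 44_100,   # assumed sample rate, not from the article
                  compression: int = 4096) -> int:
    """Estimate the latent sequence length for a clip.

    Assumes the 4096x figure means one latent frame per 4096 audio samples.
    """
    samples = round(duration_s * sample_rate)
    return math.ceil(samples / compression)

for seconds in (20, 380):
    print(f"{seconds:>3} s of audio -> about {latent_frames(seconds)} latent frames")
```

Under these assumptions, even a track of more than six minutes maps to a few thousand latent frames, a sequence length that a transformer can process in one pass.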

Unprecedented Generation Speed

Stable Audio 3 introduces dynamic computational scaling based on audio duration, eliminating the inefficiencies of fixed-length generation. In benchmark tests on high-performance hardware, the model renders 20 seconds of audio in just 0.62 seconds and can generate a full 380-second track in approximately 1.31 seconds. By implementing a three-stage training pipeline, the model bypasses traditional classifier-free guidance during inference, relying instead on a streamlined single-step forward pass for rapid output.
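The benchmark figures above can be restated as real-time factors (seconds of audio produced per second of compute). The short sketch below uses only the numbers quoted in the article; the helper name `real_time_factor` is illustrative.

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio generated per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds

# (audio duration, generation time) pairs as reported in the article
benchmarks = [(20.0, 0.62), (380.0, 1.31)]

for audio_s, wall_s in benchmarks:
    rtf = real_time_factor(audio_s, wall_s)
    print(f"{audio_s:.0f} s in {wall_s} s -> {rtf:.1f}x real time")

# How generation time grows relative to audio length between the two runs
length_ratio = benchmarks[1][0] / benchmarks[0][0]
time_ratio = benchmarks[1][1] / benchmarks[0][1]
print(f"{length_ratio:.0f}x longer audio took {time_ratio:.2f}x as long")
```

On these figures, a track 19 times longer takes only about twice as long to generate, which suggests a large fixed overhead and a comparatively small per-second cost on the benchmark hardware.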

Variable Length and Advanced Editing

Beyond raw speed, the update brings robust support for variable-length audio generation and introduces inpainting-based editing capabilities. Creators can now seamlessly modify specific segments of an existing track or generate audio of arbitrary durations without compromising structural coherence or audio quality. The model supports high-quality stereo output, making it suitable for both music composition and professional sound design workflows.
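Inpainting-based editing is commonly driven by a mask that marks which part of the track should be regenerated while the rest is kept. The sketch below shows one way such a mask could be built over latent frames. It does not reflect the Stable Audio 3 API: the function `inpaint_mask`, the 44.1 kHz sample rate, and the reading of 4096x as samples per latent frame are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 44_100  # assumed, not stated in the article
COMPRESSION = 4096    # from the article, treated here as samples per latent frame

def inpaint_mask(total_s: float, start_s: float, end_s: float) -> list[bool]:
    """Return one flag per latent frame: True = regenerate, False = keep."""
    if not 0 <= start_s < end_s <= total_s:
        raise ValueError("need 0 <= start_s < end_s <= total_s")
    frames_per_second = SAMPLE_RATE / COMPRESSION
    n_frames = math.ceil(total_s * frames_per_second)
    # Round outward so the edited region fully covers the requested time span
    first = math.floor(start_s * frames_per_second)
    last = min(n_frames, math.ceil(end_s * frames_per_second))
    return [first <= i < last for i in range(n_frames)]

# Regenerate seconds 10 to 15 of a 30-second clip
mask = inpaint_mask(30, 10, 15)
print(f"{sum(mask)} of {len(mask)} latent frames marked for regeneration")
```

Rounding the boundaries outward is a deliberate choice here: it is safer for the regenerated region to slightly overlap the kept audio than to leave a sliver of the original segment unedited.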

Availability and Licensing

The open-source weights for the small and medium configurations are currently available on Hugging Face, enabling developers and researchers to experiment and integrate the technology into their own pipelines. A larger, more capable version of the model will be offered through commercial licensing, catering to enterprise and professional studio environments.