
Stable Audio Open is an open-source text-to-audio model developed by Stability AI. It generates sound recordings from text descriptions by combining a variational autoencoder, a conditioning signal, and a diffusion model. The process begins with a text description, such as "Rock beat played in a treated studio, session drumming on an acoustic kit." This description guides the generation of the audio.
The text description is fed into the model, which turns it into a conditioning signal. This signal steers the diffusion model, which generates the audio data. The diffusion model starts from random noise and refines it step by step, removing a little noise at each step, until the result resembles a sample from the distribution of audio it was trained on.
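The control flow described above (prompt to conditioning signal, then repeated refinement of random noise) can be sketched in a few lines. This is a toy illustration only: `encode_text`, `denoise_step`, and `refine` are hypothetical names, the hash-based "embedding" stands in for a real text encoder, and the hand-written update rule stands in for the trained neural network a real diffusion model would use.

```python
import hashlib
import random

LATENT_DIM = 8  # assumed toy size, not a value from the real model


def encode_text(prompt: str) -> list[float]:
    """Stand-in text encoder: a deterministic pseudo-embedding in [-1, 1]."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:LATENT_DIM]]


def denoise_step(latent: list[float], cond: list[float],
                 step: int, total_steps: int) -> list[float]:
    """Stand-in denoiser: move the latent part of the way toward the
    conditioning vector. A real model predicts the noise to remove
    with a trained network instead of this fixed rule."""
    rate = 1.0 / (total_steps - step)  # reaches 1.0 on the final step
    return [x + rate * (c - x) for x, c in zip(latent, cond)]


def refine(prompt: str, steps: int = 50, seed: int = 0) -> list[float]:
    """Start from Gaussian noise and refine it under text conditioning."""
    rng = random.Random(seed)
    cond = encode_text(prompt)
    latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    for step in range(steps):
        latent = denoise_step(latent, cond, step, steps)
    return latent


if __name__ == "__main__":
    print(refine("Rock beat played in a treated studio", steps=20))
```

The point of the sketch is the loop shape: the same conditioning vector is consulted at every step, and the starting noise is the only source of randomness, so a fixed seed gives a repeatable result.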
More precisely, the diffusion model does not operate on raw waveforms. A variational autoencoder compresses audio into a compact latent representation, the diffusion model generates in that latent space, and the autoencoder's decoder then converts the generated latent back into the final audio output. The autoencoder is trained to preserve the important features of the audio while discarding redundant detail, so the diffusion model has far less data to process, which makes the system faster and more efficient.
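A rough size comparison shows why a compressed latent representation saves work. The constants below are illustrative assumptions and do not come from the text above: a 44.1 kHz stereo waveform, one latent frame per 2,048 waveform samples, and 64 values per latent frame. The helper names are hypothetical.

```python
from math import ceil

# Illustrative assumptions, not figures stated in this article.
SAMPLE_RATE_HZ = 44_100   # waveform samples per second, per channel
CHANNELS = 2              # stereo
DOWNSAMPLE = 2_048        # waveform samples covered by one latent frame
LATENT_CHANNELS = 64      # values stored per latent frame


def waveform_values(seconds: float) -> int:
    """Number of raw sample values in a clip of the given length."""
    return int(seconds * SAMPLE_RATE_HZ) * CHANNELS


def latent_values(seconds: float) -> int:
    """Number of values in the compressed representation of the same clip."""
    frames = ceil(int(seconds * SAMPLE_RATE_HZ) / DOWNSAMPLE)
    return frames * LATENT_CHANNELS


if __name__ == "__main__":
    seconds = 47
    raw = waveform_values(seconds)
    compressed = latent_values(seconds)
    print(f"raw values:    {raw}")
    print(f"latent values: {compressed}")
    print(f"ratio:         {raw / compressed:.1f}x")
```

Under these assumptions the generative model handles a sequence that is dozens of times smaller than the waveform, which is where the speed and memory savings come from.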
The model was trained on a dataset of roughly 486,000 audio recordings, which included music, sound effects, and single-instrument recordings. Each file came with text metadata describing its contents, and that metadata was used to generate the text prompts for training.
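As a minimal sketch of how file metadata might be turned into a training prompt, the function below joins a few fields into one string. The field names (`title`, `description`, `tags`) and the joining format are assumptions made for illustration; the actual preprocessing used for the model is not described here.

```python
def metadata_to_prompt(meta: dict) -> str:
    """Build a text prompt from hypothetical metadata fields.

    Empty or missing fields are skipped so that sparse metadata
    still yields a usable prompt.
    """
    parts = []
    for key in ("title", "description"):
        value = str(meta.get(key, "")).strip().rstrip(".")
        if value:
            parts.append(value)
    tags = [t.strip() for t in meta.get("tags", []) if t.strip()]
    if tags:
        parts.append("tags: " + ", ".join(tags))
    return ". ".join(parts)


if __name__ == "__main__":
    example = {
        "title": "Rock beat",
        "description": "Session drumming on an acoustic kit.",
        "tags": ["drums", "studio"],
    }
    print(metadata_to_prompt(example))
```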
Stable Audio Open can generate a wide range of audio, including drum beats, instrument riffs, ambient sounds, and production elements for videos, films, and TV shows. It can also be used to edit existing songs or apply the style of one song to another. However, it's not designed for generating full songs, melodies, or vocals.

Stable Audio Open is an open-source AI model that can generate various types of audio samples, sound effects, and production elements from text descriptions. It is specifically designed for sound designers, musicians, and creative professionals. Typical outputs include drum beats, instrument riffs, ambient sounds, and production elements.
Potential applications for Stable Audio Open include music production, sound design for films and TV shows, video game development, podcasting, and virtual reality experiences. Users can also fine-tune the model on their own audio data to customize what it generates, making it a versatile tool for various creative projects.

Stability AI's newly released model, Stable Audio Open, is a generative AI model for creating sounds and short pieces of music. It generates audio from a text description provided by the user, such as "Rock beat played in a treated studio, session drumming on an acoustic kit." The model was trained on approximately 486,000 royalty-free samples from Freesound and the Free Music Archive and can produce audio recordings up to 47 seconds in length. It can be used to create drum beats, instrument riffs, ambient noises, and production elements for various multimedia projects. However, it is not designed to create full songs, melodies, or vocals, and its commercial use is prohibited by its terms of service.