Melodiff MusicLDM v2

This is the next version, following Melodiff Riffusion v1.

Melodiff MusicLDM continues to explore the idea of using the audio-to-audio pipeline of Stable Diffusion audio models to create cover versions of songs.


Melodiff MusicLDM uses the MusicLDM model as the base model for audio generation.

What was done and is presented here: deconstructing the base pipeline and reconstructing it for audio-to-audio modifications.

No new model training or fine-tuning was done; only the base pipeline was modified.
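The project's exact pipeline changes are not spelled out above, but the usual way to turn a text-to-audio diffusion pipeline into an audio-to-audio one is the img2img (SDEdit) trick: encode the source audio into latents, add noise up to an intermediate timestep chosen by a "strength" parameter, then denoise from there under a new text prompt. A minimal sketch of that forward-noising step, using a standard linear DDPM beta schedule; the function name `noise_to_strength` and the toy latent shape are illustrative assumptions, not the project's actual code:

```python
import numpy as np

def noise_to_strength(latents, strength, num_train_timesteps=1000, rng=None):
    """SDEdit-style forward noising (a sketch, not the project's code):
    push clean latents to the timestep implied by `strength`, so the
    denoiser can regenerate the sample under a new text prompt while
    keeping the melody structure of the source audio."""
    rng = rng or np.random.default_rng(0)
    # linear beta schedule, as in the original DDPM formulation
    betas = np.linspace(1e-4, 2e-2, num_train_timesteps)
    alphas_bar = np.cumprod(1.0 - betas)
    # strength in [0, 1]: 0 keeps the input, 1 starts from pure noise
    t = min(int(strength * num_train_timesteps), num_train_timesteps - 1)
    eps = rng.standard_normal(latents.shape)
    noisy = np.sqrt(alphas_bar[t]) * latents + np.sqrt(1.0 - alphas_bar[t]) * eps
    return noisy, t

# toy latent standing in for the VAE-encoded mel spectrogram of a song
latents = np.zeros((1, 8, 16, 16))
noisy, t = noise_to_strength(latents, strength=0.6)
```

With strength 0.6 the denoising loop would then start at timestep 600 of 1000, which is roughly the regime where instrument timbre can change while the melody is preserved.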


MusicLDM generates audio of better quality than the first (Riffusion) model, and it produces 10 s samples compared to the 5 s samples of the previous model.

Generation speed also improved: previously it took about 8 s to generate a 5 s mono sample; now it takes about 8 s to generate a 10 s stereo sample.

Consistency improved as well. Previously only about 30% of modified samples were good (or acceptable), and some playing with prompts and seeds was needed to find good sound quality.

Now about 70% of modified samples are good (or acceptable).

As before, longer modifications are possible by splitting the song, modifying each part, and concatenating the samples back together.
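The split-modify-concatenate step above can be sketched as follows; `modify_fn` is a placeholder for the audio-to-audio pipeline call (here an identity function stands in for it), and the chunk length matches the model's 10 s window:

```python
import numpy as np

def modify_long_audio(audio, sr, modify_fn, chunk_s=10.0):
    """Split a long waveform into chunks the model can handle, run the
    (placeholder) modification on each chunk, and concatenate the
    results back into one waveform."""
    chunk = int(chunk_s * sr)
    pieces = [modify_fn(audio[i:i + chunk]) for i in range(0, len(audio), chunk)]
    return np.concatenate(pieces)

sr = 16000
audio = np.random.default_rng(1).standard_normal(sr * 25)  # 25 s of audio
out = modify_long_audio(audio, sr, modify_fn=lambda x: x)  # identity placeholder
```

In practice some overlap and crossfading between chunks may be needed to hide seams at the 10 s boundaries; this sketch uses plain non-overlapping chunks for clarity.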

The underlying MusicLDM model is two years old. It would be interesting to try newer models, which have notably better quality.


Examples of music generated by modifying the underlying song:

Bella Ciao, originally played by saxophone, modified to be played by electric guitar

Bella Ciao, originally played by violin, modified to be played by piano

Iko Iko, originally played by saxophone, modified to be played by violin

When the Saints, originally played by saxophone, modified to be played by strings


Examples of an original alongside its modified samples:

Saxophone solo, original

Modified to be played by violin

Modified to be played by electric guitar