Awesome
SDEdit-AudioLDM2
This is the SDEdit implementation for AudioLDM2 use in AP-adapter "Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning" in Proc. Int. Society for Music Information Retrieval Conf. (ISMIR), 2024..
Installation
git clone https://github.com/fundwotsai2001/SDEdit-AudioLDM2.git
pip install -r requirements.txt
Inference
You can edit the config.py file. Most importantly, if noise_scale = 1.0, it means full steps of noise will be added. If noise_scale = 0, it means no noise will be added. You can tune this parameter based on your preference.
python inference.py
Key change from original AudioLDM2
Basically we add a few lines to encode the mel and add desired degree of noise. All other parts remains the same.
# Get mel from the input audio
mel = wav_to_mel(audio_path,
10,
augment_data=False,
mix_data=None,
snr=None)
mel = mel.unsqueeze(0).to(device).to(torch.float16)
# Decode mel
latents = self.vae.encode(mel).latent_dist.sample()
latents = latents * self.vae.config.scaling_factor
noise = torch.randn_like(latents)
# Decide the noise level
shallow_reverse_step = int(num_inference_steps * (1 - noise_scale))
timesteps = timesteps[shallow_reverse_step:]
timesteps_tensor = torch.tensor([timesteps[0]], dtype=torch.int32)
# Add the coresponding noise to the latent
noisy_sample = self.scheduler.add_noise(latents,noise,timesteps_tensor)