DDSP: Differentiable Digital Signal Processing
Overview
Differentiable Digital Signal Processing (DDSP) directly integrates classic signal processing elements with end-to-end learning, exploiting strong inductive biases without sacrificing the expressive power of neural networks. This approach achieves high-fidelity audio synthesis without large autoregressive models or adversarial losses, and allows each model component to be manipulated separately and interpretably. In all figures below, audio is visualized as linear-frequency log-magnitude spectrograms and is synthesized at a sample rate of 16 kHz.
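As a rough illustration of the kind of spectrogram used in the figures, here is a minimal NumPy sketch that computes a linear-frequency log-magnitude spectrogram of a 16 kHz signal. The FFT size (1024), hop (256), Hann window, and the small `eps` added before the log are assumptions chosen for illustration, not settings taken from the figures.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, matching the figures


def log_magnitude_spectrogram(audio, fft_size=1024, hop=256, eps=1e-6):
    """Linear-frequency log-magnitude spectrogram, shape (n_frames, fft_size // 2 + 1)."""
    window = np.hanning(fft_size)
    n_frames = 1 + (len(audio) - fft_size) // hop
    frames = np.stack([audio[t * hop : t * hop + fft_size] * window
                       for t in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=-1))   # linearly spaced frequency bins
    return np.log(magnitude + eps)                     # natural-log magnitude


t = np.arange(SAMPLE_RATE) / SAMPLE_RATE     # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)         # 440 Hz test tone
spec = log_magnitude_spectrogram(tone)
print(spec.shape)                            # (59, 513)
# Each bin spans SAMPLE_RATE / fft_size = 15.625 Hz
peak_hz = spec.mean(axis=0).argmax() * SAMPLE_RATE / 1024
```

The frequency axis is linear because each `rfft` bin is a fixed 15.625 Hz wide at this FFT size; a mel or log-frequency view would instead warp these bins.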
https://gyazo.com/9fe07cdc21e632bf9401fa6764e85875
Links:
Paper:
Sound Samples:
What you can do:
- Timbre Transfer (cello -> violin, voice -> violin)
- Timbre Interpolation
- Acoustic environment transfer (apply the reverberation of Suntory Hall to a recording made in my room)
- Dereverberation
- Independent Control of Loudness, Pitch, and Timbre
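A minimal NumPy sketch of why loudness, pitch, and timbre can be controlled independently: in a harmonic (additive) synthesizer of the kind DDSP builds on, they enter as separate inputs (`f0`, `amplitude`, and a normalized distribution over harmonics). The same sketch hints at acoustic environment transfer, since reverb can be modeled as convolution with an impulse response, so swapping the impulse response swaps the room. The function names, the 4-harmonic distribution, and the decaying-noise impulse response are hypothetical stand-ins, not the DDSP library's API and not a measured Suntory Hall response.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz


def harmonic_synth(f0, amplitude, harmonic_distribution, sample_rate=SAMPLE_RATE):
    """Sum of sinusoids at integer multiples of f0 (hypothetical helper).

    f0:                    (n_samples,) fundamental frequency in Hz  -> pitch
    amplitude:             (n_samples,) overall amplitude            -> loudness
    harmonic_distribution: (n_harmonics,) relative harmonic weights  -> timbre
    """
    k = np.arange(1, len(harmonic_distribution) + 1)        # harmonic numbers 1..K
    freqs = f0[:, None] * k[None, :]                        # (n_samples, K) in Hz
    # Silence harmonics at or above Nyquist, then renormalize weights to sum to 1
    weights = np.where(freqs < sample_rate / 2, harmonic_distribution[None, :], 0.0)
    weights = weights / np.maximum(weights.sum(axis=1, keepdims=True), 1e-7)
    # Phase is the running sum (discrete integral) of instantaneous frequency
    phases = 2 * np.pi * np.cumsum(freqs / sample_rate, axis=0)
    return amplitude * np.sum(weights * np.sin(phases), axis=1)


def convolution_reverb(dry, impulse_response):
    """Apply a room by convolving with its impulse response, trimmed to the input length."""
    return np.convolve(dry, impulse_response)[: len(dry)]


n = SAMPLE_RATE                                  # one second of audio
f0 = np.full(n, 220.0)                           # constant pitch (Hz)
loudness = np.full(n, 0.5)                       # constant amplitude
timbre = np.array([1.0, 0.5, 0.25, 0.125])       # hypothetical harmonic distribution

dry = harmonic_synth(f0, loudness, timbre)
louder = harmonic_synth(f0, 2 * loudness, timbre)   # change loudness only
higher = harmonic_synth(2 * f0, loudness, timbre)   # change pitch only

# Hypothetical impulse response: exponentially decaying noise (0.25 s), not a measured hall
rng = np.random.default_rng(0)
ir_time = np.arange(SAMPLE_RATE // 4) / SAMPLE_RATE
impulse_response = rng.standard_normal(len(ir_time)) * np.exp(-ir_time / 0.05)
wet = convolution_reverb(dry, impulse_response)
```

Because the harmonic weights are renormalized to sum to 1, doubling `amplitude` scales the output by 2 without touching pitch or timbre, and doubling `f0` moves every harmonic up an octave while keeping the relative harmonic weights (as long as no harmonic crosses Nyquist).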
https://gyazo.com/67b359dfa9b2e5dab7f833a4dd5ffc2b
Overall Audio Diagram
https://gyazo.com/f72231cc5092732e4273cefe4839ce2d
Model Size
https://gyazo.com/b161858ca3a26afb4f8d00caa7dac735
Multi-scale spectrogram loss
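$ L_{i}=\left\|S_{i}-\hat{S}_{i}\right\|_{1}+\alpha\left\|\log S_{i}-\log \hat{S}_{i}\right\|_{1}
$ L_{\text{reconstruction}}=\sum_{i} L_{i}
$ S_i, \hat{S}_i: \text{magnitude spectrograms of the original and synthesized audio, computed with FFT size } i
$ i \in \{2048, 1024, 512, 256, 128, 64\}: \text{FFT sizes}
$ \alpha: \text{weighting coefficient}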
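The multi-scale loss can be sketched in NumPy as follows. Assumptions: Hann-windowed frames with 75% overlap, a small `eps` inside the log to avoid `log(0)`, `alpha = 1.0` as a default, and the plain (summed) L1 norm exactly as written in the formula; the total is the sum of `L_i` over all six FFT sizes. Since it operates on NumPy arrays it is not differentiable as written; a training setup would express the same computation in an autodiff framework such as PyTorch.

```python
import numpy as np

FFT_SIZES = (2048, 1024, 512, 256, 128, 64)


def magnitude_spectrogram(audio, fft_size, overlap=0.75):
    """|STFT| with a Hann window; hop = fft_size * (1 - overlap)."""
    hop = int(fft_size * (1 - overlap))
    window = np.hanning(fft_size)
    n_frames = 1 + (len(audio) - fft_size) // hop
    frames = np.stack([audio[t * hop : t * hop + fft_size] * window
                       for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def multiscale_spectral_loss(target, synthesized, alpha=1.0, eps=1e-7):
    """Sum over FFT sizes i of ||S_i - S^_i||_1 + alpha * ||log S_i - log S^_i||_1."""
    total = 0.0
    for i in FFT_SIZES:
        s = magnitude_spectrogram(target, i)
        s_hat = magnitude_spectrogram(synthesized, i)
        linear_term = np.sum(np.abs(s - s_hat))                                # L1 on magnitudes
        log_term = np.sum(np.abs(np.log(s + eps) - np.log(s_hat + eps)))      # L1 on log magnitudes
        total += linear_term + alpha * log_term
    return total


sr = 16_000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 440.0 * t)
other = np.sin(2 * np.pi * 880.0 * t)
print(multiscale_spectral_loss(target, target))   # 0.0
```

The large FFT sizes resolve frequency finely but smear timing, while the small ones do the opposite, so summing across scales penalizes errors in both; the log term keeps quiet spectral regions from being ignored next to loud peaks.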
----------
Simpler PyTorch Implementations
https://www.youtube.com/watch?v=U2ZXANU9EQg