Deep learning for audio applications

Written by Stian Aagedal | Jun 7, 2023 9:50:00 AM

Here at HANCE, we focus on combining deep learning with digital signal processing. Our goal is to build solutions on state-of-the-art academic research and make them viable in terms of CPU and latency requirements. Some of the use cases of our technology are:

Separation of complete mixes into stems (instrument groups)
Dialogue extraction from background noise
Reduction of unwanted reverberation
Spectrogram inpainting – fill holes in time/frequency
Bandwidth expansion – recover lost content in specific frequency ranges

Source separation

Stem and vocal separation has been a popular topic in academic research in recent years. Stem separation, dialogue extraction and reverb reduction are closely related problems that can be tackled with the same neural network structure, trained on different data sets. We will use the term source separation to describe the general problem. Some source separation methods that deliver state-of-the-art results are:

Open-Unmix (spectrogram processing)
Demucs (waveform processing)
MMDenseLSTM (spectrogram processing)
Spleeter (spectrogram processing)

While these methods provide astonishing results, they are not well suited for real-time processing: they introduce too much latency and have demanding memory and CPU (or GPU) requirements. We will focus on spectrogram-based separation because it is generally less computationally expensive than waveform-based separation, and because the U-Net structure has proven very effective.

Brief overview of current methods

Spectrogram-based methods

Spectrogram-based methods for source separation typically estimate a separation mask that is multiplied pointwise with the input magnitude spectrogram to form the separated output magnitude spectrogram. For simplicity, the output audio is commonly reconstructed using the phase information from the input spectrogram. By processing magnitude spectrograms this way, we can easily adopt network topologies that have been implemented for image processing by treating the magnitude spectrogram as a gray-scale image.
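To make the masking step concrete, here is a minimal NumPy sketch of the idea described above. It is an illustration under stated assumptions, not our production code: the helper names `stft` and `separate`, the frame size and the hop size are all hypothetical choices. In a real system the mask would be predicted by a neural network; here an "oracle" ratio mask computed from known sources stands in for that prediction.

```python
import numpy as np

def stft(signal, frame_size=512, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_size] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def separate(mixture_spec, mask):
    """Scale the mixture magnitude by a real-valued mask, reuse the mixture phase."""
    magnitude = np.abs(mixture_spec)
    phase = np.angle(mixture_spec)
    separated_magnitude = mask * magnitude  # pointwise multiplication
    return separated_magnitude * np.exp(1j * phase)

# Toy mixture: a 440 Hz tone (the "target") plus white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
target = np.sin(2 * np.pi * 440.0 * t)
noise = 0.3 * rng.standard_normal(t.shape)
mixture_spec = stft(target + noise)

# Assumption: an oracle ratio mask built from the known sources.
# A trained network would estimate this from the mixture alone.
target_mag = np.abs(stft(target))
noise_mag = np.abs(stft(noise))
mask = target_mag / (target_mag + noise_mag + 1e-8)

separated_spec = separate(mixture_spec, mask)
```

The mask has the same shape as the magnitude spectrogram and takes values between 0 and 1, which is what lets it be treated like a gray-scale image by an image-style network. Converting `separated_spec` back to audio would require an inverse STFT with overlap-add, which is omitted here.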
