DPM-TSE

A DIFFUSION PROBABILISTIC MODEL FOR TARGET SOUND EXTRACTION

Jiarui Hai🚀, Helin Wang🚀, Dongchao Yang, Karan Thakkar, Najim Dehak, Mounya Elhilali

😊Repository

📄PDF

Abstract

Common target sound extraction (TSE) approaches have relied primarily on discriminative methods to separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background. This study introduces DPM-TSE, a first generative method based on diffusion probabilistic modeling (DPM) for target sound extraction, which aims at both cleaner target renderings and improved separability from unwanted sounds. The technique also tackles a common background noise issue of DPMs by introducing a correction method for noise schedules and sample steps. The approach is evaluated with both objective and subjective quality metrics on the FSD Kaggle 2018 dataset. The results show that DPM-TSE significantly improves perceived quality in terms of target extraction and purity.
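
The abstract does not say what the correction for noise schedules and sample steps consists of. One published correction of this kind rescales the schedule so that the final training timestep has zero signal-to-noise ratio, and picks sampling timesteps so that sampling starts from that final timestep. The sketch below assumes a correction of that kind; the linear schedule, the step counts, and every function name are illustrative assumptions and are not taken from DPM-TSE.

```python
import numpy as np


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Plain linear beta schedule (assumed here only as a starting point)."""
    return np.linspace(beta_start, beta_end, num_steps, dtype=np.float64)


def snr(betas):
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) at every timestep."""
    alphas_bar = np.cumprod(1.0 - betas)
    return alphas_bar / (1.0 - alphas_bar)


def rescale_zero_terminal_snr(betas):
    """Rescale a schedule so that the last timestep carries no signal."""
    alphas_bar_sqrt = np.sqrt(np.cumprod(1.0 - betas))
    first = alphas_bar_sqrt[0]
    last = alphas_bar_sqrt[-1]
    # Shift so the last value becomes exactly zero.
    alphas_bar_sqrt = alphas_bar_sqrt - last
    # Scale so the first value stays where it was.
    alphas_bar_sqrt = alphas_bar_sqrt * first / (first - last)
    # Convert the cumulative products back into per-step betas.
    alphas_bar = alphas_bar_sqrt ** 2
    alphas = np.concatenate([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas


def trailing_timesteps(num_train_steps, num_sample_steps):
    """Evenly spaced sampling timesteps that include the final training step."""
    step = num_train_steps / num_sample_steps
    return (np.round(np.arange(num_train_steps, 0, -step)) - 1).astype(int)


if __name__ == "__main__":
    betas = linear_betas()
    corrected = rescale_zero_terminal_snr(betas)
    print("terminal SNR, default schedule:  ", snr(betas)[-1])
    print("terminal SNR, corrected schedule:", snr(corrected)[-1])
    print("sampling timesteps:", trailing_timesteps(1000, 10))
```

With the default schedule the last timestep still contains a small amount of signal, so a model trained on it never sees pure noise even though sampling starts from pure noise. This mismatch is one plausible source of residual background noise in generated audio.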

Model Framework

[Figure: DPM-TSE model framework]

Examples

Mixture | Target Sound (GT) | Label | DPM-TSE (Ours) | TSENET | WaveFormer
Applause
Bark
Harmonica
Cat_Meow
Shatter
Snare_Drum
Squeak
Writing

Ablation Study

Mixture | Target Sound (GT) | Label | Default DPM | Corrected DPM
Tambourine (Instrument)
Finger_Snapping