Diff-Pitcher

Diffusion-based Singing Voice Pitch Correction

Framework

😊Diff-Pitcher is available!

Template-based Automatic Pitch Correction Examples

Here are examples of template-based automatic pitch correction, in which the pitch curve of a template audio clip is transferred to the out-of-tune audio. A Dynamic Time Warping (DTW) algorithm based on mel-cepstral coefficients (MCEP) is applied to further align the template pitch curve with the target audio.

*The degree of off-key singing is notably higher in the Chinese examples. Furthermore, these Chinese examples were recorded on mobile devices, mirroring the typical usage scenario of a karaoke app.

*The paired English recordings are from PopBuTFy, which focuses more on fine-grained vocal techniques such as vibrato than on extreme off-key singing.
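The alignment step described above can be illustrated with a minimal sketch of textbook Dynamic Time Warping. This is not the Diff-Pitcher implementation: the function names `dtw_path` and `warp_template_f0` are hypothetical, the one-dimensional feature vectors are toy stand-ins for real MCEP frames, and MCEP extraction itself is not shown.

```python
import math

def dtw_path(target_feats, template_feats):
    """Return the minimum-cost DTW path as (target_idx, template_idx) pairs."""
    n, m = len(target_feats), len(template_feats)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning the first i target frames
    # with the first j template frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(target_feats[i - 1], template_feats[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path

def warp_template_f0(template_f0, path, n_target):
    """Resample the template F0 curve onto the target's frame grid."""
    warped = [0.0] * n_target
    for target_idx, template_idx in path:
        # If several template frames map to one target frame, the last one wins.
        warped[target_idx] = template_f0[template_idx]
    return warped

# Toy data: the target holds the second "sound" for one extra frame.
target_feats = [[0.0], [1.0], [1.0], [2.0]]
template_feats = [[0.0], [1.0], [2.0]]
template_f0 = [220.0, 247.0, 262.0]  # Hz, one value per template frame

path = dtw_path(target_feats, template_feats)
warped_f0 = warp_template_f0(template_f0, path, len(target_feats))
print(path)       # [(0, 0), (1, 1), (2, 1), (3, 2)]
print(warped_f0)  # [220.0, 247.0, 247.0, 262.0]
```

In this toy case the template's middle pitch value is stretched over two target frames, so the transferred curve follows the timing of the out-of-tune recording rather than that of the template.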

Sample 1 (CH/Female)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Sample 2 (CH/Female)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Sample 3 (CH/Female)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Sample 4 (CH/Male)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Sample 5 (CH/Male)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Sample 6 (EN/Female)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Sample 7 (EN/Female)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Sample 8 (EN/Female)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Sample 9 (EN/Male)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Sample 10 (EN/Male)

Out-of-tune audio Template audio

DiffPitcher-WORLD DiffPitcher-LPC SiFi-GAN WORLD Vocoder

Score-based Automatic Pitch Correction Examples

Here are examples of score-based automatic pitch correction, in which the pitch curve predicted by the pitch predictor is transferred to the out-of-tune audio. The pitch predictor takes MIDI notes and the vocal spectrum of the out-of-tune audio as inputs.

*Compared with template-based automatic pitch correction (APC), score-based APC offers greater flexibility and diversity at the cost of a small degradation in naturalness.

*We are collecting English a cappella examples.
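To make the MIDI-note input more concrete, here is a minimal sketch of how a note sequence can be turned into a frame-level target pitch in Hz using the standard equal-temperament conversion (A4 = MIDI note 69 = 440 Hz). It is not the pitch predictor itself, which also uses the vocal spectrum; the function names and the `(midi_note, start_frame, end_frame)` note layout are assumptions made for illustration.

```python
def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def notes_to_frame_curve(notes, n_frames):
    """Build a per-frame target pitch curve from (midi_note, start_frame, end_frame).

    Frames not covered by any note are left at 0.0 (treated as silence).
    end_frame is exclusive.
    """
    curve = [0.0] * n_frames
    for midi_note, start_frame, end_frame in notes:
        hz = midi_to_hz(midi_note)
        for frame in range(start_frame, min(end_frame, n_frames)):
            curve[frame] = hz
    return curve

# Hypothetical two-note score: A4 for two frames, a one-frame rest, then A5.
notes = [(69, 0, 2), (81, 3, 5)]
curve = notes_to_frame_curve(notes, n_frames=5)
print(curve)  # [440.0, 440.0, 0.0, 880.0, 880.0]
```

A curve built this way is piecewise constant, with none of the vibrato or note transitions of natural singing, which is presumably why a learned predictor is used rather than a direct conversion.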

Sample 1 (CH/Female)


Out-of-tune audio MIDI Notes Tuned Audio

Sample 2 (CH/Female)


Out-of-tune audio MIDI Notes Tuned Audio

Sample 3 (CH/Female)


Out-of-tune audio MIDI Notes Tuned Audio

Sample 4 (CH/Female)


Out-of-tune audio MIDI Notes Tuned Audio

Sample 5 (CH/Male)


Out-of-tune audio MIDI Notes Tuned Audio

Sample 6 (CH/Male)


Out-of-tune audio MIDI Notes Tuned Audio

Appendix

Objective Experiment:

Approach            -6 Semitones   -3 Semitones   Reconstruction   +3 Semitones   +6 Semitones
WORLD               0.03           0.03           0.03             0.03           0.03
SiFi-GAN            0.05           0.04           0.04             0.05           0.04
DiffPitcher-WORLD   0.04           0.03           0.04             0.04           0.06

Approach            -6 Semitones   -3 Semitones   Reconstruction   +3 Semitones   +6 Semitones
WORLD               2.00%          2.68%          3.23%            3.04%          2.17%
SiFi-GAN            2.98%          2.85%          3.35%            2.98%          3.61%
DiffPitcher-WORLD   2.57%          2.41%          2.88%            2.64%          2.23%
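The column headers above refer to pitch shifts measured in semitones. As a reminder of what such a shift means for a pitch curve, here is a minimal sketch that scales voiced F0 values by 2^(n/12), the standard equal-temperament relation. The function name `shift_f0` and the convention that 0.0 marks an unvoiced frame are assumptions; this page does not state how the shifts were applied in the experiment.

```python
def shift_f0(f0_curve, semitones):
    """Shift an F0 curve (Hz) by a number of semitones.

    Frames with value 0.0 are treated as unvoiced and left unchanged.
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f0 * ratio if f0 > 0.0 else 0.0 for f0 in f0_curve]

f0_curve = [220.0, 0.0, 440.0]  # Hz; the middle frame is unvoiced

# Print the frequency ratio for each shift listed in the tables.
for n in (-6, -3, 0, 3, 6):
    print(n, round(2.0 ** (n / 12.0), 4))

octave_up = shift_f0(f0_curve, 12)
print(octave_up)  # [440.0, 0.0, 880.0]
```

A shift of 6 semitones corresponds to a frequency ratio of the square root of 2, so the ±6 columns cover half an octave in each direction.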