R.I.P.
👻
Ghosted
Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN
October 09, 2020 · Entered Twilight · Blizzard Challenge / Voice Conversion Challenge
"Last commit was 5.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: LICENSE.txt, README.md, baseline
Authors
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Toda
arXiv ID
2010.04429
Category
cs.SD: Sound
Cross-listed
cs.CL,
eess.AS
Citations
15
Venue
Blizzard Challenge / Voice Conversion Challenge
Repository
https://github.com/bigpon/vcc20_baseline_cyclevae
⭐ 131
Last Checked
1 month ago
Abstract
In this paper, we present a description of the baseline system of Voice Conversion Challenge (VCC) 2020 with a cyclic variational autoencoder (CycleVAE) and Parallel WaveGAN (PWG), i.e., CycleVAEPWG. CycleVAE is a nonparallel VAE-based voice conversion that utilizes converted acoustic features to consider cyclically reconstructed spectra during optimization. On the other hand, PWG is a non-autoregressive neural vocoder that is based on a generative adversarial network for a high-quality and fast waveform generator. In practice, the CycleVAEPWG system can be straightforwardly developed with the VCC 2020 dataset using a unified model for both Task 1 (intralingual) and Task 2 (cross-lingual), where our open-source implementation is available at https://github.com/bigpon/vcc20_baseline_cyclevae. The results of VCC 2020 have demonstrated that the CycleVAEPWG baseline achieves the following: 1) a mean opinion score (MOS) of 2.87 in naturalness and a speaker similarity percentage (Sim) of 75.37% for Task 1, and 2) a MOS of 2.56 and a Sim of 56.46% for Task 2, showing an approximately or nearly average score for naturalness and an above average score for speaker similarity.
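The cyclic reconstruction idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear `encode`/`decode` functions, the scalar "speaker offsets", and the L1 distance are hypothetical stand-ins for the neural networks, speaker codes, and spectral losses of a real system, and the VAE's latent sampling and KL term are left out entirely. It only shows the data flow: features are converted to the target speaker, re-encoded, and decoded back to the source speaker, so that a second reconstruction error can be measured.

```python
# Toy illustration only. All names and the linear "models" are assumptions
# made for this sketch; they are not taken from the CycleVAEPWG code base.

def encode(features, speaker_offset):
    # Hypothetical encoder: strip a speaker-dependent offset to obtain
    # a speaker-independent latent representation.
    return [f - speaker_offset for f in features]

def decode(latent, speaker_offset):
    # Hypothetical decoder: re-apply the offset of the requested speaker.
    return [z + speaker_offset for z in latent]

def l1(a, b):
    # Mean absolute error between two equal-length feature vectors.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cyclic_losses(x_src, src_offset, tgt_offset):
    z = encode(x_src, src_offset)
    x_rec = decode(z, src_offset)        # ordinary reconstruction
    x_conv = decode(z, tgt_offset)       # converted features (target speaker)
    z_cyc = encode(x_conv, tgt_offset)   # re-encode the converted features
    x_cyc = decode(z_cyc, src_offset)    # cyclic reconstruction (back to source)
    return l1(x_src, x_rec), l1(x_src, x_cyc), x_conv
```

In this idealized toy both losses are zero because the stand-in encoder and decoder are exact inverses; in a trained model the two terms would be nonzero and minimized together, which is what lets nonparallel data supervise the conversion path.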
Similar Papers
In the same crypt: Sound
R.I.P.
👻
Ghosted
CNN Architectures for Large-Scale Audio Classification
R.I.P.
👻
Ghosted
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
R.I.P.
👻
Ghosted
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
R.I.P.
👻
Ghosted
WaveGlow: A Flow-based Generative Network for Speech Synthesis
R.I.P.
👻
Ghosted