Similarity Analysis of Self-Supervised Speech Representations
October 22, 2020 ยท Entered Twilight ยท ๐ IEEE International Conference on Acoustics, Speech, and Signal Processing
"Last commit was 5.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: Figs, README.md
Authors
Yu-An Chung, Yonatan Belinkov, James Glass
arXiv ID
2010.11481
Category
eess.AS: Audio & Speech
Cross-listed
cs.CL,
cs.LG,
cs.SD
Citations
44
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Repository
https://github.com/iamyuanchung/ICASSP21-Similarity-Supplementary
โญ 2
Last Checked
1 month ago
Abstract
Self-supervised speech representation learning has recently been a prosperous research topic. Many algorithms have been proposed for learning useful representations from large-scale unlabeled data, and their applications to a wide range of speech tasks have also been investigated. However, there has been little research focusing on understanding the properties of existing approaches. In this work, we aim to provide a comparative study of some of the most representative self-supervised algorithms. Specifically, we quantify the similarities between different self-supervised representations using existing similarity measures. We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations. In addition to showing how various self-supervised models behave differently given the same input, our study also finds that the training objective has a higher impact on representation similarity than architectural choices such as building blocks (RNN/Transformer/CNN) and directionality (uni/bidirectional). Our results also suggest that there exists a strong correlation between pre-training loss and downstream performance for some self-supervised algorithms.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Audio & Speech
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
R.I.P.
๐ป
Ghosted
DiffWave: A Versatile Diffusion Model for Audio Synthesis
R.I.P.
๐ป
Ghosted
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
R.I.P.
๐ป
Ghosted
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
R.I.P.
๐ป
Ghosted