FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning
December 20, 2023 · Entered Twilight · IEEE International Conference on Acoustics, Speech, and Signal Processing
Repo contents: .circleci, .github, .gitignore, .gitmodules, .pre-commit-config.yaml, CODE_OF_CONDUCT.md, CONTRIBUTING.md, LICENSE, MANIFEST.in, README.md, RELEASE.md, docs, examples, fairseq, fairseq_cli, hubconf.py, hydra_plugins, pyproject.toml, release_utils.py, scripts, setup.cfg, setup.py, tests, train.py
Authors
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
arXiv ID
2312.13026
Category
eess.AS: Audio & Speech
Cross-listed
cs.AI, cs.CL, cs.SD
Citations
1
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Repository
https://github.com/cs20s030/fusdom
⭐ 3
Last Checked
1 month ago
Abstract
Continued pre-training (CP) offers multiple advantages, such as target-domain adaptation and the ability to exploit the continuous stream of unlabeled data available online. However, continued pre-training on out-of-domain distributions often causes catastrophic forgetting of previously acquired knowledge and, in turn, sub-optimal ASR performance. This paper presents FusDom, a simple and novel method for SSL-based continued pre-training. FusDom learns speech representations that are robust and adaptive yet do not forget concepts seen in the past. Instead of solving the SSL pre-text task on the output representations of a single model, FusDom uses two identical pre-trained SSL models, a teacher and a student, together with a modified pre-training head to solve the CP SSL pre-text task. This head applies cross-attention between the representations of the two models; only the student receives gradient updates, while the teacher stays frozen. Finally, the student is fine-tuned for ASR. In practice, FusDom significantly outperforms all our baselines across settings, with WER improvements ranging from 0.2 to 7.3 in the target domain while retaining performance in the earlier domain.
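The cross-attention head described above can be illustrated with a minimal, stdlib-only sketch. It assumes (hypothetically, since the abstract does not say which model attends to which) that the student's frames act as queries and the frozen teacher's frames supply keys and values. It also omits the learned projections, multiple heads, and the SSL pre-text loss that a real head would include. The names `softmax` and `cross_attention` are illustrative and are not taken from the FusDom repository.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = feature dimensions


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over one row of attention scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(student: Matrix, teacher: Matrix) -> Matrix:
    """Scaled dot-product cross-attention (hypothetical simplification).

    Assumption: queries come from the student's frames, keys and values
    from the teacher's frames. Learned Q/K/V projections are omitted
    (identity) to keep the sketch short.
    """
    d = len(student[0])
    if any(len(row) != d for row in student + teacher):
        raise ValueError("student and teacher must share the feature dimension")
    scale = 1.0 / math.sqrt(d)
    fused: Matrix = []
    for q in student:
        # Similarity of this student frame to every teacher frame.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in teacher]
        weights = softmax(scores)
        # Weighted sum of teacher value vectors for this student frame.
        fused.append([sum(w * v[j] for w, v in zip(weights, teacher)) for j in range(d)])
    return fused


if __name__ == "__main__":
    student_out = [[1.0, 0.0], [0.0, 1.0]]               # 2 frames, dim 2
    teacher_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 frames, dim 2
    # One fused vector per student frame; in a full model these would
    # feed the SSL pre-text loss (not shown here).
    print(cross_attention(student_out, teacher_out))
```

The output has one row per student frame, so the pre-text task can still be posed on the student's time axis. In an actual PyTorch training loop, the teacher's forward pass would typically run under `torch.no_grad()` (or with its parameters' `requires_grad` set to `False`) so that only the student is updated; this pure-Python sketch does not model gradients.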
Similar Papers
In the same crypt: Audio & Speech
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
DiffWave: A Versatile Diffusion Model for Audio Synthesis
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis