DOVER: A Method for Combining Diarization Outputs
September 17, 2019 · Entered Twilight · Automatic Speech Recognition & Understanding
"Last commit was 5.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: LICENSE, README.md, doc, example1, example2, scripts
Authors
Andreas Stolcke, Takuya Yoshioka
arXiv ID
1909.08090
Category
cs.CL: Computation & Language
Citations
26
Venue
Automatic Speech Recognition & Understanding
Repository
https://github.com/stolcke/dover
⭐ 11
Last Checked
1 month ago
Abstract
Speech recognition and other natural language tasks have long benefited from voting-based algorithms as a method to aggregate outputs from several systems to achieve a higher accuracy than any of the individual systems. Diarization, the task of segmenting an audio stream into speaker-homogeneous and co-indexed regions, has so far not seen the benefit of this strategy because the structure of the task does not lend itself to a simple voting approach. This paper presents DOVER (diarization output voting error reduction), an algorithm for weighted voting among diarization hypotheses, in the spirit of the ROVER algorithm for combining speech recognition hypotheses. We evaluate the algorithm for diarization of meeting recordings with multiple microphones, and find that it consistently reduces diarization error rate over the average of results from individual channels, and often improves on the single best channel chosen by an oracle.
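As a rough illustration of what weighted voting over diarization outputs can look like, the sketch below votes on a speaker label for each time region. It is not the DOVER algorithm from the paper. It assumes that speaker labels are already consistent across hypotheses, which real systems do not guarantee because each system assigns its labels arbitrarily. It also ignores non-speech votes, so a region is labeled whenever any hypothesis has speech there. The function name `weighted_vote` and the `(start, end, speaker)` tuple format are hypothetical choices made for this example.

```python
from collections import defaultdict


def weighted_vote(hypotheses, weights):
    """Pick the highest-weighted speaker label in each time region.

    hypotheses: list of diarization outputs, each a list of
                (start, end, speaker) tuples with non-overlapping segments.
    weights:    one non-negative weight per hypothesis.
    Assumption: speaker labels already mean the same thing in every hypothesis.
    """
    # Every segment boundary from every hypothesis; between two consecutive
    # boundaries, each hypothesis has at most one speaker label.
    bounds = sorted({t for hyp in hypotheses for (s, e, _) in hyp for t in (s, e)})

    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        mid = (lo + hi) / 2
        votes = defaultdict(float)
        for hyp, w in zip(hypotheses, weights):
            for s, e, spk in hyp:
                if s <= mid < e:
                    votes[spk] += w
                    break
        if not votes:
            continue  # no hypothesis has speech in this region

        # Ties go to the alphabetically first label so the output is deterministic.
        winner = max(sorted(votes), key=votes.get)

        # Merge with the previous region when it is adjacent and has the same label.
        if result and result[-1][2] == winner and result[-1][1] == lo:
            result[-1] = (result[-1][0], hi, winner)
        else:
            result.append((lo, hi, winner))
    return result
```

With equal weights this reduces to a per-region majority vote. Raising the weight of one hypothesis pulls the disputed boundaries toward that hypothesis.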
Similar Papers
In the same crypt: Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
👻
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
👻
Ghosted