DOVER: A Method for Combining Diarization Outputs

September 17, 2019 · Entered Twilight · 🏛 Automatic Speech Recognition & Understanding

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: LICENSE, README.md, doc, example1, example2, scripts

Authors: Andreas Stolcke, Takuya Yoshioka
arXiv ID: 1909.08090
Category: cs.CL (Computation & Language)
Citations: 26
Venue: Automatic Speech Recognition & Understanding
Repository: https://github.com/stolcke/dover (⭐ 11)
Last Checked: 1 month ago

Abstract

Speech recognition and other natural language tasks have long benefited from voting-based algorithms as a method to aggregate outputs from several systems to achieve a higher accuracy than any of the individual systems. Diarization, the task of segmenting an audio stream into speaker-homogeneous and co-indexed regions, has so far not seen the benefit of this strategy because the structure of the task does not lend itself to a simple voting approach. This paper presents DOVER (diarization output voting error reduction), an algorithm for weighted voting among diarization hypotheses, in the spirit of the ROVER algorithm for combining speech recognition hypotheses. We evaluate the algorithm for diarization of meeting recordings with multiple microphones, and find that it consistently reduces diarization error rate over the average of results from individual channels, and often improves on the single best channel chosen by an oracle.
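To make the idea of weighted voting among diarization hypotheses concrete, here is a minimal sketch in Python. It is not the code from the linked repository. It assumes each hypothesis has already been converted to one speaker label per fixed-length frame (with None for non-speech), and that speaker labels have already been mapped to a common label space across hypotheses, a step this sketch does not cover. The function name weighted_frame_vote and the frame-based representation are hypothetical choices made for illustration.

```python
from collections import defaultdict


def weighted_frame_vote(hypotheses, weights):
    """Combine frame-level diarization hypotheses by weighted voting.

    hypotheses: list of equal-length sequences, one speaker label per frame
                (None marks non-speech). Labels are ASSUMED to be already
                mapped to a shared label space across hypotheses.
    weights:    one non-negative weight per hypothesis.

    Returns a list with the winning label for each frame. Ties go to the
    label seen first, i.e. the one from the earliest-listed hypothesis.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    if len(hypotheses) != len(weights):
        raise ValueError("need exactly one weight per hypothesis")
    n_frames = len(hypotheses[0])
    if any(len(h) != n_frames for h in hypotheses):
        raise ValueError("all hypotheses must cover the same number of frames")

    combined = []
    for t in range(n_frames):
        # Accumulate the total weight behind each candidate label at frame t.
        scores = defaultdict(float)
        for hyp, w in zip(hypotheses, weights):
            scores[hyp[t]] += w
        # max() returns the first key with the highest score, so ties are
        # resolved in favor of the earliest-listed hypothesis.
        combined.append(max(scores, key=scores.get))
    return combined


if __name__ == "__main__":
    hyps = [
        ["A", "A", "B", None],
        ["A", "B", "B", None],
        ["A", "B", "B", "B"],
    ]
    print(weighted_frame_vote(hyps, [1.0, 1.0, 1.0]))
    print(weighted_frame_vote(hyps, [3.0, 1.0, 1.0]))
```

With equal weights the majority label wins in each frame; raising the weight of the first hypothesis lets it override the other two where they disagree with it. How the weights are chosen and how labels are aligned across hypotheses are left open here.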