Solos: A Dataset for Audio-Visual Music Analysis
June 14, 2020 ยท Entered Twilight ยท ๐ IEEE International Workshop on Multimedia Signal Processing
"No code URL or promise found in abstract"
"Derived repo from GitHub Pages (backfill)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, MANIFEST.in, README.md, Solos, dev, requirements.txt, setup.py, setupf.cfg, test, youtubesaver.py
Authors
Juan F. Montesinos, Olga Slizovskaia, Gloria Haro
arXiv ID
2006.07931
Category
eess.AS: Audio & Speech
Cross-listed
cs.DB,
cs.SD
Citations
15
Venue
IEEE International Workshop on Multimedia Signal Processing
Repository
https://github.com/juanfmontesinos/Solos
โญ 24
Last Checked
1 month ago
Abstract
In this paper, we present a new dataset of music performance videos which can be used for training machine learning methods for multiple tasks such as audio-visual blind source separation and localization, cross-modal correspondences, cross-modal generation and, in general, any audio-visual self-supervised task. These videos, gathered from YouTube, consist of solo musical performances of 13 different instruments. Compared to previously proposed audio-visual datasets, Solos is cleaner since a big amount of its recordings are auditions and manually checked recordings, ensuring there is no background noise nor effects added in the video post-processing. Besides, it is, up to the best of our knowledge, the only dataset that contains the whole set of instruments present in the URMP\cite{URPM} dataset, a high-quality dataset of 44 audio-visual recordings of multi-instrument classical music pieces with individual audio tracks. URMP was intented to be used for source separation, thus, we evaluate the performance on the URMP dataset of two different source-separation models trained on Solos. The dataset is publicly available at https://juanfmontesinos.github.io/Solos/
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Audio & Speech
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
R.I.P.
๐ป
Ghosted
DiffWave: A Versatile Diffusion Model for Audio Synthesis
R.I.P.
๐ป
Ghosted
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
R.I.P.
๐ป
Ghosted
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
R.I.P.
๐ป
Ghosted