Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

October 29, 2018 · Declared Dead · 🏛 International Conference on Learning Representations

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck

arXiv ID: 1810.12247

Category: cs.SD (Sound)

Cross-listed: cs.LG, eess.AS, stat.ML

Citations: 536

Venue: International Conference on Learning Representations

Last Checked: 1 month ago

Abstract

Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave. This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.

📜 Similar Papers

In the same crypt — Sound

R.I.P. 👻 Ghosted

WaveNet: A Generative Model for Raw Audio

Aaron van den Oord, Sander Dieleman, ... (+7 more)

cs.SD 🏛 Speech Synthesis 📚 8.0K cites 9 years ago

R.I.P. 👻 Ghosted

CNN Architectures for Large-Scale Audio Classification

Shawn Hershey, Sourish Chaudhuri, ... (+11 more)

cs.SD 🏛 ICASSP 📚 2.8K cites 9 years ago

R.I.P. 👻 Ghosted

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

Yi Luo, Nima Mesgarani

cs.SD 🏛 IEEE/ACM TASLP 📚 2.1K cites 7 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

Justin Salamon, Juan Pablo Bello

cs.SD 🏛 IEEE SPL 📚 1.4K cites 9 years ago

R.I.P. 👻 Ghosted

WaveGlow: A Flow-based Generative Network for Speech Synthesis

Ryan Prenger, Rafael Valle, Bryan Catanzaro

cs.SD 🏛 ICASSP 📚 1.1K cites 7 years ago

R.I.P. 👻 Ghosted

Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks

Morten Kolbæk, Dong Yu, ... (+2 more)

cs.SD 🏛 IEEE/ACM TASLP 📚 763 cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago