LSTM: A Search Space Odyssey
March 13, 2015 · Declared Dead · IEEE Transactions on Neural Networks and Learning Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
arXiv ID
1503.04069
Category
cs.NE: Neural & Evolutionary
Cross-listed
cs.LG
Citations
6.0K
Venue
IEEE Transactions on Neural Networks and Learning Systems
Last Checked
1 month ago
Abstract
Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs ($\approx 15$ years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
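For reference, here is a minimal sketch of a single step of the standard LSTM cell the paper benchmarks against, in plain NumPy. It omits the peephole connections that the paper's vanilla variant includes, and the function name, gate ordering, and shapes are illustrative assumptions rather than the paper's notation. Comments mark the forget gate and the tanh output activation, the two components the study found most critical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (peephole connections omitted).

    x: input vector (D,); h_prev, c_prev: previous hidden/cell state (H,).
    W: stacked gate weights (4H, D + H); b: stacked biases (4H,).
    The (i, f, o, g) gate ordering is an illustrative convention.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])           # input gate
    f = sigmoid(z[H:2 * H])      # forget gate: the component the study
                                 # found most critical to performance
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:])       # block input (candidate cell update)
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # this tanh is the output activation the
                                 # study likewise found essential
    return h, c
```

Running this step over a sequence while carrying (h, c) forward gives the recurrent forward pass; dropping the forget gate or the final tanh corresponds to two of the eight ablated variants the paper evaluates.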
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Neural & Evolutionary
Progressive Growing of GANs for Improved Quality, Stability, and Variation · R.I.P. 👻 Ghosted
Learning both Weights and Connections for Efficient Neural Networks · R.I.P. 👻 Ghosted
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks · R.I.P. 👻 Ghosted
An Introduction to Convolutional Neural Networks · R.I.P. 👻 Ghosted
Deep Learning using Rectified Linear Units (ReLU) · R.I.P. 👻 Ghosted
Died the same way · 👻 Ghosted
Language Models are Few-Shot Learners · R.I.P. 👻 Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library · R.I.P. 👻 Ghosted
XGBoost: A Scalable Tree Boosting System · R.I.P. 👻 Ghosted