Elements of effective machine learning datasets in astronomy
November 25, 2022 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Bernie Boscoe, Tuan Do, Evan Jones, Yunqi Li, Kevin Alfaro, Christy Ma
arXiv ID
2211.14401
Category
astro-ph.IM
Cross-listed
cs.LG
Citations
3
Venue
arXiv.org
Last Checked
1 month ago
Abstract
In this work, we identify elements of effective machine learning datasets in astronomy and present suggestions for their design and creation. Machine learning has become an increasingly important tool for analyzing and understanding the large-scale flood of data in astronomy. To take advantage of these tools, datasets are required for training and testing. However, building machine learning datasets for astronomy can be challenging. Astronomical data is collected from instruments built to explore science questions in a traditional fashion rather than to conduct machine learning. Thus, it is often the case that raw data, or even downstream processed data is not in a form amenable to machine learning. We explore the construction of machine learning datasets and we ask: what elements define effective machine learning datasets? We define effective machine learning datasets in astronomy to be formed with well-defined data points, structure, and metadata. We discuss why these elements are important for astronomical applications and ways to put them in practice. We posit that these qualities not only make the data suitable for machine learning, they also help to foster usable, reusable, and replicable science practices.
Similar Papers
In the same crypt: astro-ph.IM
R.I.P.
👻
Ghosted
R.I.P.
👻
Ghosted
Deep Neural Networks to Enable Real-time Multimessenger Astrophysics
Old Age
Star-galaxy Classification Using Deep Convolutional Neural Networks
R.I.P.
👻
Ghosted
CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks
R.I.P.
👻
Ghosted
Non-negative Matrix Factorization: Robust Extraction of Extended Structures
R.I.P.
404 Not Found
Deep Recurrent Neural Networks for Supernovae Classification
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
👻
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
👻
Ghosted