Elements of effective machine learning datasets in astronomy

November 25, 2022 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Bernie Boscoe, Tuan Do, Evan Jones, Yunqi Li, Kevin Alfaro, Christy Ma
arXiv ID: 2211.14401
Category: astro-ph.IM
Cross-listed: cs.LG
Citations: 3
Venue: arXiv.org
Last Checked: 1 month ago

Abstract

In this work, we identify elements of effective machine learning datasets in astronomy and present suggestions for their design and creation. Machine learning has become an increasingly important tool for analyzing and understanding the large-scale flood of data in astronomy. To take advantage of these tools, datasets are required for training and testing. However, building machine learning datasets for astronomy can be challenging. Astronomical data is collected from instruments built to explore science questions in a traditional fashion rather than to conduct machine learning. Thus, it is often the case that raw data, or even downstream processed data, is not in a form amenable to machine learning. We explore the construction of machine learning datasets and ask: what elements define effective machine learning datasets? We define effective machine learning datasets in astronomy to be formed with well-defined data points, structure, and metadata. We discuss why these elements are important for astronomical applications and ways to put them into practice. We posit that these qualities not only make the data suitable for machine learning but also help to foster usable, reusable, and replicable science practices.

📜 Similar Papers

In the same crypt — astro-ph.IM

R.I.P. 👻 Ghosted

Rotation-invariant convolutional neural networks for galaxy morphology prediction

Sander Dieleman, Kyle W. Willett, Joni Dambre

astro-ph.IM 🏛 arXiv 📚 688 cites 11 years ago

R.I.P. 👻 Ghosted

Deep Neural Networks to Enable Real-time Multimessenger Astrophysics

Daniel George, E. A. Huerta

astro-ph.IM 🏛 arXiv 📚 213 cites 9 years ago

🌅 Old Age

Star-galaxy Classification Using Deep Convolutional Neural Networks

Edward J. Kim, Robert J. Brunner

astro-ph.IM 🏛 arXiv 📚 170 cites 9 years ago

R.I.P. 👻 Ghosted

CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks

Mustafa Mustafa, Deborah Bard, ... (+4 more)

astro-ph.IM 🏛 Computational Astrophysics and Cosmology 📚 126 cites 8 years ago

R.I.P. 👻 Ghosted

Non-negative Matrix Factorization: Robust Extraction of Extended Structures

Bīn Rén, Laurent Pueyo, ... (+3 more)

astro-ph.IM 🏛 arXiv 📚 109 cites 8 years ago

R.I.P. 💀 404 Not Found

Deep Recurrent Neural Networks for Supernovae Classification

Tom Charnock, Adam Moss

astro-ph.IM 🏛 arXiv 📚 96 cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago