Calibration and generalizability of probabilistic models on low-data chemical datasets with DIONYSUS
December 03, 2022 · Declared Dead · Digital Discovery
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Gary Tom, Riley J. Hickman, Aniket Zinzuwadia, Afshan Mohajeri, Benjamin Sanchez-Lengeling, Alan Aspuru-Guzik
arXiv ID
2212.01574
Category
cs.CE: Computational Engineering
Cross-listed
cs.AI
Citations
23
Venue
Digital Discovery
Last Checked
1 month ago
Abstract
Deep learning models that leverage large datasets are often the state of the art for modelling molecular properties. When the datasets are smaller (< 2000 molecules), it is not clear that deep learning approaches are the right modelling tool. In this work we perform an extensive study of the calibration and generalizability of probabilistic machine learning models on small chemical datasets. Using different molecular representations and models, we analyse the quality of their predictions and uncertainties in a variety of tasks (binary, regression) and datasets. We also introduce two simulated experiments that evaluate their performance: (1) Bayesian optimization guided molecular design, (2) inference on out-of-distribution data via ablated cluster splits. We offer practical insights into model and feature choice for modelling small chemical datasets, a common scenario in new chemical experiments. We have packaged our analysis into the DIONYSUS repository, which is open sourced to aid in reproducibility and extension to new datasets.
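The abstract evaluates how well each model's predicted uncertainties are calibrated. As one illustration of what "calibration" means for the binary tasks, here is a minimal sketch of the binned expected calibration error (ECE) in plain Python. It is not taken from the DIONYSUS repository. The function name `expected_calibration_error`, the equal-width binning and the 10-bin default are illustrative assumptions, and the paper may use different or additional metrics.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned (top-label) ECE for a binary classifier.

    Illustrative sketch, not the DIONYSUS implementation.
    probs:  predicted probability of the positive class, each in [0, 1].
    labels: true class labels, 0 or 1.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")

    # Each bin collects (confidence, correct) pairs.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        # Confidence in the predicted class is always >= 0.5.
        conf = p if pred == 1 else 1.0 - p
        # Equal-width bins over [0, 1]; conf == 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(a for _, a in b) / len(b)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions at 75% confidence, but only half are correct.
    print(expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 0, 0]))
```

A well-calibrated model has an ECE near 0. In the toy call above, the model is 75% confident but only 50% accurate, so the ECE is 0.25. With fewer than 2000 molecules, as in the datasets studied here, bins can be sparsely populated, so the number of bins noticeably affects the estimate.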
Similar Papers
In the same crypt: Computational Engineering
A Probabilistic Graphical Model Foundation for Enabling Predictive Digital Twins at Scale
Temporal Attention augmented Bilinear Network for Financial Time-Series Data Analysis
Linked Component Analysis from Matrices to High Order Tensors: Applications to Biomedical Data
Deep Dynamical Modeling and Control of Unsteady Fluid Flows
Design and Optimization of Conforming Lattice Structures
Died the same way: Ghosted
Language Models are Few-Shot Learners
PyTorch: An Imperative Style, High-Performance Deep Learning Library
XGBoost: A Scalable Tree Boosting System