Calibration and generalizability of probabilistic models on low-data chemical datasets with DIONYSUS

December 03, 2022 · Declared Dead · 🏛 Digital Discovery

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Gary Tom, Riley J. Hickman, Aniket Zinzuwadia, Afshan Mohajeri, Benjamin Sanchez-Lengeling, Alan Aspuru-Guzik
arXiv ID: 2212.01574
Category: cs.CE (Computational Engineering)
Cross-listed: cs.AI
Citations: 23
Venue: Digital Discovery
Last Checked: 1 month ago

Abstract

Deep learning models that leverage large datasets are often the state of the art for modelling molecular properties. When the datasets are smaller (< 2000 molecules), it is not clear that deep learning approaches are the right modelling tool. In this work we perform an extensive study of the calibration and generalizability of probabilistic machine learning models on small chemical datasets. Using different molecular representations and models, we analyse the quality of their predictions and uncertainties in a variety of tasks (binary, regression) and datasets. We also introduce two simulated experiments that evaluate their performance: (1) Bayesian optimization guided molecular design, (2) inference on out-of-distribution data via ablated cluster splits. We offer practical insights into model and feature choice for modelling small chemical datasets, a common scenario in new chemical experiments. We have packaged our analysis into the DIONYSUS repository, which is open sourced to aid in reproducibility and extension to new datasets.
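To make the notion of calibration in the abstract concrete for the binary tasks it mentions, here is a minimal sketch of one common calibration measure, the expected calibration error (ECE). It is not taken from the DIONYSUS repository; the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are illustrative assumptions, and the paper may use different metrics.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE for binary classification (illustrative sketch only).

    y_true: array of 0/1 labels.
    y_prob: predicted probability of the positive class, in [0, 1].
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Equal-width bin edges on [0, 1]; only the interior edges are needed
    # for np.digitize, which then returns bin indices 0 .. n_bins - 1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(y_prob, edges[1:-1])

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between mean predicted probability and observed positive rate.
        gap = abs(y_prob[mask].mean() - y_true[mask].mean())
        # Weight the gap by the fraction of samples falling in this bin.
        ece += mask.mean() * gap
    return float(ece)
```

A value near 0 means the predicted probabilities match observed frequencies within each bin. On small datasets such as those studied here, binned estimates are noisy, so the number of bins is a choice worth reporting alongside the score.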

📜 Similar Papers

In the same crypt — Computational Engineering

R.I.P. 👻 Ghosted

Temporal Relational Ranking for Stock Prediction

Fuli Feng, Xiangnan He, ... (+4 more)

cs.CE 🏛 ACM TOIS 📚 485 cites 7 years ago

R.I.P. 👻 Ghosted

A Probabilistic Graphical Model Foundation for Enabling Predictive Digital Twins at Scale

Michael G. Kapteyn, Jacob V. R. Pretorius, Karen E. Willcox

cs.CE 🏛 Nature Computational Science 📚 277 cites 5 years ago

R.I.P. 👻 Ghosted

Temporal Attention augmented Bilinear Network for Financial Time-Series Data Analysis

Dat Thanh Tran, Alexandros Iosifidis, ... (+2 more)

cs.CE 🏛 IEEE TNNLS 📚 222 cites 8 years ago

R.I.P. 👻 Ghosted

Linked Component Analysis from Matrices to High Order Tensors: Applications to Biomedical Data

Guoxu Zhou, Qibin Zhao, ... (+4 more)

cs.CE 🏛 Proc. IEEE 📚 190 cites 10 years ago

R.I.P. 👻 Ghosted

Deep Dynamical Modeling and Control of Unsteady Fluid Flows

Jeremy Morton, Freddie D. Witherden, ... (+2 more)

cs.CE 🏛 NeurIPS 📚 181 cites 7 years ago

R.I.P. 👻 Ghosted

Design and Optimization of Conforming Lattice Structures

Jun Wu, Weiming Wang, Xifeng Gao

cs.CE 🏛 IEEE TVCG 📚 158 cites 6 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago