Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
July 27, 2022 · The Cartographer · 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)
"No code URL or promise found in abstract"
"Title-pattern auto-detect: Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks"
Evidence collected by the PWNC Scanner
Authors
Tilman Räuker, Anson Ho, Stephen Casper, Dylan Hadfield-Menell
arXiv ID
2207.13243
Category
cs.LG: Machine Learning
Cross-listed
cs.AI, cs.CL, cs.CV
Citations
174
Venue
2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)
Abstract
The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify problems, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus on explaining the internal components of DNNs, are well-suited for developing a mechanistic understanding, guiding manual modifications, and reverse engineering solutions. Much recent work has focused on DNN interpretability, and rapid progress has thus far made a thorough systematization of methods difficult. In this survey, we review over 300 works with a focus on inner interpretability tools. We introduce a taxonomy that classifies methods by what part of the network they help to explain (weights, neurons, subnetworks, or latent representations) and whether they are implemented during (intrinsic) or after (post hoc) training. To our knowledge, we are also the first to survey a number of connections between interpretability research and work in adversarial robustness, continual learning, modularity, network compression, and studying the human visual system. We discuss key challenges and argue that the status quo in interpretability research is largely unproductive. Finally, we highlight the importance of future work that emphasizes diagnostics, debugging, adversaries, and benchmarking in order to make interpretability tools more useful to engineers in practical applications.
Similar Papers
In the same crypt: Machine Learning
The Ethereal
The Ethereal
Continuous control with deep reinforcement learning
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
The Ethereal