R.I.P.
👻
Ghosted
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
May 04, 2023 · Entered Twilight · arXiv.org
Repo contents: README.md, incontext_3.4.ipynb, mnist_3.5.ipynb, modular_addition_3.3.ipynb, permutation_S4_3.3.ipynb, requirements.txt, symbolic_formulas_3.1.ipynb, two_moon_3.2.ipynb
Authors
Ziming Liu, Eric Gan, Max Tegmark
arXiv ID
2305.08746
Category
cs.NE: Neural & Evolutionary
Cross-listed
cond-mat.dis-nn, cs.AI, cs.LG, math.RT, q-bio.NC
Citations
50
Venue
arXiv.org
Repository
https://github.com/KindXiaoming/BIMT
⭐ 175
Last Checked
1 month ago
Abstract
We introduce Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. Inspired by brains, BIMT embeds neurons in a geometric space and augments the loss function with a cost proportional to the length of each neuron connection. We demonstrate that BIMT discovers useful modular neural networks for many simple tasks, revealing compositional structures in symbolic formulas, interpretable decision boundaries and features for classification, and mathematical structure in algorithmic datasets. The ability to directly see modules with the naked eye can complement current mechanistic interpretability strategies such as probes, interventions or staring at all weights.
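The core idea in the abstract, a loss term that grows with the length of each connection, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the BIMT repository: the function names `connection_cost` and `total_loss`, the 2D neuron coordinates, the strength `lam`, and the choice to weight each connection's absolute weight by its Euclidean length.

```python
import math

def connection_cost(weights, pos_in, pos_out, lam=1e-3):
    """Hypothetical wiring penalty: lam * sum over (j, i) of length_ji * |w_ji|.

    weights[j][i] connects input neuron i to output neuron j.
    pos_in / pos_out hold the assumed coordinates of each neuron.
    """
    total = 0.0
    for j, row in enumerate(weights):
        for i, w in enumerate(row):
            # Longer connections pay more for the same weight magnitude.
            total += math.dist(pos_out[j], pos_in[i]) * abs(w)
    return lam * total

def total_loss(task_loss, weights, pos_in, pos_out, lam=1e-3):
    """Task loss augmented with the length-proportional connection cost."""
    return task_loss + connection_cost(weights, pos_in, pos_out, lam)

# Two input neurons on the line y = 0, one output neuron directly above the first.
pos_in = [(0.0, 0.0), (1.0, 0.0)]
pos_out = [(0.0, 1.0)]

short_wire = [[1.0, 0.0]]  # all weight on the nearby input
long_wire = [[0.0, 1.0]]   # same L1 norm, but on the distant input

print(connection_cost(short_wire, pos_in, pos_out, lam=1.0))
print(connection_cost(long_wire, pos_in, pos_out, lam=1.0))
```

Both weight matrices have the same L1 norm, but the penalty is smaller for the shorter connection, so under this sketch an optimizer would be nudged toward local wiring, which is the kind of pressure that could make modules visible by eye.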
Similar Papers
In the same crypt: Neural & Evolutionary
Progressive Growing of GANs for Improved Quality, Stability, and Variation
Learning both Weights and Connections for Efficient Neural Networks
LSTM: A Search Space Odyssey
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks