Neural Network Approximation: Three Hidden Layers Are Enough
October 25, 2020 · Declared Dead · Neural Networks
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zuowei Shen, Haizhao Yang, Shijun Zhang
arXiv ID
2010.14075
Category
cs.LG: Machine Learning
Cross-listed
cs.NE, stat.ML
Citations
143
Venue
Neural Networks
Last Checked
4 months ago
Abstract
A three-hidden-layer neural network with super approximation power is introduced. This network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron and hence we call such networks as Floor-Exponential-Step (FLES) networks. For any width hyper-parameter $N\in\mathbb{N}^+$, it is shown that FLES networks with width $\max\{d,N\}$ and three hidden layers can uniformly approximate a Hölder continuous function $f$ on $[0,1]^d$ with an exponential approximation rate $3\lambda(2\sqrt{d})^{\alpha}2^{-\alpha N}$, where $\alpha\in(0,1]$ and $\lambda>0$ are the Hölder order and constant, respectively. More generally for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $2\omega_f(2\sqrt{d})\,2^{-N}+\omega_f(2\sqrt{d}\,2^{-N})$. Moreover, we extend such a result to general bounded continuous functions on a bounded set $E\subseteq\mathbb{R}^d$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\rightarrow 0$ is moderate (e.g., $\omega_f(r)\lesssim r^{\alpha}$ for Hölder continuous functions), since the major term to be concerned in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ independent of $d$ within the modulus of continuity. Finally, we extend our analysis to derive similar approximation results in the $L^p$-norm for $p\in[1,\infty)$ via replacing Floor-Exponential-Step activation functions by continuous activation functions.
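The two rates quoted in the abstract can be compared numerically. The sketch below is illustrative only and is not taken from the paper's code: the function names are hypothetical, and it assumes the modulus of continuity is exactly $\lambda r^{\alpha}$, which is only an upper bound for a Hölder continuous function. Under that assumption the general rate never exceeds the Hölder rate, because $2^{-N}\leq 2^{-\alpha N}$ whenever $\alpha\in(0,1]$.

```python
import math


def general_rate(omega, d, N):
    """General bound from the abstract:
    2*omega(2*sqrt(d))*2^(-N) + omega(2*sqrt(d)*2^(-N))."""
    r = 2.0 * math.sqrt(d)
    return 2.0 * omega(r) * 2.0 ** (-N) + omega(r * 2.0 ** (-N))


def holder_rate(lam, alpha, d, N):
    """Hoelder bound from the abstract:
    3*lam*(2*sqrt(d))^alpha * 2^(-alpha*N)."""
    return 3.0 * lam * (2.0 * math.sqrt(d)) ** alpha * 2.0 ** (-alpha * N)


def holder_modulus(lam, alpha):
    """Assumed modulus omega(r) = lam * r^alpha (an upper bound in general)."""
    return lambda r: lam * r ** alpha


if __name__ == "__main__":
    lam, alpha = 1.0, 0.5  # hypothetical Hoelder constant and order
    for d in (1, 10, 100):
        for N in (4, 8, 16):
            g = general_rate(holder_modulus(lam, alpha), d, N)
            h = holder_rate(lam, alpha, d, N)
            print(f"d={d:3d} N={N:2d} general={g:.3e} holder={h:.3e}")
```

In both functions the dimension $d$ enters only through the factor $2\sqrt{d}$, while the decay in $N$ is exponential, which is the sense in which the abstract describes the rate as avoiding the curse of dimensionality.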
Similar Papers
In the same crypt: Machine Learning
The Ethereal
The Ethereal
Continuous control with deep reinforcement learning
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
The Ethereal
Asynchronous Methods for Deep Reinforcement Learning
Died the same way: Ghosted
R.I.P.
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
Ghosted