long-standing open question. We propose a phenomenological model of NN training to
explain this non-overfitting puzzle. Our linear frequency principle (LFP) model accounts for a
key dynamical feature of NNs: they learn low frequencies first, irrespective of microscopic
details. Theory based on our LFP model shows that low-frequency dominance of the target
function is the key condition for NNs not to overfit, a prediction that is verified by experiments …