Overparameterized neural network models come with drawbacks such as increased computational cost, longer training times, and a risk of overfitting. They may fail to reach optimal minima because of limitations in the training algorithm, and they may not generalize well to unseen data. Moreover, the raw parameter count is only a crude proxy for a model's capacity to memorize data, which makes it difficult to understand and improve the efficiency of these networks on real-world tasks [6].
The Effective Model Complexity (EMC) metric is a novel approach that measures the largest sample size a neural network can fit perfectly, under realistic training loops and across various data types [1]. It is computed iteratively: training begins on a small subset of the data, which is grown step by step until the model fails to reach 100% training accuracy; the largest subset the model still fits perfectly is its EMC [5]. This metric makes it possible to evaluate and compare the data-fitting capacity of different architectures, optimizers, and activation functions, as illustrated in the sketch below.
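A minimal Python/PyTorch sketch of this iterative procedure is shown below. It is not the authors' implementation: the function name estimate_emc, the arguments model_fn, train_fn, start_size, and step, and the linear growth schedule are assumptions made for illustration; the actual search and stopping criterion may differ.

import torch
from torch.utils.data import DataLoader, Subset

def estimate_emc(model_fn, dataset, train_fn, start_size=128, step=128, device="cpu"):
    """Estimate Effective Model Complexity: the largest number of training
    samples the model can fit to 100% training accuracy.

    model_fn : callable returning a freshly initialized model
    train_fn : callable (model, loader) -> trained model, i.e. one realistic
               training loop (optimizer, epochs, etc. chosen by the caller)
    """
    n = start_size
    emc = 0
    while n <= len(dataset):
        subset = Subset(dataset, range(n))
        loader = DataLoader(subset, batch_size=128, shuffle=True)
        model = model_fn().to(device)
        model = train_fn(model, loader)

        # Check whether the trained model fits this subset perfectly.
        model.eval()
        correct = 0
        with torch.no_grad():
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(dim=1) == y).sum().item()

        if correct == n:   # 100% training accuracy: grow the subset and retry
            emc = n
            n += step
        else:              # model failed to fit: report the last perfect fit
            break
    return emc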
Applied across model families, EMC isolates how architecture type (e.g., CNNs, MLPs, ViTs), optimizer (e.g., SGD, Adam), and activation function affect data-fitting capacity. It shows, for example, that CNNs are more parameter-efficient than previously thought, and that the choice of optimizer and activation function significantly influences how much data a model can fit.
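Building on the sketch above, a hypothetical comparison might sweep optimizers while holding the architecture and dataset fixed. The helpers make_mlp and train_with, and the dataset name train_set, are illustrative placeholders rather than anything defined in the source.

import torch.nn as nn
import torch.optim as optim

def make_mlp():
    # Small MLP for 28x28 grayscale inputs with 10 classes (illustrative sizes).
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                         nn.Linear(256, 10))

def train_with(optimizer_cls, lr, epochs=50):
    # Returns a train_fn compatible with estimate_emc above.
    def train_fn(model, loader):
        device = next(model.parameters()).device
        opt = optimizer_cls(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        model.train()
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        return model
    return train_fn

# Example: compare the EMC of the same architecture under SGD vs. Adam
# (train_set is a placeholder for a labeled classification dataset).
# emc_sgd  = estimate_emc(make_mlp, train_set, train_with(optim.SGD, lr=0.1))
# emc_adam = estimate_emc(make_mlp, train_set, train_with(optim.Adam, lr=1e-3))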