Lyu, Zhaoyan; Aminian, Gholamali; Rodrigues, Miguel RD (2023) On Neural Networks Fitting, Compression, and Generalization Behavior via Information-Bottleneck-like Approaches. Entropy, 25(7), Article 1063. DOI: 10.3390/e25071063.
Abstract
It is well known that the neural network learning process, along with its connections to fitting, compression, and generalization, is not yet well understood. In this paper, we propose a novel approach to capturing such neural network dynamics using information-bottleneck-type techniques, replacing mutual information measures (which are notoriously difficult to estimate in high-dimensional spaces) with more tractable ones: (1) the minimum mean-squared error (MMSE) associated with the reconstruction of the network input data from some intermediate network representation, and (2) the cross-entropy associated with a certain class label given some network representation. We then conduct an empirical study to ascertain how different network models, network learning algorithms, and datasets affect the learning dynamics. Our experiments show that our proposed approach is more reliable than classical information bottleneck ones in capturing network dynamics during both the training and testing phases. Our experiments also reveal that the fitting and compression phases exist regardless of the choice of activation function. Additionally, our findings suggest that model architectures, training algorithms, and datasets that lead to better generalization tend to exhibit more pronounced fitting and compression phases.
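The two surrogate measures described above can be sketched with simple linear stand-ins. The snippet below is illustrative only: it uses synthetic data, an affine least-squares decoder as a proxy for the MMSE term, and a softmax (linear) probe for the cross-entropy term. All names and the data-generating setup are assumptions for the sketch, not the estimators actually used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: inputs X, an "intermediate representation" T (here an
# invertible linear map of X), and class labels Y in {0, 1, 2}.
n, d_x, d_t, n_classes = 500, 4, 6, 3
X = rng.normal(size=(n, d_x))
A = rng.normal(size=(d_x, d_t))
T = X @ A                                       # plays the role of a hidden layer
Y = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)

def linear_mmse(T, X):
    """MSE of the best affine reconstruction of X from T.
    This upper-bounds the true MMSE, the paper's tractable stand-in for I(X;T)."""
    T1 = np.hstack([T, np.ones((len(T), 1))])   # append bias column
    W, *_ = np.linalg.lstsq(T1, X, rcond=None)
    return float(np.mean((X - T1 @ W) ** 2))

def probe_cross_entropy(T, Y, n_classes, steps=800, lr=0.05):
    """Cross-entropy of a softmax probe predicting Y from T,
    a tractable stand-in for the label-relevance term I(Y;T)."""
    T1 = np.hstack([T, np.ones((len(T), 1))])
    W = np.zeros((T1.shape[1], n_classes))
    onehot = np.eye(n_classes)[Y]
    for _ in range(steps):                      # plain gradient descent (convex loss)
        logits = T1 @ W
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
        W -= lr * T1.T @ (P - onehot) / len(T)
    logits = T1 @ W
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
    return float(-np.mean(np.log(P[np.arange(len(Y)), Y] + 1e-12)))

mmse = linear_mmse(T, X)        # near 0 here: T is an invertible map of X
ce = probe_cross_entropy(T, Y, n_classes)
```

In the paper's setting these two quantities would be tracked for each layer across training epochs, so that a falling reconstruction error signals fitting and a rising one signals compression of input information.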
| Type | Article |
| --- | --- |
| Title | On Neural Networks Fitting, Compression, and Generalization Behavior via Information-Bottleneck-like Approaches |
| Open access status | An open access version is available from UCL Discovery |
| DOI | 10.3390/e25071063 |
| Publisher version | https://doi.org/10.3390/e25071063 |
| Language | English |
| Additional information | © 2023 by the Authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
| Keywords | deep learning; information theory; information bottleneck; generalization; fitting; compression |
| UCL classification | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
| URI | https://discovery-pp.ucl.ac.uk/id/eprint/10173945 |