Model Selection for Deep Learning with Gradient Norms: An Empirical Study

Research Square (2023)

Abstract
Recent theoretical works [1–3] propose to exploit the loss gradient norm, averaged across training samples, to bound the generalization error of deep neural networks (DNNs) and to measure their overfitting and generalization performance. However, the extremely high computational cost of such gradient norms makes them infeasible to obtain in practice. In this work, we carry out empirical studies using a formulation we call the mean empirical gradient norm (MEGAN), which we compute with a fast implementation of fully-connected (FC) layer gradient norms [4]. Our empirical studies find that the sum of MEGAN over the optimization path of deep learning (i.e., over training epochs) accurately predicts the generalization performance of DNNs. They also include extensive experiments demonstrating the potential of MEGAN for model selection of DNNs in hyper-parameter search settings. Our in-depth analyses interpret the behavior of MEGAN across training epochs and confirm MEGAN as an efficient and effective way to measure the generalization performance of DNNs using the training set only.
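The abstract does not spell out the computation, but a minimal sketch of how a MEGAN-style quantity could be computed is given below. It assumes the standard fast per-sample gradient-norm identity for a linear layer (for y = xWᵀ + b, the gradient of sample i's loss with respect to W is the outer product δᵢxᵢᵀ, so its Frobenius norm is ‖δᵢ‖·‖xᵢ‖, with no per-sample backward pass needed); the split into a frozen backbone plus a single FC head, and the function names, are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

def fc_per_sample_grad_norms(x: torch.Tensor, grad_out: torch.Tensor) -> torch.Tensor:
    # For a linear layer y = x @ W.T + b, sample i's weight gradient is the
    # outer product delta_i x_i^T, so its Frobenius norm factorizes as
    # ||delta_i||_2 * ||x_i||_2 (bias gradient omitted for simplicity).
    return grad_out.norm(dim=1) * x.norm(dim=1)

def megan_one_epoch(backbone: nn.Module, fc: nn.Linear, loader, loss_fn) -> float:
    # Mean of per-sample FC weight-gradient norms over one epoch (sketch).
    total, count = 0.0, 0
    for inputs, targets in loader:
        with torch.no_grad():             # features only; no backbone grads
            feats = backbone(inputs)
        out = fc(feats)
        out.retain_grad()                 # keep dL/dy for the whole batch
        loss_fn(out, targets).backward()  # loss_fn must use reduction='sum'
                                          # so out.grad rows are per-sample grads
        total += fc_per_sample_grad_norms(feats, out.grad).sum().item()
        count += feats.size(0)
        fc.zero_grad()
    return total / count
```

Accumulating the per-epoch values along the optimization path (e.g., summing megan_one_epoch across training epochs) would then yield the summed-MEGAN quantity the paper correlates with generalization performance.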
Keywords
deep learning, gradient norms, selection