Understanding the Resilience of Neural Network Ensembles against Faulty Training Data

2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS), 2021

Abstract
Machine learning is becoming more prevalent in safety-critical systems like autonomous vehicles and medical imaging. Faulty training data, where data is either mislabelled, missing, or duplicated, can increase the chance of misclassification, resulting in serious consequences. In this paper, we evaluate the resilience of ML ensembles against faulty training data, in order to understand how to build better ensembles. To support our evaluation, we develop a fault injection framework to systematically mutate training data, and introduce two diversity metrics that capture the distribution and entropy of predicted labels. Our experiments find that ensemble learning is more resilient than any individual model, and that high-accuracy neural networks are not necessarily more resilient to faulty training data. Further, we find that simple majority voting suffices in most cases for resilience in ML ensembles. Finally, we observe diminishing returns for resilience as we increase the number of models in an ensemble. These findings can help machine learning developers build ensembles that are both more resilient and more efficient.
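The abstract describes three building blocks: a fault injector that mutates training labels, simple majority voting across ensemble members, and a diversity metric based on the entropy of predicted labels. The paper's actual implementation is not shown here; the sketch below is a minimal, hypothetical illustration of those ideas using only the Python standard library (all function names are our own, not the authors').

```python
import math
import random
from collections import Counter

def mutate_labels(labels, rate, num_classes, seed=0):
    """Hypothetical fault injector: mislabel a fraction `rate` of training labels.

    Each selected label is replaced with a different, randomly chosen class,
    mimicking the 'mislabelled data' fault type from the abstract.
    """
    rng = random.Random(seed)
    mutated = list(labels)
    for i in rng.sample(range(len(mutated)), k=int(rate * len(mutated))):
        other_classes = [c for c in range(num_classes) if c != mutated[i]]
        mutated[i] = rng.choice(other_classes)
    return mutated

def majority_vote(predictions):
    """Aggregate per-model predictions for one input by simple majority."""
    return Counter(predictions).most_common(1)[0][0]

def prediction_entropy(predictions):
    """Entropy of the predicted-label distribution across ensemble members.

    0.0 means all models agree; higher values indicate more diverse predictions.
    """
    counts = Counter(predictions)
    total = len(predictions)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, if five models predict `[0, 1, 1, 2, 1]` for an input, `majority_vote` returns `1`, while `prediction_entropy` quantifies how much the members disagree, one way to operationalize the label-distribution diversity metrics the paper introduces.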
Keywords
Error resilience, Machine learning, Training