Classification Assessment Tool: A program to measure the uncertainty of classification models in terms of class-level metrics

Szilard Szabo, Imre J. Holb, Vanda Eva Abriha-Molnar, Gabor Szatmari, Sudhir Kumar Singh, David Abriha

Applied Soft Computing (2024)

Abstract
Accuracy assessment is an important step in classification and has become even more relevant with the rise of machine and deep learning techniques. We provide a method for quick model evaluation with several options: calculating class-level accuracy metrics for as many models and classes as needed, and assessing model stability using random subsets of the testing data. The outputs are single calculations, summaries of the repetitions, and/or all accuracy results per repetition. Using the application, we demonstrated the capabilities of the function and analyzed the accuracies of three experiments. We found that several popular metrics, namely the binary Overall Accuracy, Sensitivity, Precision, and Specificity, as well as the ROC curve, can give misleading results when true negative cases dominate. The F1-score, Intersection over Union, and the Matthews correlation coefficient were reliable in all experiments. Medians and interquartile ranges (IQR) of the repeated samples from the testing dataset showed that the IQR was small when a model was either nearly perfect or completely unacceptable; thus, the IQR reflected model stability and reproducibility. We found no general, statistically justified relationship between the median and the IQR, and correlations among accuracy metrics also varied by experiment. Accordingly, a multi-metric evaluation is recommended instead of a single metric.
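To illustrate the kind of workflow the abstract describes, the following is a minimal Python sketch, not the published tool: it derives class-level metrics (Overall Accuracy, Sensitivity, Precision, Specificity, F1-score, IoU, MCC) from a binary confusion matrix and estimates model stability by repeatedly drawing random subsets of the test data and reporting the median and IQR of each metric. All function names, parameters, and defaults here are illustrative assumptions.

```python
# Illustrative sketch only; not the authors' released application.
import numpy as np

def class_metrics(y_true, y_pred, positive=1):
    """Binary class-level metrics derived from TP, FP, TN, FN counts."""
    y_true = np.asarray(y_true) == positive
    y_pred = np.asarray(y_pred) == positive
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    oa = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn) if (tp + fn) else np.nan
    prec = tp / (tp + fp) if (tp + fp) else np.nan
    spec = tn / (tn + fp) if (tn + fp) else np.nan
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else np.nan
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else np.nan
    mcc_den = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else np.nan
    return {"OA": oa, "Sensitivity": sens, "Precision": prec,
            "Specificity": spec, "F1": f1, "IoU": iou, "MCC": mcc}

def stability(y_true, y_pred, n_repeats=100, frac=0.5, seed=0):
    """Median and IQR of each metric over random subsets of the test set."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    runs = []
    for _ in range(n_repeats):
        idx = rng.choice(len(y_true), size=int(frac * len(y_true)), replace=False)
        runs.append(class_metrics(y_true[idx], y_pred[idx]))
    summary = {}
    for name in runs[0]:
        vals = np.array([r[name] for r in runs], dtype=float)
        q1, med, q3 = np.nanpercentile(vals, [25, 50, 75])
        summary[name] = {"median": med, "IQR": q3 - q1}
    return summary
```

Under this sketch, a small IQR for a metric across the repeated subsets would be read, as in the abstract, as an indication of a stable (whether very good or very poor) model, while reporting several metrics side by side guards against the distortions that Overall Accuracy, Sensitivity, Precision, and Specificity can show when true negatives dominate.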
Keywords
Model evaluation, Model stability, Testing, Repetitions, Python