Binary Classification with Imbalanced Data

ENTROPY(2024)

Abstract
When a binary response variable contains an excess of zeros, the data are imbalanced, and imbalanced data make binary classification difficult. To simplify the numerical computation of the maximum likelihood estimates of the zero-inflated Bernoulli (ZIBer) model parameters under imbalanced data, an expectation-maximization (EM) algorithm is proposed. A logistic regression model links the Bernoulli probabilities to the covariates in the ZIBer model, and the predictive performance of the ZIBer model, LightGBM, and artificial neural network (ANN) procedures is compared by Monte Carlo simulation. The results show that no method dominates the others in predictive performance on imbalanced data. The LightGBM and ZIBer models are more competitive than the ANN model for zero-inflated imbalanced data sets.
Keywords
artificial neural network, expectation-maximization algorithm, Entropy, logistic regression, zero-inflated model
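
The abstract does not spell out the ZIBer likelihood or the EM updates, so the following is only a minimal sketch of how an EM fit of a zero-inflated Bernoulli model with a logistic link could look. It assumes one common parameterization, which may differ from the paper's: with probability omega an observation is a structural zero, and otherwise it is Bernoulli with success probability sigmoid(x'beta). The function names (fit_ziber_em, ziber_loglik), the starting values, and the simulated data are all hypothetical and chosen for illustration.

```python
import numpy as np

def sigmoid(t):
    # Clip the linear predictor so exp() cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def ziber_loglik(omega, beta, X, y):
    # Observed-data log-likelihood under the assumed parameterization:
    # P(Y=1|x) = (1-omega)*p,  P(Y=0|x) = omega + (1-omega)*(1-p).
    p = sigmoid(X @ beta)
    p1 = (1.0 - omega) * p
    p0 = omega + (1.0 - omega) * (1.0 - p)
    return float(np.sum(y * np.log(p1) + (1.0 - y) * np.log(p0)))

def fit_ziber_em(X, y, n_iter=200, tol=1e-8):
    n, d = X.shape
    omega, beta = 0.5, np.zeros(d)  # arbitrary starting values (assumption)
    for _ in range(n_iter):
        # E-step: posterior probability that each observed zero is structural.
        p = sigmoid(X @ beta)
        tau = np.where(y == 0, omega / (omega + (1.0 - omega) * (1.0 - p)), 0.0)
        # M-step for omega: average posterior structural-zero probability.
        omega_new = tau.mean()
        # M-step for beta: weighted logistic regression with weights 1 - tau,
        # solved here by a few Newton steps.
        w = 1.0 - tau
        beta_new = beta.copy()
        for _ in range(25):
            p = sigmoid(X @ beta_new)
            grad = X.T @ (w * (y - p))
            hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(d)
            step = np.linalg.solve(hess, grad)
            beta_new = beta_new + step
            if np.max(np.abs(step)) < 1e-10:
                break
        change = max(abs(omega_new - omega), np.max(np.abs(beta_new - beta)))
        omega, beta = omega_new, beta_new
        if change < tol:
            break
    return omega, beta

# Simulated zero-inflated data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
true_omega, true_beta = 0.3, np.array([0.5, 2.0])
structural_zero = rng.random(n) < true_omega
y = np.where(structural_zero, 0.0, (rng.random(n) < sigmoid(X @ true_beta)) * 1.0)

omega_hat, beta_hat = fit_ziber_em(X, y)
prob_one = (1.0 - omega_hat) * sigmoid(X @ beta_hat)  # fitted P(Y=1 | x)
print("omega_hat:", omega_hat, "beta_hat:", beta_hat)
```

Under this parameterization omega and the intercept are separated only through the nonlinearity of the logistic link, so estimates can be unstable in small samples; the sketch is meant to show the structure of the E- and M-steps rather than to reproduce the paper's estimator or its simulation results.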