Smoothing Multinomial Naïve Bayes in the Presence of Imbalance.

MLDM'11: Proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition (2011)

Cited by 4
Abstract
Multinomial naïve Bayes is a popular classifier used for a wide variety of applications. When applied to text classification, this classifier requires some form of smoothing when estimating parameters. Typically, Laplace smoothing is used, and researchers have proposed several other successful forms of smoothing. In this paper, we show that common preprocessing techniques for text categorization have detrimental effects when using several of these well-known smoothing methods. We also introduce a new form of smoothing for which these detrimental effects are less severe: ROSE smoothing, which can be derived from methods for cost-sensitive learning and imbalanced datasets. We show empirically on text data that ROSE smoothing performs well compared to known methods of smoothing, and is the only method tested that performs well regardless of the type of text preprocessing used. It is particularly effective compared to existing methods when the data is imbalanced.
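To make the smoothing step concrete, here is a minimal sketch of the add-alpha (Laplace, when alpha = 1) estimate of the per-class word likelihoods that the abstract refers to. The function name and the toy counts are illustrative, not from the paper; ROSE smoothing itself is defined in the paper and is not reproduced here.

```python
# Minimal sketch of add-alpha smoothing for multinomial naive Bayes
# word likelihoods. With alpha = 1 this is classic Laplace smoothing,
# which guarantees unseen words get nonzero probability.

def smoothed_likelihoods(word_counts, vocab_size, alpha=1.0):
    """Return P(word | class) for each word index under add-alpha smoothing."""
    total = sum(word_counts)
    denom = total + alpha * vocab_size
    return [(count + alpha) / denom for count in word_counts]

# Hypothetical class counts: word 0 seen 3 times, word 1 once, word 2 never.
probs = smoothed_likelihoods([3, 1, 0], vocab_size=3)
# The unseen word 2 still receives probability (0 + 1) / (4 + 3) = 1/7.
```

Without smoothing, a single unseen word would zero out the whole class posterior; the imbalance issue the paper studies arises because minority classes have far fewer observed counts, so the smoothing term dominates their estimates.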
Keywords
ROSE smoothing, detrimental effect, Laplace smoothing, well-known smoothing method, text categorization, text classification, text data, common preprocessing technique, imbalanced datasets, new form, smoothing multinomial