Adapting naive Bayes tree for text classification

Knowledge and Information Systems (2014)

Abstract
Naive Bayes (NB) is one of the top 10 algorithms in data mining thanks to its simplicity, efficiency, and interpretability. The naive Bayes tree (NBTree) was proposed to weaken NB's attribute independence assumption. NBTree is a hybrid algorithm that deploys a naive Bayes classifier on each leaf node of a decision tree, and it has demonstrated remarkable classification performance. In text classification, multinomial naive Bayes (MNB) has become a dominant modeling approach, largely superseding the earlier multivariate Bernoulli model. Inspired by the success of NBTree, we propose a new algorithm called multinomial naive Bayes tree (MNBTree), which deploys a multinomial naive Bayes text classifier on each leaf node of the decision tree. Unlike NBTree, MNBTree builds a binary tree in which the values of each split attribute are simply divided into zero and nonzero. In addition, MNBTree builds the tree using the information gain measure instead of the classification accuracy measure, which reduces training time. To further improve the classification performance of MNBTree, we propose a multiclass learning version called multiclass multinomial naive Bayes tree (MMNBTree), which applies the multiclass learning technique to MNBTree. Experimental results on a large number of widely used text classification benchmark datasets validate the effectiveness of the two proposed algorithms, MNBTree and MMNBTree.
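
The MNBTree idea summarized above can be illustrated with the following Python sketch. The tree splits on whether a word's count is zero or nonzero, chooses the word by information gain, and trains a multinomial naive Bayes classifier at each leaf. This is not the authors' implementation. The Laplace-smoothed prior and word probabilities, the stopping parameters `max_depth` and `min_leaf`, the tie-breaking rule, and the names `entropy`, `MNB`, and `MNBTreeSketch` are all hypothetical choices made for illustration. The multiclass technique used by MMNBTree is not modeled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


class MNB:
    """Multinomial naive Bayes over word-count vectors (Laplace smoothing is an assumption)."""

    def __init__(self, classes, alpha=1.0):
        self.classes, self.alpha = classes, alpha

    def fit(self, X, y):
        n, v, k = len(y), len(X[0]), len(self.classes)
        self.log_prior, self.log_cond = {}, {}
        for c in self.classes:
            rows = [x for x, t in zip(X, y) if t == c]
            counts = [sum(col) for col in zip(*rows)] if rows else [0] * v
            total = sum(counts)
            # Smoothed prior, so a class absent from this leaf still gets a finite score.
            self.log_prior[c] = math.log((len(rows) + 1) / (n + k))
            # P(w|c) = (count of w in class c + alpha) / (total words in c + alpha * |V|)
            self.log_cond[c] = [math.log((f + self.alpha) / (total + self.alpha * v)) for f in counts]
        return self

    def predict_one(self, x):
        # log P(c) + sum_w f_w * log P(w|c); words with zero count contribute nothing.
        def score(c):
            return self.log_prior[c] + sum(f * lp for f, lp in zip(x, self.log_cond[c]) if f)
        return max(self.classes, key=score)


class MNBTreeSketch:
    """Binary tree splitting on word count == 0 vs != 0, with an MNB at each leaf."""

    def __init__(self, max_depth=3, min_leaf=2):
        # Hypothetical stopping rules; min_leaf >= 1 keeps both children non-empty.
        self.max_depth, self.min_leaf = max_depth, min_leaf

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.root = self._build(X, y, 0)
        return self

    def _best_split(self, X, y):
        """Return the word index with the highest information gain (ties keep the lower index)."""
        base, n = entropy(y), len(y)
        best_j, best_gain = None, 0.0
        for j in range(len(X[0])):
            zero = [t for x, t in zip(X, y) if x[j] == 0]
            nonzero = [t for x, t in zip(X, y) if x[j] != 0]
            if len(zero) < self.min_leaf or len(nonzero) < self.min_leaf:
                continue
            gain = base - len(zero) / n * entropy(zero) - len(nonzero) / n * entropy(nonzero)
            if gain > best_gain + 1e-12:
                best_j, best_gain = j, gain
        return best_j

    def _build(self, X, y, depth):
        j = None
        if depth < self.max_depth and len(set(y)) > 1:
            j = self._best_split(X, y)
        if j is None:  # stop splitting: train a multinomial NB on this leaf's documents
            return ("leaf", MNB(self.classes).fit(X, y))

        def subset(keep):
            idx = [i for i, x in enumerate(X) if keep(x[j])]
            return [X[i] for i in idx], [y[i] for i in idx]

        left = self._build(*subset(lambda f: f == 0), depth + 1)   # word absent
        right = self._build(*subset(lambda f: f != 0), depth + 1)  # word present
        return ("split", j, left, right)

    def predict_one(self, x):
        node = self.root
        while node[0] == "split":
            _, j, left, right = node
            node = left if x[j] == 0 else right
        return node[1].predict_one(x)


# Toy vocabulary (hypothetical): ["ball", "goal", "vote", "law"]
X = [[2, 1, 0, 0], [1, 2, 0, 0], [3, 0, 0, 1],
     [0, 0, 2, 1], [0, 1, 1, 2], [0, 0, 3, 0]]
y = ["sports"] * 3 + ["politics"] * 3
tree = MNBTreeSketch().fit(X, y)
print(tree.root[:2])                    # ('split', 0): split on "ball" zero vs nonzero
print(tree.predict_one([1, 0, 0, 0]))   # sports
print(tree.predict_one([0, 0, 2, 0]))   # politics
```

On this toy data, "ball" and "vote" each separate the classes perfectly. The sketch therefore keeps the first of the two words it examines. MNBTree's actual tie-breaking and stopping rules may differ. Because the split uses only the zero/nonzero test, each candidate word costs one pass over the documents at that node. This is in line with the abstract's point that information gain is cheaper to compute than the classification-accuracy criterion used by NBTree.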
Keywords
Text classification, Multinomial naive Bayes, Multinomial naive Bayes tree, Multiclass learning