Improved Terms Weighting Algorithm of Text

NCIS '11 Proceedings of the 2011 International Conference on Network Computing and Information Security - Volume 02 (2011)

Abstract
Most traditional information retrieval and automatic text classification methods based on the vector space model need to determine the weights of feature terms. Term weighting plays an important role in achieving high performance in information retrieval and text classification. The popular method uses term frequency (tf) and inverse document frequency (idf) to represent the importance of terms and compute their weights. However, the tf-idf model does not take into account class information, important information such as the title, abstract, and conclusion, or synonym information. This paper presents an improved method for computing term weights that incorporates this information. The experimental results show that performance is enhanced: the role of important and representative terms is raised, and the effect of unimportant feature terms on retrieval and classification is reduced. In addition, the F1 score obtained with the new algorithm is higher than that obtained with the traditional tf-idf model.
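For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of classic tf-idf only, not the improved algorithm, whose formula the abstract does not give. The function name `tf_idf`, the toy corpus, and the particular variant (tf as count divided by document length, idf as the natural log of N over document frequency) are assumptions chosen for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (hypothetical choice for this sketch):
    tf = term count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, invented for illustration.
docs = [
    ["term", "weighting", "text"],
    ["text", "classification"],
    ["text", "retrieval", "term"],
]
w = tf_idf(docs)
```

In this toy corpus "text" occurs in every document, so its idf and therefore its weight is zero, while a term confined to one document, such as "weighting", receives the largest weight in that document. Nothing in this computation looks at class labels, where in the document a term appears, or synonyms, which is the gap the abstract describes.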
Keywords
important information,automatic text classification method,term frequency,improved terms weighting algorithm,vector space model,important role,term weighting,computing weighting,pattern classification,information retrieval,text classification,feature term,synonymous words information,class information,tf-idf model,inverse document frequency,text analysis,improved terms,traditional information retrieval,classification algorithms,mathematical model