A Comparative Study Of Co-Occurrence Strategies for Building A Cross-Domain Sentiment Thesaurus

2019 First International Conference of Intelligent Computing and Engineering (ICOICE) (2019)

Abstract
With the growth of user-generated web content, people freely share their opinions in numerous domains. However, labeling training data for every domain is prohibitively costly, and training on each domain in isolation prevents us from taking advantage of the information shared across domains. Cross-domain sentiment analysis addresses this, but it is a challenging NLP task due to feature and polarity divergence. To build a sentiment-sensitive thesaurus that groups different features expressing the same sentiment for cross-domain sentiment classification, different co-occurrence measures are used. This paper presents a comparative study of co-occurrence methods for building a cross-domain sentiment thesaurus. It also defines a Bidirectional Conditional Probability (BCP) to handle the asymmetric co-occurrence problem. Two machine learning classifiers (Naïve Bayes (NB) and Support Vector Machine (SVM)) and three feature selection methods (information gain, odds ratio, and chi-square) are used to evaluate the proposed model. Experimental results show that BCP outperforms four baseline co-occurrence calculation methods (PMI, PMI-square, EMI, and G-means) on the task of cross-domain sentiment analysis.
Keywords
sentiment analysis,cross-domain sentiment analysis,co-occurrence calculation methods,sentiment thesaurus,machine learning
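
To make the co-occurrence terminology in the abstract concrete, the following is a minimal Python sketch of pointwise mutual information (PMI), one of the baselines named above, alongside plain conditional probabilities computed in both directions. It is meant only to illustrate why co-occurrence can be asymmetric: PMI gives the same score for a pair in either order, while P(a | b) and P(b | a) generally differ. The abstract does not give the formula for BCP, so this sketch does not implement it. The function names (`cooccurrence_stats`, `pmi`, `cond_prob`), the document-level counting scheme, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_stats(docs):
    """Count, per document, how often each word and each unordered word pair occurs."""
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        words = set(doc)  # document-level presence, not raw term frequency
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_df, pair_df, len(docs)


def pmi(a, b, word_df, pair_df, n):
    """Standard PMI: log2( P(a, b) / (P(a) * P(b)) ). Symmetric in a and b."""
    joint = pair_df[frozenset((a, b))]
    if joint == 0:
        return float("-inf")
    return math.log2((joint / n) / ((word_df[a] / n) * (word_df[b] / n)))


def cond_prob(a, b, word_df, pair_df):
    """P(a | b): fraction of documents containing b that also contain a."""
    if word_df[b] == 0:
        return 0.0
    return pair_df[frozenset((a, b))] / word_df[b]


# Hypothetical toy reviews mixing two domains (electronics and movies).
docs = [
    ["excellent", "battery", "camera"],
    ["excellent", "plot"],
    ["excellent", "battery"],
    ["boring", "plot"],
]
word_df, pair_df, n = cooccurrence_stats(docs)

print(pmi("excellent", "battery", word_df, pair_df, n))
print(cond_prob("excellent", "battery", word_df, pair_df))  # P(excellent | battery)
print(cond_prob("battery", "excellent", word_df, pair_df))  # P(battery | excellent)
```

In this toy corpus every review that mentions "battery" also mentions "excellent", but not the other way around, so the two conditional probabilities differ while the PMI score is identical in both directions. A measure that looks at both directions, as the name BCP suggests, would have access to information that a single symmetric score hides.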