Lexicon-based Non-Compositional Multiword Augmentation Enriching Tweet Sentiment Analysis

2022 3rd International Conference on Artificial Intelligence and Data Sciences (AiDAS) (2022)

Abstract
One of the benefits of recognizing slang, an idiom, or an abbreviation in a tweet is that it helps identify the sentiment in a concise and understandable manner. However, the lack of adequately annotated “idiomatic tweets” makes classification challenging. We propose a pliable augmentation technique to improve the classification of idiomatic tweets with tiny training sets. For classification, we evaluate the performance of fine-tuned versions of a pre-trained embedding model in different flavors. During the augmentation process, we deduce the intrinsic propositional meaning of the idiomatic expression from IBM's SliDE (Sentiment Lexicon of IDiomatic Expressions) and from another lexicon that we built. The empirical results show that the proposed method is beneficial in concealing the actual intent of the tweet and advantageous in tackling the problem of overfitting caused by smaller training sets. The experiment shows that data augmentation of the idiomatic expressions reduced the classification error rate by 16%.
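The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of one way lexicon-based augmentation could work: each idiomatic expression found in a tweet is replaced by its literal meaning to create an extra training sample with the same label. The names `TOY_LEXICON`, `augment_tweet`, and `augment_dataset` are hypothetical, and the lexicon entries are illustrative rather than taken from SliDE or from the authors' lexicon.

```python
import re

# Hypothetical toy lexicon (assumption): expression -> literal paraphrase.
# These entries are illustrative only, not drawn from SliDE.
TOY_LEXICON = {
    "over the moon": "extremely happy",
    "a piece of cake": "an easy task",
    "smh": "shaking my head",
}


def augment_tweet(tweet, lexicon):
    """Return one variant of the tweet per lexicon expression it contains."""
    variants = []
    for expression, meaning in lexicon.items():
        # Whole-phrase, case-insensitive match of the expression.
        pattern = re.compile(r"\b" + re.escape(expression) + r"\b", re.IGNORECASE)
        if pattern.search(tweet):
            # A function replacement avoids interpreting backslashes in `meaning`.
            variants.append(pattern.sub(lambda _match: meaning, tweet))
    return variants


def augment_dataset(samples, lexicon):
    """Keep each (text, label) pair and append its variants with the same label."""
    augmented = []
    for text, label in samples:
        augmented.append((text, label))
        augmented.extend((variant, label) for variant in augment_tweet(text, lexicon))
    return augmented
```

In this sketch the label is carried over unchanged, on the assumption that replacing an idiom with its literal meaning preserves the tweet's sentiment. The enlarged set could then be passed to whatever fine-tuning pipeline is in use.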
Keywords
Sentiment Analysis, Idiomatic, Augmentation, Twitter, Knowledge-base