SETAR: Stacking Ensemble Learning for Thai Sentiment Analysis Using RoBERTa and Hybrid Feature Representation

IEEE Access (2023)

Abstract
Sentiment classification of social media posts is among the most challenging and time-consuming tasks for analysts. This is particularly true for languages written in scriptio continua, such as Thai, which places no spaces between written words and uses no end-of-sentence punctuation. Thai is also considered a low-resource language because few datasets are available to researchers. Although machine-learning (ML) and deep-learning (DL) algorithms can identify sentiment polarity, the performance of existing classification models is still inadequate. This study proposes SETAR, a novel stacking ensemble learning technique for identifying sentiment polarity in Thai. Our stacking ensemble strategy utilized the pre-trained Thai language model WangChanBERTa, which is based on the Robustly Optimized BERT Pretraining Approach (RoBERTa) architecture, to form a feature vector. This vector was combined with three distinct feature vectors obtained from three well-known text representations, namely Word2Vec, TF-IDF, and bag-of-words, to form a new hybrid sentence representation. The base learners were trained using seven complex heterogeneous ML algorithms, namely support vector machine (SVM), random forest (RF), extremely randomized trees (ET), light gradient boosting machine (LGBM), multi-layer perceptron (MLP), partial least squares (PLS), and logistic regression (LR), whose outputs were used to develop the final meta-learners. Extensive benchmarking on four datasets, including a sentiment corpus we developed that was annotated by domain experts, showed that the proposed stacking ensemble model outperformed the baseline models on all classification metrics on both the training and test sets.
Keywords
Deep learning,ensemble learning,sentiment analysis,text classification,transfer learning