Improving Multi-class Text Classification Using Balancing Techniques

Laouni Mahmoudi, Mohammed Salem

Artificial Intelligence: Theories and Applications (2023)

Abstract
Social media platforms and micro-blogging websites have grown in popularity in recent years. People use these platforms to express their thoughts and feelings about items, people, and events, producing a massive amount of textual data that can be exploited. Sentiment analysis is one of the tools used to take advantage of this data: it classifies text into classes such as positive, negative, and neutral, or into a number of star ratings. It has been investigated by many researchers in several languages. Deep learning approaches such as CNN, RNN, and LSTM, applied to balanced datasets, have given better results than classical machine learning approaches such as SVM, NB, and LR. Furthermore, the emergence of BERT has revolutionized the text classification field, including sentiment analysis tasks. The main problem is that in datasets collected from social media platforms, certain classes dominate others, meaning that the datasets are imbalanced. As a result, classifiers lose efficiency. This paper addresses this issue by introducing an ensemble of mathematical balancing techniques to increase the efficiency of BERT-based sentiment analysis models. The obtained results are significant: our two main metrics, AVG-Recall and F1-PN, are 17% and 19% higher, respectively, than the results of the same classifiers applied to the imbalanced dataset.
Keywords
Text classification, Sentiment analysis, BERT, araBERT, Imbalanced, Oversampling, Undersampling, Hybrid
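
The abstract does not specify which balancing techniques make up the ensemble, nor does it define the two metrics. As an illustration only, the sketch below uses plain random oversampling as a stand-in for the "Oversampling" keyword, and assumes the definitions commonly used in sentiment analysis shared tasks: AVG-Recall as the unweighted mean of per-class recall, and F1-PN as the mean of the F1 scores of the positive and negative classes. All function names and the label strings "positive" and "negative" are hypothetical, not taken from the paper.

```python
import random
from collections import defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    has as many examples as the largest class (illustrative stand-in,
    not the paper's method)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def per_class_scores(y_true, y_pred):
    """Return {label: (recall, f1)} for every label present in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (recall, f1)
    return scores


def avg_recall(y_true, y_pred):
    """Assumed definition: unweighted mean of per-class recall."""
    scores = per_class_scores(y_true, y_pred)
    return sum(recall for recall, _ in scores.values()) / len(scores)


def f1_pn(y_true, y_pred, positive="positive", negative="negative"):
    """Assumed definition: mean F1 of the positive and negative classes."""
    scores = per_class_scores(y_true, y_pred)
    f1_pos = scores.get(positive, (0.0, 0.0))[1]
    f1_neg = scores.get(negative, (0.0, 0.0))[1]
    return (f1_pos + f1_neg) / 2
```

Both metrics weight each class equally regardless of its size, so a classifier that favours the dominant class is penalised. In this sketch, balancing would be applied to the training split only, with the metrics computed on an untouched test split.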