AlgBERT: Automatic Construction of Annotated Corpus for Sentiment Analysis in Algerian Dialect

Khaoula Hamadouche, Kheira Zineb Bousmaha, Mohamed Abdelwaret Bekkoucha, Lamia Hadrich-Belguith

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING (2023)

Abstract
Nowadays, sentiment analysis is one of the most important research fields of Natural Language Processing (NLP), and it is widely applied in domains such as marketing and politics. However, compared with languages such as English, Arabic still lacks sufficient language resources for opinion and emotion analysis. Moreover, manual annotation requires considerable effort and time. In this article, we address this problem by proposing AlgBERT, a novel automatic annotation platform for sentiment analysis that produces an annotated corpus using deep learning techniques incorporating several automatic natural language processing algorithms, which form the basis of text classification and opinion analysis. We use the BERT (Bidirectional Encoder Representations from Transformers) model, as it is one of the most effective approaches in terms of results across many of the world's languages. We used around 54K comments collected from social networks (Twitter, YouTube) written in Arabic and Algerian dialects. AlgBERT achieved an accuracy of 91.04%, which is considered one of the best results for opinion analysis in Algerian dialect.
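The abstract does not say how AlgBERT turns classifier output into corpus annotations. A minimal sketch, assuming a confidence-threshold scheme (our assumption, not the authors' documented method): a classifier scores each comment, predictions whose top probability reaches a threshold become labels, the remaining comments are deferred for manual review, and annotation quality is measured as accuracy against a gold-labeled subset. In the paper's setting the classifier would be a fine-tuned BERT model; here it is a hypothetical keyword stub (`toy_classifier`) so the sketch runs without model weights. The names `auto_annotate`, `accuracy`, and `toy_classifier`, the labels `positive`/`negative`, and the 0.9 threshold are all illustrative.

```python
from typing import Callable, Dict, List, Tuple

# A classifier maps a comment to a probability per label. In practice this
# would wrap a fine-tuned BERT model; here any callable works (assumption).
Classifier = Callable[[str], Dict[str, float]]


def auto_annotate(
    comments: List[str],
    classify: Classifier,
    threshold: float = 0.9,  # hypothetical confidence cut-off
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Label comments whose top predicted probability reaches `threshold`.

    Returns (annotated, deferred): `annotated` holds (comment, label) pairs,
    `deferred` holds low-confidence comments left for manual annotation.
    """
    annotated: List[Tuple[str, str]] = []
    deferred: List[str] = []
    for text in comments:
        probs = classify(text)
        label = max(probs, key=probs.get)  # most probable label
        if probs[label] >= threshold:
            annotated.append((text, label))
        else:
            deferred.append(text)
    return annotated, deferred


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def toy_classifier(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for a BERT sentiment model: keyword lookup."""
    if "good" in text:
        return {"positive": 0.95, "negative": 0.05}
    if "bad" in text:
        return {"positive": 0.05, "negative": 0.95}
    return {"positive": 0.5, "negative": 0.5}  # uncertain: will be deferred


if __name__ == "__main__":
    comments = ["good service", "bad quality", "no opinion here"]
    annotated, deferred = auto_annotate(comments, toy_classifier)
    print(annotated)  # two confidently labeled comments
    print(deferred)   # the uncertain comment, left for a human annotator
    print(accuracy(["positive", "negative"], ["positive", "negative"]))
```

The threshold trades corpus size against label quality: raising it defers more comments to human annotators but keeps fewer noisy automatic labels.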
Keywords
Annotated corpus, deep learning, BERT