A novel oversampling method based on SeqGAN for imbalanced text classification

2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)(2019)

Abstract
Investors increasingly follow stock market trends online to time their trades, and they post comments on various online media as they trade. Many researchers therefore apply NLP approaches to extract investor sentiment from these comments. However, because stock-related corpora are noisy, extracting sentiment from them through text classification inevitably runs into the imbalanced dataset problem, which degrades the performance of traditional machine learning classifiers. We propose a preprocessing method based on the SeqGAN algorithm to address this imbalanced classification problem. Experiments on different corpora show that, compared with four other oversampling methods, the SeqGAN-based oversampling method yields the largest improvement in text classifier performance in both binary and multiclass text classification, making it an effective method for preprocessing text data.
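The abstract does not spell out the preprocessing step, so the following is only a minimal sketch of generator-based oversampling under stated assumptions: each minority class is topped up with synthetic texts until it matches the largest class. The names `oversample_with_generator` and `placeholder_generator` are hypothetical, and the placeholder simply samples words from the minority-class vocabulary. It is not SeqGAN; a trained SeqGAN generator would take its place in practice.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (text, label)


def placeholder_generator(texts: List[str], rng: random.Random) -> Callable[[], str]:
    """Hypothetical stand-in for a trained sequence generator (NOT SeqGAN).

    It draws words at random from the minority-class vocabulary, only so
    that the balancing loop below has something to call.
    """
    vocab = [word for text in texts for word in text.split()]
    lengths = [len(text.split()) for text in texts]

    def generate() -> str:
        n_words = rng.choice(lengths)
        return " ".join(rng.choice(vocab) for _ in range(n_words))

    return generate


def oversample_with_generator(
    samples: List[Sample],
    make_generator: Callable[[List[str], random.Random], Callable[[], str]] = placeholder_generator,
    seed: int = 0,
) -> List[Sample]:
    """Add synthetic texts to every minority class until all classes match
    the size of the largest one. Works for binary and multiclass labels."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())

    balanced = list(samples)
    for label, count in counts.items():
        if count == target:
            continue  # already as large as the majority class
        class_texts = [text for text, lab in samples if lab == label]
        # Assumption: one generator is fitted per minority class.
        generate = make_generator(class_texts, rng)
        balanced.extend((generate(), label) for _ in range(target - count))
    return balanced


if __name__ == "__main__":
    data = [
        ("stock rises on strong earnings", "positive"),
        ("shares climb after upbeat forecast", "positive"),
        ("market rallies again today", "positive"),
        ("price drops sharply", "negative"),
    ]
    balanced = oversample_with_generator(data)
    print(Counter(label for _, label in balanced))
```

The balanced list would then be passed to the text classifier in place of the original training set; the original samples are kept unchanged and only the minority classes grow.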
Keywords
Imbalanced text classification, oversampling, SeqGAN, sentiment analysis