A Drift-Sensitive Distributed LSTM Method for Short Text Stream Classification

IEEE Transactions on Big Data (2023)

Abstract
Real-world applications, especially in social media, produce massive short text streams. Unlike conventional normal-length texts, these data are characterized by short length, high volume, high velocity, and changing data distributions, which lead to data sparsity and concept drift. These properties make such streams very challenging for existing short text classification algorithms. We therefore propose a short text stream classification approach based on a flexible ensemble of Long Short-Term Memory (LSTM) networks, implemented in a distributed mode while retaining the high accuracy of deep learning models. First, to address the data sparsity of short texts, we propose an external-resource-based short text embedding that combines a pretrained embedding model with a convolutional neural network (CNN). Second, to cope with high-volume, high-velocity short text streams, we develop a flexible LSTM network and implement it in a distributed mode for classifying short text data streams. Third, we introduce a concept drift factor to adapt to concept drifts caused by changes in the data distribution. Finally, experiments on three real-world short text data sets demonstrate that, compared with several state-of-the-art short text (stream) classification approaches, the proposed approach classifies short text streams effectively and efficiently while adapting to concept drifts.
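
The abstract does not specify how the concept drift factor is computed or applied. The Python sketch below illustrates one possible drift-sensitive ensemble under explicitly hypothetical assumptions: labeled data arrive in chunks, each chunk is first used to test the current ensemble and then to train a new member (test-then-train), the drift factor is defined here as the relative drop in ensemble accuracy clipped to [0, 1], and existing members' voting weights are scaled by (1 - drift factor). A nearest-centroid classifier over precomputed feature vectors stands in for the LSTM members, and nothing is distributed. The names `CentroidClassifier`, `drift_factor`, and `DriftAwareEnsemble` are illustrative only and are not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class CentroidClassifier:
    """Hypothetical stand-in base learner; the paper's members are LSTM networks."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, xs: List[Vector], ys: List[str]) -> "CentroidClassifier":
        # Average the feature vectors of each label to get one centroid per label.
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = defaultdict(int)
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            sums[y] = [s + v for s, v in zip(acc, x)]
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
        return self

    def predict(self, x: Vector) -> str:
        # Label of the nearest centroid (squared Euclidean distance).
        return min(
            self.centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, self.centroids[y])),
        )


def drift_factor(prev_acc: float, curr_acc: float) -> float:
    """Hypothetical drift factor: relative accuracy drop, clipped to [0, 1]."""
    if prev_acc <= 0.0:
        return 0.0
    return min(1.0, max(0.0, (prev_acc - curr_acc) / prev_acc))


class DriftAwareEnsemble:
    """Weighted-vote ensemble whose old members are down-weighted after drift."""

    def __init__(self, max_members: int = 5) -> None:
        self.members: List[Tuple[CentroidClassifier, float]] = []
        self.max_members = max_members
        self.prev_acc = 0.0  # reference accuracy measured on the last chunk

    def predict(self, x: Vector) -> str:
        # Sum member weights per predicted label and return the heaviest label.
        votes: Dict[str, float] = defaultdict(float)
        for model, weight in self.members:
            votes[model.predict(x)] += weight
        return max(votes, key=votes.get)

    def accuracy(self, xs: List[Vector], ys: List[str]) -> float:
        return sum(self.predict(x) == y for x, y in zip(xs, ys)) / len(ys)

    def process_chunk(self, xs: List[Vector], ys: List[str]) -> float:
        """Test-then-train on one chunk; returns the drift factor applied."""
        drift = 0.0
        if self.members:
            drift = drift_factor(self.prev_acc, self.accuracy(xs, ys))
        # Scale existing weights by (1 - drift) so stale members lose influence.
        self.members = [(m, w * (1.0 - drift)) for m, w in self.members]
        # Keep the strongest older members, leaving room for the new one.
        self.members.sort(key=lambda mw: mw[1], reverse=True)
        del self.members[self.max_members - 1:]
        self.members.append((CentroidClassifier().fit(xs, ys), 1.0))
        # Reference accuracy for the next chunk: how well the updated ensemble fits this one.
        self.prev_acc = self.accuracy(xs, ys)
        return drift


if __name__ == "__main__":
    ens = DriftAwareEnsemble(max_members=3)
    xs = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
    print(ens.process_chunk(xs, ["a", "a", "b", "b"]))  # 0.0 (no members yet)
    print(ens.process_chunk(xs, ["a", "a", "b", "b"]))  # 0.0 (same concept)
    print(ens.process_chunk(xs, ["b", "b", "a", "a"]))  # 1.0 (labels flipped)
```

In this sketch, a sudden accuracy drop pushes the factor toward 1, so the newest member, trained on the post-drift chunk, dominates the vote; when accuracy is stable the factor is 0 and older members keep their full weight.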
Keywords
Short text stream, classification, deep learning model, concept drift