Hybrid Method for Short Text Topic Modeling

Communications in Computer and Information Science (2023)

Abstract
The rise in social media's popularity has led to a significant increase in user-generated content across various topics. Extracting information from these data can be valuable for many domains; however, the sheer volume makes manual extraction infeasible. The literature has addressed different aspects of information extraction, including identifying which topic a text discusses. The challenge becomes even greater when the text is short, as is typical on social media. The topic modeling methods proposed in the literature can broadly be categorized as unsupervised or supervised. Unsupervised topic modeling methods have shortcomings such as semantic loss and poor interpretability, and they are sensitive to the choice of parameters such as the number of topics. Supervised methods based on deep learning can achieve high accuracy, but they require human-annotated data, which is time-consuming and costly to produce. To overcome these disadvantages, this work proposes a hybrid topic modeling method that combines the advantages of unsupervised and supervised methods. We built the hybrid model by combining Latent Dirichlet Allocation (LDA) with a deep learning model built on top of Bidirectional Encoder Representations from Transformers (BERT). LDA is used to identify the optimal number of topics and the topic-relevant keywords; the only human input required is to identify, with the aid of ChatGPT, the topic associated with each set of topic-specific keywords. This annotation is then used to train and fine-tune the BERT model. In an experimental evaluation on posts related to climate change, we show that the proposed concept is applicable to predicting topics from short text without the need for lengthy and costly annotation.
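The pipeline described above (LDA selects a topic count and keywords, a human names each topic, and those names become training labels for BERT) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: collapsed Gibbs sampling stands in for whichever LDA implementation was used, UMass coherence is an assumed criterion for choosing the number of topics (the abstract does not name one), and the `topic_names` list stands in for the human/ChatGPT labeling step. All function names are hypothetical, and the BERT fine-tuning stage is omitted.

```python
import math
import random

# Hypothetical helpers; not the paper's code. Collapsed Gibbs LDA is a
# dependency-free stand-in, and UMass coherence is an ASSUMED criterion.

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens of topic k in document d
    nkw = [[0] * V for _ in range(K)]  # occurrences of word v under topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):     # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], index[w]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Resample its topic from p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab

def top_keywords(nkw, vocab, n=5):
    """Highest-count words of each topic, most frequent first."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in nkw]

def umass_coherence(words, docs):
    """UMass coherence of a ranked keyword list (higher is more coherent)."""
    doc_sets = [set(doc) for doc in docs]

    def df(*ws):
        # Number of documents containing every word in ws.
        return sum(all(w in s for w in ws) for s in doc_sets)

    return sum(math.log((df(words[m], words[j]) + 1) / df(words[j]))
               for m in range(1, len(words)) for j in range(m))

def select_num_topics(docs, candidates, n_words=5, **lda_kwargs):
    """Pick the topic count whose topics have the highest mean UMass coherence."""
    def score(k):
        _, nkw, vocab = lda_gibbs(docs, k, **lda_kwargs)
        topics = top_keywords(nkw, vocab, n_words)
        return sum(umass_coherence(t, docs) for t in topics) / k
    return max(candidates, key=score)

def weak_labels(ndk, topic_names):
    """Label each document with the human-assigned name of its dominant topic."""
    return [topic_names[max(range(len(row)), key=row.__getitem__)] for row in ndk]
```

In use, one would call `select_num_topics` on the tokenized posts, run `lda_gibbs` with the chosen count, show each `top_keywords` list to an annotator (optionally assisted by ChatGPT) to obtain one name per topic, and pass those names to `weak_labels`. The resulting (post, label) pairs are the kind of training data the abstract describes feeding into BERT fine-tuning; that stage would need a deep learning library and is not shown here.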
Keywords
topic, short, text, hybrid method