Short Text Topic Learning Using Heterogeneous Information Network

IEEE Transactions on Knowledge and Data Engineering (2023)

Abstract
With the explosive growth of short texts expressing users' interests and preferences, learning discriminative and coherent latent topics from short texts has become an important task, since many practical applications, such as e-commerce and recommendation, require an understanding of the semantics that short texts convey explicitly and implicitly. However, existing short text topic learning methods struggle to fully capture semantically related co-occurrence phrases. This paper therefore proposes a novel Heterogeneous Information Network-based Short Text Topic learning approach (HIN-ShoTT) that is built on parts of speech and does not depend on any auxiliary information. Specifically, HIN-ShoTT can be decomposed into three phases: i) seeking semantic relations among words with different parts of speech, where HIN-ShoTT models multiple explicit and implicit semantic relations among words using a Heterogeneous Information Network (HIN) organized by parts of speech; ii) extracting co-occurrence phrases and filtering noise, where HIN-ShoTT defines parts-of-speech meta structures to guide co-occurrence phrase extraction and uses a self-adapting threshold filtering module to discard noise; and iii) inferring topics, where HIN-ShoTT directly models the generative process of co-occurrence phrases so that topic learning benefits from the abundant corpus-level information. Experimental results on three real-world datasets show not only that HIN-ShoTT performs well, but also that incorporating a HIN into short text topic learning is a feasible way to improve accuracy.
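To make phase ii more concrete, the following is a minimal sketch of part-of-speech-guided co-occurrence extraction followed by a data-dependent threshold filter. It is an illustration under stated assumptions, not the HIN-ShoTT algorithm: the abstract does not specify the meta structures or the filtering rule, so `POS_PATTERNS`, `extract_pairs`, `adaptive_filter`, and the mean-count threshold are all hypothetical stand-ins, and the input is assumed to be already POS-tagged.

```python
from collections import Counter
from itertools import combinations

# Hypothetical POS patterns standing in for the paper's meta structures.
# Each pattern is an alphabetically sorted pair of tags.
POS_PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN"), ("NOUN", "VERB")}


def extract_pairs(tagged_docs):
    """Count word pairs that co-occur in a short text and match a POS pattern."""
    counts = Counter()
    for doc in tagged_docs:
        # Deduplicate tokens and sort them so each pair has one canonical order.
        tokens = sorted(set(doc))
        for (w1, t1), (w2, t2) in combinations(tokens, 2):
            if tuple(sorted((t1, t2))) in POS_PATTERNS:
                counts[(w1, w2)] += 1
    return counts


def adaptive_filter(counts):
    """Keep pairs whose count reaches the mean count (an assumed threshold rule)."""
    if not counts:
        return {}
    threshold = sum(counts.values()) / len(counts)
    return {pair: c for pair, c in counts.items() if c >= threshold}


if __name__ == "__main__":
    # Toy, hand-tagged short texts (illustrative data only).
    docs = [
        [("cheap", "ADJ"), ("phone", "NOUN"), ("battery", "NOUN")],
        [("cheap", "ADJ"), ("phone", "NOUN"), ("buy", "VERB")],
        [("phone", "NOUN"), ("battery", "NOUN"), ("lasts", "VERB")],
    ]
    print(adaptive_filter(extract_pairs(docs)))
```

The threshold here adapts to the corpus only in the simplest sense (it is derived from the observed counts rather than fixed in advance); the surviving pairs would then serve as the units whose generative process is modeled in phase iii.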
Keywords
Short texts, topic learning, heterogeneous information network, parts of speech, meta structure, natural language processing