Capturing word positions does help: A multi-element hypergraph gated attention network for document classification

Yilun Jin, Wei Yin, Haoseng Wang, Fang He

Expert Systems with Applications (2024)

Abstract
Over the last few years, graph-based methods have substantially improved document mining applications such as spam detection, news recommendation, and legal document classification. However, existing graph-based methods make limited use of word position and multi-element information within documents, which limits their effectiveness in practice. To address this limitation, we propose a novel multi-element hypergraph gated attention network that captures word position and multi-element information for accurate document classification. Specifically, a new multi-element hypergraph is first constructed to describe the word positions, sentences, and full content of a document. Then, a new multi-element homogenization module is applied to mitigate the heterogeneity of the constructed hypergraph. Meanwhile, a new hypergraph gated attention module is proposed to filter noise in the constructed hypergraph and derive element representations that incorporate word position information. Finally, a new block-wise read-out module is designed to fuse the learned element representations into comprehensive document representations for classification. Extensive experiments on several real-world datasets demonstrate that the proposed method not only outperforms related state-of-the-art methods but is also faster, making it suitable for a wide range of practical applications. For instance, on some datasets our method improved accuracy by 1.1% over the best competing method while also running faster. Additionally, it achieved a 14% improvement in accuracy over the well-known Generative Pre-trained Transformer 3.5 (GPT-3.5) on one dataset.
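The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical sketch of the general idea, not the authors' implementation. Every name in it (`build_hypergraph`, `gated_hypergraph_layer`, `readout`) is invented for illustration. Several assumptions are worth stating plainly: nodes are unique words; sliding windows over consecutive tokens stand in for the paper's word-position hyperedges; the multi-element homogenization module is omitted; the gate is a single sigmoid with fixed (not learned) weights rather than the paper's gated attention; and plain mean pooling replaces the block-wise read-out.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, FrozenSet[int]]


def build_hypergraph(sentences: List[List[str]], window: int = 3) -> Tuple[Dict[str, int], List[Edge]]:
    """Toy multi-element hypergraph (assumed construction, not the paper's).

    Nodes are unique words. Hyperedges: one per sentence, one per sliding
    window of `window` consecutive tokens (a stand-in for word position),
    and one covering the whole document.
    """
    vocab: Dict[str, int] = {}
    for sent in sentences:
        for tok in sent:
            vocab.setdefault(tok, len(vocab))
    edges: List[Edge] = []
    for sent in sentences:
        if not sent:
            continue  # skip empty sentences to avoid empty hyperedges
        edges.append(("sentence", frozenset(vocab[t] for t in sent)))
        for i in range(max(1, len(sent) - window + 1)):
            edges.append(("position", frozenset(vocab[t] for t in sent[i:i + window])))
    edges.append(("document", frozenset(vocab.values())))
    return vocab, edges


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_hypergraph_layer(x: List[List[float]], edges: List[Edge],
                           gate_w: List[float], gate_b: float = 0.0) -> List[List[float]]:
    """One node -> hyperedge -> node pass with a scalar sigmoid gate per node."""
    n, d = len(x), len(x[0])
    # Node -> hyperedge: mean of member node features.
    edge_feats = [[sum(x[v][k] for v in members) / len(members) for k in range(d)]
                  for _, members in edges]
    # Hyperedge -> node: mean of incident hyperedge features.
    msg = [[0.0] * d for _ in range(n)]
    deg = [0] * n
    for (_, members), ef in zip(edges, edge_feats):
        for v in members:
            deg[v] += 1
            for k in range(d):
                msg[v][k] += ef[k]
    out = []
    for v in range(n):
        m = [val / deg[v] for val in msg[v]] if deg[v] else x[v]
        # The gate decides how much of the aggregated message replaces the old feature.
        g = sigmoid(sum(w * mk for w, mk in zip(gate_w, m)) + gate_b)
        out.append([g * mk + (1.0 - g) * xk for mk, xk in zip(m, x[v])])
    return out


def readout(x: List[List[float]]) -> List[float]:
    # Mean pooling over nodes (a simplification of a block-wise read-out).
    d = len(x[0])
    return [sum(row[k] for row in x) / len(x) for k in range(d)]


if __name__ == "__main__":
    sentences = [["graph", "methods", "help"], ["word", "positions", "help"]]
    vocab, edges = build_hypergraph(sentences, window=2)
    x = [[float(i), 1.0] for i in range(len(vocab))]
    h = gated_hypergraph_layer(x, edges, gate_w=[0.1, 0.1])
    print(len(vocab), len(edges), readout(h))
```

Gating each node between its previous feature and the hyperedge message is one simple way to let a model down-weight noisy hyperedges; in a trained model `gate_w` and `gate_b` would be learned parameters.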
Keywords
Graph learning, Classification model, Natural language processing