Tibetan News Text Classification Based on Multi-features Fusion

2021 IEEE International Conference on Data Science and Computer Application (ICDSCA)(2021)

Abstract
Current research on Tibetan text classification mainly uses Tibetan words or Tibetan syllables as the basic representation features of the entire text, and then applies deep neural network models such as CNN and LSTM to classify it. A Tibetan news text normally consists of a headline and the content. The headline summarizes the content in a short text sequence, so relative to the content it carries more important weight information. Because the headline is so short, however, its features are sparse. This paper proposes using both syllable and word features as the basic representation of the headline, so that the convolutional neural network model can extract more comprehensive semantic features from it. For the task of Tibetan news text classification, the experimental method represents both the headline and the content of each news text at the word level and the syllable level simultaneously. After feature vectorization, a CNN completes the classification. The results show that the proposed method achieves good classification results on the test set of the experimental corpus.
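The feature-extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration rather than the paper's implementation: it assumes that syllables can be obtained by splitting on the Tibetan tsheg (U+0F0B), that word boundaries come from an external segmenter which separates words with spaces, and that the four resulting sequences are later embedded and passed to a CNN. The function names (`syllables`, `words`, `extract_features`, `build_vocab`, `encode`) are hypothetical.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)
SHAD = "\u0f0d"   # Tibetan clause delimiter (shad)


def syllables(text):
    """Split text into syllables on the tsheg, ignoring shad and spaces."""
    cleaned = text.replace(SHAD, TSHEG)
    result = []
    for chunk in cleaned.split():
        result.extend(s for s in chunk.split(TSHEG) if s)
    return result


def words(segmented_text):
    """Return words from text that is ALREADY word-segmented with spaces.

    Assumption: an external Tibetan word segmenter produced the spaces.
    """
    stripped = (w.strip(TSHEG + SHAD) for w in segmented_text.split())
    return [w for w in stripped if w]


def extract_features(headline, content):
    """Represent headline and content at both syllable and word level."""
    return {
        "headline_syllables": syllables(headline),
        "headline_words": words(headline),
        "content_syllables": syllables(content),
        "content_words": words(content),
    }


def build_vocab(token_lists):
    """Map each token to an integer id; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, length):
    """Truncate or pad a token list to a fixed-length list of ids."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens[:length]]
    return ids + [vocab["<pad>"]] * (length - len(ids))
```

Each of the four id sequences returned by `encode` would then feed an embedding layer, and the resulting feature maps could be combined before the classifier. How the paper actually fuses them is not specified in the abstract.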
Keywords
Tibetan news text classification, multi-features