Dictionary extraction based on statistical data

Вестник КазНУ. Серия математика, механика, информатика (2018)

Abstract
Automatic text summarization is a pressing problem when working with large amounts of information. Most algorithms that rely on statistical data build a summary by measuring the similarity between text units and the importance of each unit. A text unit can be a word, a sentence or a paragraph; in our case the unit is a sentence. Two sentences are considered similar when they share key-words, that is, words that indicate the topic of the text. In this work we describe the automatic extraction of a key-word dictionary, where key-words are N-grams with N from 1 to 5. Two algorithms were implemented: one extracts N-grams that occur in only one of two different corpora, and the other extracts N-grams with high importance. The importance of an N-gram denotes how strongly it belongs to the topic of the text. The texts used are in Russian and Kazakh. Both algorithms produce useful results, and both contribute to constructing a complete key-word dictionary.
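The two extraction steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state how importance is computed, so the score below (a smoothed ratio of relative frequencies in a topic corpus versus a background corpus) is an assumption made for illustration, as are the function names, the whitespace tokenizer and the threshold value. Real Russian or Kazakh text would likely also need morphological normalization, which is omitted here.

```python
from collections import Counter


def ngrams(tokens, max_n=5):
    """Yield every N-gram (as a tuple of tokens) with N from 1 to max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def count_ngrams(sentences, max_n=5):
    """Count N-grams over a corpus given as a list of sentence strings.

    Tokenization is a plain lowercase whitespace split (an assumption);
    N-grams never cross sentence boundaries.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), max_n))
    return counts


def exclusive_ngrams(topic_counts, background_counts):
    """First algorithm: N-grams found in the topic corpus only."""
    return {gram for gram in topic_counts if gram not in background_counts}


def important_ngrams(topic_counts, background_counts, threshold=2.0):
    """Second algorithm, with an assumed importance score.

    Score = relative frequency in the topic corpus divided by the
    add-one-smoothed relative frequency in the background corpus.
    This is an illustrative stand-in, not the measure used in the paper.
    """
    topic_total = sum(topic_counts.values())
    background_total = sum(background_counts.values())
    scored = {}
    for gram, count in topic_counts.items():
        topic_freq = count / topic_total
        background_freq = (background_counts.get(gram, 0) + 1) / (background_total + 1)
        score = topic_freq / background_freq
        if score >= threshold:
            scored[gram] = score
    return scored
```

The union of the two result sets would then serve as a candidate key-word dictionary; the threshold would need tuning on real corpora.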
Keywords
automatic extraction, key-words, n-gram