Automatic Document Summarization based on Statistical Information.

DATA(2018)

Abstract
Efficiently processing the large amount of information we encounter every day is a pressing problem. This paper studies automatic summarization algorithms. Its main goal is to implement and compare different summarization techniques on corpora of news articles parsed from the web. The paper describes three summarization techniques based on the TextRank algorithm: General TextRank, BM25, and LongestCommonSubstring. The corpora are in Russian and Kazakh. The summarization results and their comparison are provided. The algorithms used are well known, but the way they are evaluated on these corpora differs from the methods usually used in summary evaluation: the proposed evaluation method uses a special dictionary of key-words extracted on the topic of the corpora. As the title implies, the paper applies statistical information only; the semantic and syntactic features of the text are not examined.
Keywords
summarization,automatic extraction,key-words,n-gram,textrank
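
The abstract names TextRank with a LongestCommonSubstring similarity but gives no implementation details, so the following is only a minimal sketch of how such an extractive summarizer might look. The function names (`lcs_similarity`, `textrank`, `summarize`), the character-level matching, the normalization by the shorter sentence, and the damping value of 0.85 are all assumptions made for illustration, not the authors' choices.

```python
from difflib import SequenceMatcher


def lcs_similarity(a, b):
    """Longest common substring length, divided by the shorter sentence length.

    Assumption: character-level matching and this normalization are
    illustrative; the paper does not specify them.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / min(len(a), len(b))


def textrank(sentences, similarity, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over a sentence similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weights between every pair of distinct sentences.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(sentences, k=2, similarity=lcs_similarity):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences, similarity)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "the cat sat on the mat",
        "the cat sat on the sofa",
        "zzz qqq",
    ]
    print(summarize(doc, k=2))
```

Because the similarity function is passed in as a parameter, a BM25-based or word-overlap measure could be substituted in the same graph-ranking loop, which is one plausible way to compare the three variants the abstract lists.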