Document Summarization Using Sentence-Level Semantic Based On Word Embeddings

INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING (2019)

Abstract
In the era of information overload, text summarization has become a focus of attention in a number of diverse fields, such as question answering systems, intelligence analysis, news recommendation systems, and search results in web search engines. A good document representation is the key to any successful summarizer, and learning this representation has become a very active research area in natural language processing (NLP). Traditional approaches mostly fail to deliver a good representation, whereas word embeddings have shown excellent performance in learning it. In this paper, a modified BM25 weighting combined with word embeddings is used to build sentence vectors from word vectors, and the entire document is represented as a set of sentence vectors. The similarity between every pair of sentence vectors is then computed, and TextRank, a graph-based model, is used to rank the sentences. The summary is generated by picking the top-ranked sentences according to the compression rate. Two well-known datasets, DUC2002 and DUC2004, are used to evaluate the models. The experimental results show that the proposed models perform better overall than state-of-the-art methods.
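The following is a minimal sketch of the pipeline the abstract describes: BM25-weighted averaging of word vectors to form sentence vectors, cosine similarity between every pair of sentences, and a TextRank-style (PageRank) ranking over the similarity graph. The toy random embeddings, the BM25 parameters (k1, b), the damping factor, and all helper names are illustrative assumptions, not the authors' implementation; in practice pretrained Word2Vec vectors would replace the stand-in embeddings.

```python
# Sketch only: BM25-weighted sentence vectors + cosine similarity + TextRank.
import math
import numpy as np

K1, B = 1.5, 0.75          # standard BM25 parameters (assumed values)
DAMPING, ITERS = 0.85, 50  # PageRank settings (assumed values)

def bm25_weights(sentences):
    """Per-sentence BM25 weight for every token (sentences = lists of tokens)."""
    n = len(sentences)
    avg_len = sum(len(s) for s in sentences) / n
    df = {}
    for s in sentences:
        for t in set(s):
            df[t] = df.get(t, 0) + 1
    idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}
    weights = []
    for s in sentences:
        tf = {t: s.count(t) for t in set(s)}
        norm = K1 * (1 - B + B * len(s) / avg_len)
        weights.append({t: idf[t] * f * (K1 + 1) / (f + norm) for t, f in tf.items()})
    return weights

def sentence_vectors(sentences, word_vecs, dim):
    """BM25-weighted average of word vectors, normalized to unit length."""
    vecs = []
    for s, w in zip(sentences, bm25_weights(sentences)):
        v = sum((w[t] * word_vecs[t] for t in set(s)), np.zeros(dim))
        vecs.append(v / (np.linalg.norm(v) + 1e-12))
    return np.vstack(vecs)

def textrank(sim):
    """Power-iteration PageRank over the cosine-similarity graph."""
    sim = sim.copy()
    np.fill_diagonal(sim, 0.0)
    row_sums = sim.sum(axis=1, keepdims=True)
    trans = np.divide(sim, row_sums, out=np.full_like(sim, 1.0 / len(sim)),
                      where=row_sums > 0)
    scores = np.full(len(sim), 1.0 / len(sim))
    for _ in range(ITERS):
        scores = (1 - DAMPING) / len(sim) + DAMPING * trans.T @ scores
    return scores

if __name__ == "__main__":
    sentences = [
        "text summarization reduces a document to its key sentences".split(),
        "word embeddings capture the semantics of individual words".split(),
        "sentence vectors are built from weighted word embeddings".split(),
        "textrank ranks sentences on a similarity graph".split(),
    ]
    rng = np.random.default_rng(0)
    dim = 50
    vocab = {t for s in sentences for t in s}
    word_vecs = {t: rng.normal(size=dim) for t in vocab}  # stand-in for Word2Vec

    vecs = sentence_vectors(sentences, word_vecs, dim)
    sim = vecs @ vecs.T  # cosine similarity (vectors are unit-normalized)
    scores = textrank(sim)
    top = np.argsort(scores)[::-1][:2]  # compression rate: keep top 2 sentences
    for i in sorted(top):
        print(" ".join(sentences[i]))
```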
Keywords
Word embedding, sentence vector, Word2Vec, document summarization, cosine similarity, BM25