EcForest: Extractive document summarization through enhanced sentence embedding and cascade forest

Concurrency and Computation: Practice & Experience (2019)

Abstract
We present EcForest, an extractive summarization model built on Enhanced Sentence Embedding and Cascade Forest. Sentence representation matters greatly for many summarization methods. Bag-of-words representations mostly fail to capture semantics, and typical embedding models cannot capture more complex semantic features such as polysemy and the meaning of a phrase, which are usually lost when a sentence is represented by simply averaging its word embeddings. To address these drawbacks, we propose the Enhanced Sentence Embedding (ESE) model, which maps several effective features to dense vectors. Essentially, ESE is a novel model for improving the distributed representation of sentences. It is broadly applicable and can be adapted to other NLP tasks. Moreover, deep forest is used as the sentence extraction algorithm because it is robust to hyper-parameter settings and trains efficiently compared with deep neural networks. An evaluation of the variant models proposed in this work supports the validity of the enhanced sentence embedding. Comparisons between EcForest and several baselines on two different datasets show that the proposed summarization model performs better than, or is highly competitive with, the state of the art.
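To make the averaging baseline criticized in the abstract concrete, here is a minimal Python sketch of a sentence vector computed as the mean of its word vectors, with a relative-position feature appended. This is not the paper's ESE model: the function names, the toy two-dimensional vectors, and the form of the position feature are all assumptions made for illustration only.

```python
from typing import Dict, List

# Toy word vectors, invented for this illustration (not taken from the paper).
TOY_VECTORS: Dict[str, List[float]] = {
    "hot": [1.0, 0.0],
    "dog": [0.0, 1.0],
}


def average_embedding(tokens: List[str],
                      vectors: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Mean of the word vectors of the known tokens (zero vector if none are known)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the values of each dimension across all word vectors.
    return [sum(column) / len(known) for column in zip(*known)]


def position_feature(index: int, total: int) -> List[float]:
    """Hypothetical position feature: relative position of the sentence in [0, 1]."""
    return [index / (total - 1)] if total > 1 else [0.0]


def sentence_features(sentences: List[List[str]],
                      vectors: Dict[str, List[float]],
                      dim: int) -> List[List[float]]:
    """Concatenate the averaged embedding and the position feature per sentence."""
    total = len(sentences)
    return [
        average_embedding(tokens, vectors, dim) + position_feature(i, total)
        for i, tokens in enumerate(sentences)
    ]


if __name__ == "__main__":
    document = [["hot", "dog"], ["dog"]]
    print(sentence_features(document, TOY_VECTORS, dim=2))
```

Because the mean is order-insensitive, `["hot", "dog"]` and `["dog", "hot"]` receive the same vector, and the phrase "hot dog" is indistinguishable from its separate words. That is the kind of lost phrase-level meaning the abstract attributes to simple averaging.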
Keywords
cascade forest, extractive document summarization, natural language processing, position feature vector, sentence embedding