On automatic text segmentation.

DOCENG(2014)

引用 7|浏览29
暂无评分
摘要
ABSTRACTAutomatic text segmentation, which is the task of breaking a text into topically-consistent segments, is a fundamental problem in Natural Language Processing, Document Classification and Information Retrieval. Text segmentation can significantly improve the performance of various text mining algorithms, by splitting heterogeneous documents into homogeneous fragments and thus facilitating subsequent processing. Applications range from screening of radio communication transcripts to document summarization, from automatic document classification to information visualization, from automatic filtering to security policy enforcement - all rely on, or can largely benefit from, automatic document segmentation. In this article, a novel approach for automatic text and data stream segmentation is presented and studied. The proposed automatic segmentation algorithm takes advantage of feature extraction and unusual behaviour detection algorithms developed in [4, 5]. It is entirely unsupervised and flexible to allow segmentation at different scales, such as short paragraphs and large sections. We also briefly review the most popular and important algorithms for automatic text segmentation and present detailed comparisons of our approach with several of those state-of-the-art algorithms.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要