SyntacticDiff: Operator-based transformation for comparative text mining

Big Data (2015)

Abstract
We describe SyntacticDiff, a novel, general, and efficient edit-based method for transforming sequences of words given a reference text collection. These transformations can be used directly, or they can serve as features that represent text data in a wide variety of text mining applications. As case studies, we apply SyntacticDiff to three quite different tasks (grammatical error correction, student essay clustering and analysis, and native language identification) and show its benefit in each case. SyntacticDiff is completely general and can thus potentially be applied to any text data in any natural language. It is highly efficient, customizable, and able to capture syntactic differences from a reference text collection at the sentence, document, and subcollection levels. This makes it usable both as a rich monolingual translation method and as a feature representation for the many text mining tasks that deal with word usage and syntax beyond bag-of-words.
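The abstract does not spell out the algorithm itself. Purely as an illustration of the general idea of edit-based transformations used as features, the sketch below computes a standard word-level Levenshtein alignment between a sentence and the closest sentence in a reference collection, then counts the resulting edit operations. This is not SyntacticDiff's actual procedure: `word_edits` and `edit_features` are hypothetical names, and picking the "closest" reference by number of edits is an assumption made for the example.

```python
from collections import Counter


def word_edits(source, target):
    """Return a minimal list of word-level edits turning source into target.

    Standard Levenshtein dynamic programming over tokens (not SyntacticDiff).
    Operations: ("keep", w), ("sub", a, b), ("del", a), ("ins", b).
    """
    n, m = len(source), len(target)
    # dp[i][j] = minimum number of edits turning source[:i] into target[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a source word
                           dp[i][j - 1] + 1,         # insert a target word
                           dp[i - 1][j - 1] + cost)  # keep or substitute

    # Backtrace from the bottom-right cell to recover one optimal edit script.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            ops.append(("keep", source[i - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("sub", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", source[i - 1]))
            i -= 1
        else:
            ops.append(("ins", target[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def edit_features(sentence, references):
    """Hypothetical feature extractor: count the non-trivial edits against the
    reference sentence that needs the fewest of them (ties keep the first)."""
    best = min((word_edits(sentence, ref) for ref in references),
               key=lambda ops: sum(op[0] != "keep" for op in ops))
    return Counter(op for op in best if op[0] != "keep")
```

For example, `edit_features("he go to school".split(), ["he goes to school".split(), "she likes cats".split()])` aligns against the first reference and yields a single feature, `("sub", "go", "goes")`. Counting edit operations rather than raw words is what lets such features describe usage and syntax differences that a bag-of-words representation would miss.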
Keywords
Comparative Text Mining, Monolingual Translation, Corpus Summarization, Text Categorization