Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing (2011)

Abstract
We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags: requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, allowing words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus, 0.7 percentage points higher than with gold tags.
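The ablation described above (forcing every word to keep a single part-of-speech) can be illustrated with a minimal Python sketch. It assumes the collapse keeps each word type's most frequent gold tag across the corpus; the abstract does not say how the paper chooses that single tag, and the function name `one_tag_per_word` is hypothetical.

```python
from collections import Counter, defaultdict


def one_tag_per_word(tagged_sentences):
    """Replace every token's tag with the most frequent tag of its word type.

    Hypothetical helper for illustration: tagged_sentences is a list of
    sentences, each a list of (word, tag) pairs. Ties are broken by the tag
    seen first in the corpus, because Counter.most_common keeps first-seen
    order among equal counts.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    # Assumed collapse rule: keep only the majority tag for each word type.
    best = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    return [[(word, best[word]) for word, _ in sentence]
            for sentence in tagged_sentences]


if __name__ == "__main__":
    corpus = [
        [("I", "PRP"), ("can", "MD"), ("swim", "VB")],
        [("a", "DT"), ("can", "NN")],
        [("we", "PRP"), ("can", "MD"), ("go", "VB")],
    ]
    # The noun use of "can" is forced to its majority tag MD.
    print(one_tag_per_word(corpus)[1])
```

Here the noun "can" in "a can" loses its context-specific tag, which is the kind of information the abstract argues gold tags owe their advantage to.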
Keywords
different context, gold tag, unsupervised word clustering, classic clustering algorithm, dependency grammar induction, different tag, gold part-of-speech tag, grammar induction, state-of-the-art dependency grammar inducer, superior performance, unsupervised dependency