Beyond skeleton parsing: producing a comprehensive large-scale general-English treebank with full grammatical analysis

COLING '96: Proceedings of the 16th Conference on Computational Linguistics - Volume 1 (1996)

Abstract
A treebank is a body of natural language text which has been grammatically annotated by hand, in terms of some previously-established scheme of grammatical analysis. Treebanks have been used within the field of natural language processing as a source of training data for statistical part-of-speech taggers (Black et al., 1992; Brill, 1994; Merialdo, 1994; Weischedel et al., 1993) and for statistical parsers (Black et al., 1993; Brill, 1993; Jelinek et al., 1994; Magerman, 1995; Magerman and Marcus, 1991). In this article, we present the ATR/Lancaster Treebank of American English, a new resource for natural-language-processing research, which has been prepared by Lancaster University (UK)'s Unit for Computer Research on the English Language, according to specifications provided by ATR (Japan)'s Statistical Parsing Group. First we provide a static description, with (a) a discussion of the mode of selection and initial processing of text for inclusion in the treebank, and (b) an explanation of the scheme of grammatical annotation we then apply to the text. Second, we supply a process description of the treebank, in which we detail the physical and computational mechanisms by which we have created it. Finally, we lay out plans for the further development of this new treebank. All of the features of the ATR/Lancaster Treebank that are described below represent a radical departure from extant large-scale (Eyes and Leech, 1993; Garside and McEnery, 1993; Marcus et al., 1993) treebanks. We have chosen in this article to present our treebank in some detail, rather than to compare and contrast it with other treebanks. But the major differences between this and earlier treebanks can easily be grasped via a com-
Keywords
full grammatical analysis, comprehensive large-scale general-English treebank, skeleton parsing, natural language, natural language processing, computational mechanics