The Colorado Richly Annotated Full Text (CRAFT) corpus: Multi-model annotation in the biomedical domain

Handbook of Linguistic Annotation (2017)

Abstract
The Colorado Richly Annotated Full Text (CRAFT) corpus consists of full-text journal articles. The primary motivation for the annotation project was the accumulating body of evidence indicating that the bodies of journal articles contain much information that is not present in the abstracts, and that the textual and structural characteristics of article bodies are different from those of abstracts. The development of CRAFT was characterized by a “multi-model” annotation task. The sample population was all journal articles that had been used by the Mouse Genome Informatics group as evidence for at least one Gene Ontology or Mouse Phenotype Ontology “annotation.” The linguistic annotation is represented in the widely known Penn Treebank format (Marcus et al., Comput. Linguist. 19(2), 313–330, 1993) [50], with the addition of a small number of tags and phrasal categories to accommodate the idiosyncrasies of …