Constructing Domain Templates With Concept Hierarchy As Background Knowledge

INFORMATION TECHNOLOGY AND CONTROL (2014)

Abstract
In recent years, both academia and industry have seen a push for converting unstructured data, most commonly text, into structured representations. A relatively poorly explored challenge in this area is that of domain template construction: for a given domain, we wish to find the attributes with which texts from that domain can be meaningfully represented. For example, given the domain of news reports on bombing attacks, we would like to identify the existence of concepts like "victim" and "perpetrator". We introduce two new methods for this task, both operating on semantic representations of input data and exploiting the hierarchical organization of features, something not explored in prior art. We evaluate on multiple datasets and domains and achieve performance at least comparable to a state-of-the-art method on a set of "real world" scenarios, while additionally identifying fine-grained type information for properties: for example, the bombing attack victim is found to be of type "defender" (policeman, guard, ...). We also provide the first fully documented evaluation methodology, publicly available labeled datasets, and gold-standard outputs for this research problem, supporting and facilitating future work in the area.
Keywords
text mining, open-domain information extraction, schema induction, graph mining