Text Pattern Visualization for analysis of biology full text and captions

CSB(2003)

Abstract
Large textbanks comprised of thousands of full-text biology papers are rapidly becoming available. We describe an approach to characterize all major language patterns in biology text in terms of Frameworks. Frameworks are "containers" made up of common phrases surrounding specific informational items such as gene and protein names. A Framework Viewer has been developed that shows similar text Frameworks aligned on the screen much as biosequence visualization tools do. Using the Viewer, it is evident that Frameworks have the power to find the types of structures needed to develop useful information retrieval systems. As a simple example, one Framework was able to concisely select 45,000 nouns from a corpus of 5 million words without error. This work points the way to highly automated systems that will be able to extract and index information in biology textbanks. Work in progress includes extensions to characterize recursive structures in text, subsystems to retrieve figures in papers, and the discovery of semantic relations to aid concept-based retrieval.
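The idea of a Framework as a "container" of common phrases around an informational item can be illustrated with a minimal sketch. This is not the paper's implementation: the regular expression, the example sentences, and the function names find_framework_matches and align are all assumptions made for illustration. One hypothetical Framework ("expression of the ___ gene") is matched against a few sentences, and the matches are printed with the slot fillers lined up in one column, loosely in the spirit of an aligned viewer.

```python
import re

# Hypothetical Framework: fixed phrases surrounding one slot.
# The pattern and its wording are assumptions, not taken from the paper.
FRAMEWORK = re.compile(r"\b(expression of the) (\w+) (gene)\b", re.IGNORECASE)


def find_framework_matches(sentences):
    """Return (left phrase, slot filler, right phrase) for every match."""
    matches = []
    for sentence in sentences:
        for m in FRAMEWORK.finditer(sentence):
            matches.append((m.group(1), m.group(2), m.group(3)))
    return matches


def align(matches):
    """Format matches so that the slot fillers share one padded column."""
    if not matches:
        return []
    width = max(len(filler) for _, filler, _ in matches)
    return [
        f"{left} [{filler.ljust(width)}] {right}"
        for left, filler, right in matches
    ]


# Invented example sentences, used only to exercise the sketch.
sentences = [
    "We measured expression of the BRCA1 gene in all samples.",
    "Expression of the p53 gene was reduced.",
    "No framework occurs in this sentence.",
]

matches = find_framework_matches(sentences)
for line in align(matches):
    print(line)
```

A single regular expression is used here only because it is the simplest way to show fixed phrases around a slot; the abstract does not say how Frameworks are actually represented or discovered.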
Keywords
large textbanks, framework viewer, index information in biology textbanks, information retrieval system, full-text biology paper, biology full text, work point, text pattern visualization, major language patterns, biology text, shows similar text, biosequence visualization tool, indexing, text analysis, part of speech, proteins, natural languages, information retrieval systems