Dug: A Semantic Search Engine Leveraging Peer-Reviewed Literature to Span Biomedical Data Repositories

bioRxiv (2021)

Abstract
Motivation: As public data resources continue to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets that uses evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned.

Results: Developed through the National Heart, Lung, and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug can index more than 15,911 study variables from public datasets in just over 39 minutes. On a manually curated search dataset, Dug's mean recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's mean recall of 0.76. When synonyms or related concepts are used as search queries, Dug (0.28) far outperforms Elasticsearch (0.1) in mean recall.

Availability and Implementation: Dug is freely available at . An example Dug deployment is also available for use at .

Contact: awaldrop@rti.org or scox@renci.org

Competing Interest Statement: The authors have declared no competing interest.