Biodiversity Knowledge Graphs: Time to move up a gear!

Biodiversity Information Science and Standards(2021)

Abstract
Harnessing worldwide biodiversity data requires integrating myriad pieces of information, often sparse and incomplete, into a global, coherent data space. To do so, projects like the Global Biodiversity Information Facility, Catalog of Life and Encyclopedia of Life have set up platforms that gather, consolidate, and centralize billions of records from multiple data sources. This approach lowers the entry barrier for scientists wishing to consume aggregated biodiversity data, but it tends to build silos that hamper cross-platform interoperability. The Web of Data embodies a different approach underpinned by the Linked Open Data (LOD) principles (Heath and Bizer 2011). These principles foster the building of a large, distributed, cross-domain knowledge graph (KG), wherein data description relies on vocabularies with shared, formal, machine-processable semantics. So far, however, little biodiversity data has been published this way. Early efforts focused primarily on taxonomic registers, such as NCBI, VTO and AGROVOC. More recent efforts have started paving the way for the publication of more diverse biodiversity KGs (Page 2019, Penev et al. 2019, Michel et al. 2017). Today, we believe that it is time for more biodiversity data producers to join in and start publishing connected KGs spanning a much broader set of domains, far beyond just taxonomic registers. In this talk, we wish to present an ongoing endeavor in line with this vision. In previous work, we published TAXREF-LD (Michel et al. 2017), a LOD representation of the French taxonomic register developed and maintained by the French National Museum of Natural History. We modeled nomenclatural information as a thesaurus of scientific names, taxonomic information as an ontology of classes denoting taxa, and additional information such as ranks and vernacular names.
Recently, we have extended the scope of TAXREF-LD to represent and interlink data as varied as geographic locations, species interactions, development stages, trophic levels, as well as conservation, biogeographic, and legal status (regulations, protections, etc.). We put a specific effort into working out a model that accurately accounts for the semantics of the data while respecting knowledge engineering practices. For instance, a common design shortcoming is to attach all information as properties of a taxon. This is a sound choice for some properties, like a scientific name or conservation status, but properties that actually pertain to the biological individuals themselves, e.g. habitat and trophic level, are better attached to class members. With the presentation of this work, we wish to advance the discussion about integration scenarios based on knowledge graphs with the different biodiversity data stakeholders.
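The distinction between taxon-level and member-level properties can be illustrated with a minimal sketch. It is not the actual TAXREF-LD model: the `ex:` identifiers, the property names, and the choice of an OWL `hasValue` restriction to express "every member of this taxon has this habitat" are assumptions made purely for illustration. Triples are held as plain Python tuples so that the example needs no external library.

```python
# Illustrative sketch only: all "ex:" names are hypothetical, and the use of an
# owl:hasValue restriction is an assumption, not the published TAXREF-LD model.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def taxon_level_triples(taxon, scientific_name, conservation_status):
    """Statements about the taxon itself, i.e. about the class as a whole."""
    return [
        (taxon, RDF_TYPE, "owl:Class"),
        (taxon, "ex:hasScientificName", scientific_name),
        (taxon, "ex:hasConservationStatus", conservation_status),
    ]


def member_level_triples(taxon, prop, value):
    """Statements about the individuals belonging to the taxon.

    The taxon class is declared a subclass of a restriction, which reads as
    "every member of this class has `value` for `prop`". The property is never
    asserted directly on the taxon.
    """
    # Deterministic blank-node label derived from the local names.
    restriction = "_:r_{}_{}".format(taxon.split(":")[-1], prop.split(":")[-1])
    return [
        (taxon, RDFS_SUBCLASS_OF, restriction),
        (restriction, RDF_TYPE, "owl:Restriction"),
        (restriction, "owl:onProperty", prop),
        (restriction, "owl:hasValue", value),
    ]


graph = (
    taxon_level_triples("ex:taxon_X", "ex:name_X", "ex:status_LeastConcern")
    + member_level_triples("ex:taxon_X", "ex:habitat", "ex:Marine")
)

for triple in graph:
    print(triple)
```

In this sketch the conservation status is a direct property of `ex:taxon_X`, whereas the habitat only appears inside the restriction, so a query for the direct properties of the taxon would not return it.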