FAIR enough? A perspective on the status of nucleotide sequence data and metadata on public archives

bioRxiv (2021)

Abstract
Knowledge derived from nucleotide sequence data is increasing in importance in the life sciences, as well as in decision making (mainly in biodiversity policy). Metadata standards have been established to facilitate sustainable sequence data management according to the FAIR principles (Findability, Accessibility, Interoperability, Reusability). Here, we review the status of metadata available for raw read Illumina amplicon and whole genome shotgun sequencing data derived from ecological metagenomic material that are accessible at the European Nucleotide Archive (ENA), as well as the compliance of the primary sequence data (fastq files) with data submission requirements. While basic metadata, such as geographic coordinates, were retrievable in 98% of the cases for this type of sequence data, interoperability was not always ensured, and other (mainly conditionally) mandatory parameters were often not provided at all. Metadata standards, such as the ‘Minimum Information about any (x) Sequence (MIxS)’, were only infrequently used despite a demonstrated positive impact on metadata quality. Furthermore, the sequence data themselves did not meet the prescribed requirements in 31 out of 39 studies that were manually inspected. To tackle the most immediate needs to improve FAIR sequence data management, we provide a list of minimal suggestions to researchers, research institutions, funding agencies, reviewers, publishers, and databases that we believe could have a large positive impact on sequence data and metadata FAIRness, which is crucial for further research and its derived applications.

Competing Interest Statement: The authors have declared no competing interest.
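To make the distinction between "retrievable" and "interoperable" coordinates concrete, the following is a minimal sketch of how such a check could look. It is not the authors' method: the sample records, the field names (`lat`, `lon`, `lat_lon`), the accepted string layout, and both function names are assumptions made purely for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical sample metadata records; accessions and field names are
# illustrative assumptions, not real archive entries.
RECORDS = [
    {"accession": "SAMPLE_A", "lat": "54.18", "lon": "7.90"},
    {"accession": "SAMPLE_B", "lat_lon": "54.18 N 7.90 E"},
    {"accession": "SAMPLE_C", "lat_lon": "not collected"},
    {"accession": "SAMPLE_D"},
]

# Assumed combined layout: "<degrees> N|S <degrees> E|W" in decimal degrees.
_LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_coordinates(record: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) in signed decimal degrees, or None if unusable."""
    if "lat" in record and "lon" in record:
        try:
            lat, lon = float(record["lat"]), float(record["lon"])
        except ValueError:
            return None
    else:
        match = _LAT_LON.match(record.get("lat_lon", ""))
        if match is None:
            return None
        lat = float(match.group(1)) * (1 if match.group(2) == "N" else -1)
        lon = float(match.group(3)) * (1 if match.group(4) == "E" else -1)
    # Reject values outside the valid range (this also rejects NaN).
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return lat, lon
    return None


def coordinate_coverage(records: list) -> float:
    """Fraction of records whose coordinates parse into a usable form."""
    if not records:
        return 0.0
    usable = sum(1 for r in records if parse_coordinates(r) is not None)
    return usable / len(records)


if __name__ == "__main__":
    print(f"usable coordinates: {coordinate_coverage(RECORDS):.0%}")
```

In this toy data set a coordinate field is present in three of four records, but only two parse into a machine-usable form, which is the kind of gap between presence and interoperability the abstract describes.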