Towards the extended barcode concept: Generating DNA reference data through genome skimming of Danish plants

bioRxiv (2021)

Abstract
Background: Recently, there has been a push towards the extended barcode concept, which uses chloroplast genomes (cpGenome) and nuclear ribosomal DNA (nrDNA) sequences for molecular identification of plants instead of the standard barcode regions. These extended barcodes have a wide range of applications, including biodiversity monitoring and assessment, primer design, and evolutionary studies. However, they are not well represented in global reference databases. To fill this gap, we generated cpGenome and nrDNA reference data from genome skims of 184 plant species collected in Denmark. We further explored the application of the generated reference data for molecular identification of plants in an environmental DNA metagenomics study.

Results: We assembled partial cpGenomes for 82.1% of sequenced species and full or partial nrDNA sequences for 83.7% of species. We added all assemblies to GenBank; chloroplast reference data from 101 species and nuclear reference data from 6 species were not previously represented there. On average, we recovered 45 genes per species. The recovery rate of standard barcodes was higher for nuclear barcodes (>89%) than for chloroplast barcodes (<60%). Extracted DNA yield did not affect assembly outcome, whereas high GC content affected it negatively. In the in silico simulation of metagenomic reads, taxonomic assignments using the generated reference data achieved higher species resolution (94.9%) than assignments using GenBank (18.1%), without any identification errors.

Conclusions: Genome skimming generates reference data for both standard barcodes and other loci, contributing to the global DNA reference database for plants.

### Competing Interest Statement
The authors have declared no competing interest.
* atpB: adenosine triphosphate synthase subunit beta
* BEMT: Blunt End Multi Tube
* BOLD: Barcode of Life Data Systems
* BP: base pairs
* BSA: Bovine Serum Albumin
* BWA: Burrows-Wheeler Alignment
* CDS: Coding sequences
* COI: cytochrome-c oxidase subunit 1
* cpGenome: Chloroplast genome
* DNAmark: Danish national DNA reference database
* eDNA: environmental DNA
* GATK: GenomeAnalysisTK
* GBIF: Global Biodiversity Information Facility
* HTS: high-throughput sequencing
* ITS: internal transcribed spacers
* LCA: lowest common ancestor
* LSU rRNA: large subunit ribosomal ribonucleic acid
* matK: maturase K
* nrDNA: nuclear ribosomal sequences
* ORG.Annot: Organelle Annotator
* ORG.asm: ORGanelle ASseMbler
* PE: paired-end
* PVP: Polyvinylpyrrolidone
* qPCR: quantitative PCR
* rbcL: ribulose-bisphosphate carboxylase large chain
* SD: standard deviation
* sedaDNA: Ancient sedimentary DNA
Keywords
DNA reference data, genome skimming, extended barcode concept