Towards the extended barcode concept: Generating DNA reference data through genome skimming of Danish plants
bioRxiv (2021)
Abstract
Background Recently, there has been a push towards the extended barcode concept, which uses chloroplast genomes (cpGenome) and nuclear ribosomal DNA (nrDNA) sequences for molecular identification of plants instead of the standard barcode regions. These extended barcodes have a wide range of applications, including biodiversity monitoring and assessment, primer design, and evolutionary studies. However, they are not well represented in global reference databases. To fill this gap, we generated cpGenome and nrDNA reference data from genome skims of 184 plant species collected in Denmark. We further explored the application of our generated reference data for molecular identification of plants in an environmental DNA metagenomics study.
Results We assembled partial cpGenomes for 82.1% of sequenced species and full or partial nrDNA sequences for 83.7% of species. We added all assemblies to GenBank, of which chloroplast reference data from 101 species and nuclear reference data from 6 species were not previously represented. On average, we recovered 45 genes per species. The rate of recovery of standard barcodes was higher for nuclear barcodes (>89%) than for chloroplast barcodes (<60%). Extracted DNA yield did not affect assembly outcome, whereas high GC content affected it negatively. In an in silico simulation of metagenomic reads, taxonomic assignments using our generated reference data had better species resolution (94.9%) than those using GenBank (18.1%), without any identification errors.
Conclusions Genome skimming generates reference data of both standard barcodes and other loci, contributing to the global DNA reference database for plants.
### Competing Interest Statement
The authors have declared no competing interest.
atpB
: adenosine triphosphate synthase subunit beta
BEMT
: Blunt End Multi Tube
BOLD
: Barcode of Life Data Systems
BP
: base pairs
BSA
: Bovine Serum Albumin
BWA
: Burrows-Wheeler Alignment
CDS
: Coding sequences
COI
: cytochrome-c oxidase subunit 1
cpGenome
: Chloroplast genome
DNAmark
: Danish national DNA reference database
eDNA
: environmental DNA
GATK
: GenomeAnalysisTK
GBIF
: Global Biodiversity Information Facility
HTS
: high-throughput sequencing
ITS
: internal transcribed spacers
LCA
: lowest common ancestor
LSU rRNA
: large subunit ribosomal ribonucleic acid
matK
: maturase K
nrDNA
: nuclear ribosomal DNA
ORG.Annot
: Organelle Annotator
ORG.asm
: ORGanelle ASseMbler
PE
: paired-end
PVP
: Polyvinylpyrrolidone
qPCR
: quantitative PCR
rbcL
: ribulose-bisphosphate carboxylase large chain
SD
: standard deviation
sedaDNA
: Ancient sedimentary DNA
Keywords
DNA reference data, genome skimming, extended barcode concept