The haplotype-resolved chromosome pairs and transcriptome of a heterozygous diploid African cassava cultivar
bioRxiv (2021)
Abstract
**Background** Cassava (*Manihot esculenta*) is an important clonally propagated food crop in tropical and sub-tropical regions worldwide. Genetic gain by molecular breeding is limited because cassava has a highly heterozygous, repetitive and difficult-to-assemble genome.
**Findings** Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near-complete haplotype resolution with higher continuity and accuracy than conventional long sequencing reads. We present two chromosome-scale haploid genomes, phased with Hi-C technology, for the diploid African cassava variety TME204. Genome comparisons revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences despite high gene synteny, with most large structural variations being LTR-retrotransposon related. Allele-specific expression analysis of different tissues, based on the haplotype-resolved transcriptome, identified both stable and inconsistent alleles with imbalanced expression patterns, while most alleles were expressed coordinately. Among tissue-specific differentially expressed transcripts, coordinately and biasedly regulated transcripts were functionally enriched for different biological processes. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding.
**Conclusions** The haplotype-resolved genome allows the first systematic view of the heterozygous diploid genome organization in cassava. The completely phased and annotated chromosome pairs will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy and continuity.
### Competing Interest Statement
The authors have declared no competing interest.
### List of abbreviations
* ASE: allele-specific expression
* BAC: bacterial artificial chromosome
* BP: biological process
* CCS: circular consensus sequence
* CDS: coding sequence
* CLR: continuous long reads
* CMD: cassava mosaic diseases
* DE: differentially expressed/differential expression
* DET: differentially expressed transcript
* ENA: European Nucleotide Archive
* GO: gene ontology
* HiFi: high-fidelity
* HMW: high molecular weight
* Indel: insertion and deletion
* IPA: Improved Phased Assembler
* MF: molecular function
* NCBI: National Center for Biotechnology Information
* numts: nuclear mitochondrial pseudogene regions
* PacBio: Pacific Biosciences
* PE: paired-end
* QV: quality value
* SMRT: Single Molecule Real-Time
* SNP: single nucleotide polymorphism
* SV: structural variation
* TPM: transcripts per million
* UDI: Unique Dual Indices
* VGP: the Vertebrate Genomes Project
Key words
chromosome pairs, transcriptome, diploid, haplotype-resolved