谷歌浏览器插件
订阅小程序
在清言上使用

Using networks to analyze and visualize the distribution of overlapping reading frames in virus genomes

biorxiv(2021)

引用 1|浏览1
暂无评分
摘要
Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated reading frames in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in ds-DNA viruses. However, the longest overlaps involve no shift in reading frame (+0), increasing the selective burden of the same nucleotide positions within codons, instead of exposing additional sites to purifying selection. Next, we develop a new graph-based representation of the distribution of OvRFs among the reading frames of genomes in a given virus family. In the absence of an unambiguous partition of reading frames by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent reading frames are adjacent in one or more genomes, and (2) that the reading frames overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要