Large-scale gene co-expression network as a source of functional annotation for cattle genes

BMC Genomics(2016)

引用 16|浏览18
暂无评分
摘要
Background Genome sequencing and subsequent gene annotation of genomes has led to the elucidation of many genes, but in vertebrates the actual number of protein coding genes are very consistent across species (~20,000). Seven years after sequencing the cattle genome, there are still genes that have limited annotation and the function of many genes are still not understood, or partly understood at best. Based on the assumption that genes with similar patterns of expression across a vast array of tissues and experimental conditions are likely to encode proteins with related functions or participate within a given pathway, we constructed a genome-wide Cattle Gene Co-expression Network (CGCN) using 72 microarray datasets that contained a total of 1470 Affymetrix Genechip Bovine Genome Arrays that were retrieved from either NCBI GEO or EBI ArrayExpress. Results The total of 16,607 probe sets, which represented 11,397 genes, with unique Entrez ID were consolidated into 32 co-expression modules that contained between 29 and 2569 probe sets. All of the identified modules showed strong functional enrichment for gene ontology (GO) terms and Reactome pathways. For example, modules with important biological functions such as response to virus, response to bacteria, energy metabolism, cell signaling and cell cycle have been identified. Moreover, gene co-expression networks using “guilt-by-association” principle have been used to predict the potential function of 132 genes with no functional annotation. Four unknown Hub genes were identified in modules highly enriched for GO terms related to leukocyte activation ( LOC509513 ), RNA processing ( LOC100848208 ), nucleic acid metabolic process ( LOC100850151 ) and organic-acid metabolic process ( MGC137211 ). Such highly connected genes should be investigated more closely as they likely to have key regulatory roles. Conclusions We have demonstrated that the CGCN and its corresponding regulons provides rich information for experimental biologists to design experiments, interpret experimental results, and develop novel hypothesis on gene function in this poorly annotated genome. The network is publicly accessible at http://www.animalgenome.org/cgi-bin/host/reecylab/d .
更多
查看译文
关键词
Gene Ontology, Weighted Adjacency Matrix, Chromosomal Passenger Complex, Cattle Genome, White Module
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要