Objective: biochemical function.

Frontiers in genetics(2014)

引用 12|浏览11
暂无评分
摘要
DNA sequencing enables the discovery of new genes in high-throughput, low-cost experiments. Conversely, gene function is determined by low-throughput, high-cost experiments. This inverse relationship for these two types of data is a major impediment in meeting one of the major scientific challenges of our time—the understanding of genomes. This mismatch in throughput is illustrated by considering the progress made for one of the earliest sequenced genomes, that of Mycobacterium tuberculosis H37Rv (Mtb). When its genome was published in 1998, more than a quarter of its genes had no known function (Cole et al., 1998). Our lack of knowledge about these approximately 1000 “conserved hypothetical” genes in Mtb represents a serious deficiency in our understanding of its biology. Now, after more than a decade of progress, our knowledge of those proteins' functions is essentially unchanged—there are still greater than 900 genes with no known function (Lew et al., 2011). In contrast, during this same period, the scientific community has sequenced approximately 18,000 new genomes (Pagani et al., 2012), containing millions of new hypothetical proteins. Apparently, the vector of our progress has tipped decisively away from data interpretation and comprehension, and toward mere data collection. To address the issue of gene function testing and annotation for all microbes, we founded COMBREX (COMputational BRidge to EXperiments), an endeavor aimed at accelerating the rate of gene function validation (Anton et al., 2013). Two of COMBREX's more prominent initiatives were the creation of a comprehensive database for protein function data (http://combrex.bu.edu), and the deployment of a crowdsourcing platform to catalyze protein function experimentation. In the course of these two efforts, it became apparent that fundamental changes in approaches to the problem of protein function determination were needed if there was any hope of keeping pace with DNA sequencing. We suggest that the community work together to (1) re-establish the connection between existing gene annotation and the foundational experimental data that supports all annotation, (2) develop experiment design principles to help guide the identification of maximally informative targets for function validation, (3) invest in the development of higher-throughput approaches for the testing of protein function, and (4) provide an expedited publication pathway for reporting experimental results of gene function, analogous to the reporting of newly sequenced genomes in the journal “Standards in Genomic Sciences.”
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要