GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models
CoRR (2024)
Abstract
The Genomic Foundation Model (GFM) paradigm is expected to facilitate the
extraction of generalizable representations from massive genomic data, thereby
enabling their application across a spectrum of downstream applications.
Despite these advances, the lack of a standardized evaluation framework makes
fair assessment difficult, owing to differences in experimental settings, model
complexity, and benchmark datasets, as well as reproducibility challenges. In the absence of
standardization, comparative analyses risk becoming biased and unreliable. To
surmount this impasse, we introduce GenBench, a comprehensive benchmarking
suite specifically tailored for evaluating the efficacy of Genomic Foundation
Models. GenBench offers a modular and expandable framework that encapsulates a
variety of state-of-the-art methodologies. We systematically evaluate models on
datasets spanning diverse biological domains, with particular emphasis on both
short-range and long-range genomic tasks, starting with the three most
important categories of DNA tasks: Coding Region, Non-Coding Region, and Genome
Structure. Moreover, we provide a nuanced analysis of the interplay
between model architecture and dataset characteristics on task-specific
performance. Our findings reveal an interesting pattern: independent of
parameter count, attention-based and convolution-based models show discernibly
different preferences on short- and long-range tasks, which may
provide insights into the future design of GFMs.