Efficient and Scalable Fine-Tune of Language Models for Genome Understanding
CoRR (2024)
Abstract
Although DNA foundation models have advanced the understanding of genomes,
they still face significant challenges in the limited scale and diversity of
genomic data. This limitation starkly contrasts with the success of natural
language foundation models, which thrive on substantially larger scales.
Furthermore, genome understanding involves numerous downstream genome
annotation tasks with inherent data heterogeneity, thereby necessitating more
efficient and robust fine-tuning methods tailored for genomics. Here, we
present Lingo: Language prefix fIne-tuning for
GenOmes. Unlike DNA foundation models, Lingo
strategically leverages natural language foundation models' contextual cues,
recalibrating their linguistic knowledge to genomic sequences. Lingo
further accommodates numerous, heterogeneous downstream fine-tuning tasks through an
adaptive rank sampling method that prunes and stochastically reintroduces
pruned singular vectors within small computational budgets. Adaptive rank
sampling outperformed existing fine-tuning methods on all 14 benchmarked genome
understanding tasks, while requiring fewer than 2% of trainable parameters as
genomic-specific adapters. Impressively, applying these adapters on natural
language foundation models matched or even exceeded the performance of DNA
foundation models. Lingo presents a new paradigm of efficient and
scalable genome understanding via genomic-specific adapters on language models.
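
The abstract describes adaptive rank sampling only at a high level: singular vectors are pruned, and pruned ones are stochastically reintroduced within a small budget. The sketch below illustrates one possible reading of that idea and is not the authors' implementation. The function name `sample_rank_mask`, the per-direction `importance` scores, and the `budget` and `explore` parameters are all hypothetical, chosen here for illustration.

```python
import random


def sample_rank_mask(importance, budget, explore, rng):
    """Return a 0/1 mask over singular directions (illustrative sketch only).

    Assumptions, not taken from the paper:
    - `importance` holds one score per singular direction.
    - `budget` is the total number of directions allowed to stay active.
    - `explore` of those slots are given to randomly chosen pruned directions.
    """
    n = len(importance)
    if not 0 <= explore <= budget <= n:
        raise ValueError("need 0 <= explore <= budget <= len(importance)")

    # Rank directions from most to least important.
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)

    # Deterministically keep the top-scoring directions; the rest are pruned.
    kept = order[:budget - explore]
    pruned = order[budget - explore:]

    # Stochastically reintroduce a few pruned directions, staying within budget.
    revived = rng.sample(pruned, explore)

    active = set(kept) | set(revived)
    return [1 if i in active else 0 for i in range(n)]


def apply_mask(singular_values, mask):
    """Zero out the singular values of inactive directions."""
    return [s * m for s, m in zip(singular_values, mask)]
```

Under these assumptions, a training loop could draw a fresh mask at each step with `sample_rank_mask(scores, budget=3, explore=1, rng=random.Random(0))` and pass it to `apply_mask`, so that the number of active directions never exceeds the budget while pruned directions still get an occasional chance to be updated.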