Representation Learning of Biological Concepts: A Systematic Review

Yuntao Yang,Xu Zuo,Avisha Das,Hua Xu, Wenjin Zheng

CURRENT BIOINFORMATICS(2024)

引用 0|浏览2
暂无评分
摘要
Objective Representation learning in the context of biological concepts involves acquiring their numerical representations through various sources of biological information, such as sequences, interactions, and literature. This study has conducted a comprehensive systematic review by analyzing both quantitative and qualitative data to provide an overview of this field.Methods Our systematic review involved searching for articles on the representation learning of biological concepts in PubMed and EMBASE databases. Among the 507 articles published between 2015 and 2022, we carefully screened and selected 65 papers for inclusion. We then developed a structured workflow that involved identifying relevant biological concepts and data types, reviewing various representation learning techniques, and evaluating downstream applications for assessing the quality of the learned representations.Results The primary focus of this review was on the development of numerical representations for gene/DNA/RNA entities. We have found Word2Vec to be the most commonly used method for biological representation learning. Moreover, several studies are increasingly utilizing state-of-the-art large language models to learn numerical representations of biological concepts. We also observed that representations learned from specific sources were typically used for single downstream applications that were relevant to the source.Conclusion Existing methods for biological representation learning are primarily focused on learning representations from a single data type, with the output being fed into predictive models for downstream applications. Although there have been some studies that have explored the use of multiple data types to improve the performance of learned representations, such research is still relatively scarce. In this systematic review, we have provided a summary of the data types, models, and downstream applications used in this task.
更多
查看译文
关键词
Machine learning,biological concepts,representation learning,embedding,natural language processing,graph neural networks
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要