
Prediction of enzymatic function with high efficiency and a reduced number of features using genetic algorithm.

Diogo R Reis, Bruno C Santos, Lucas Bleicher, Luis E Zárate, Cristiane N Nobre

Computers in Biology and Medicine (2023)

Abstract
The post-genomic era has created a growing demand for efficient procedures to identify protein functions. One way to meet it is to apply machine learning to a set of characteristics extracted from the protein. This feature-based approach has been the focus of several works in bioinformatics. In this work, we investigated which protein characteristics, representing the primary, secondary, tertiary, and quaternary structures, improve model quality, applying dimensionality reduction techniques and using the Support Vector Machine classifier to predict enzyme classes. Two approaches were evaluated: feature extraction/transformation, performed with the statistical technique Factor Analysis, and feature selection. For feature selection, we proposed an approach based on a genetic algorithm to address the optimization conflict between the simplicity and the reliability of an ideal representation of enzyme characteristics, and we also employed and compared other methods for this purpose. The best result was obtained with a feature subset generated by our implementation of a multi-objective genetic algorithm, enriched with features that this work identified as relevant for representing the enzymes. This subset reduced the dataset by about 87% and reached an F-measure of 85.78%, improving the overall quality of the classification model. In addition, we identified a subset of only 28 features, out of a total of 424, that reached an F-measure above 80% for four of the six evaluated classes, showing that satisfactory classification performance can be achieved with a reduced number of enzyme characteristics. The datasets and implementations are openly available.
Keywords
Bioinformatics, Dimensionality reduction, Enzyme class prediction, Feature selection, Multi-objective genetic algorithm, Proteins, Support vector machine
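
The trade-off the abstract describes, between a small feature subset (simplicity) and a high classification score (reliability), can be illustrated with a minimal sketch of genetic feature selection with Pareto filtering. This is not the authors' implementation: the function names, the GA settings, and the toy scoring function are all assumptions made for illustration. In the paper's setting, `score_fn` would presumably be replaced by the F-measure of an SVM evaluated on the selected features.

```python
import random


def dominates(a, b):
    """True if a Pareto-dominates b; each is a (score, n_selected) pair."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(masks, objectives):
    """Masks whose objectives are not dominated by any other mask."""
    return [m for m in masks
            if not any(dominates(objectives[o], objectives[m]) for o in masks)]


def evolve(n_features, score_fn, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Evolve 0/1 feature masks (requires n_features >= 2, pop_size >= 2)."""
    rng = random.Random(seed)

    def random_mask():
        return tuple(rng.randint(0, 1) for _ in range(n_features))

    population = [random_mask() for _ in range(pop_size)]
    for _generation in range(generations):
        children = []
        for _child in range(pop_size):
            p1, p2 = rng.sample(population, 2)
            cut = rng.randrange(1, n_features)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = tuple(bit ^ int(rng.random() < mutation_rate)
                          for bit in child)              # bit-flip mutation
            children.append(child)
        merged = list(dict.fromkeys(population + children))  # drop duplicates
        objectives = {m: (score_fn(m), sum(m)) for m in merged}
        front = pareto_front(merged, objectives)
        in_front = set(front)
        # Non-dominated masks survive first; the rest are ranked by score,
        # then by fewer selected features.
        rest = sorted((m for m in merged if m not in in_front),
                      key=lambda m: (-objectives[m][0], objectives[m][1]))
        population = (front + rest)[:pop_size]
        while len(population) < pop_size:                # refill if too few unique masks
            population.append(random_mask())
    unique = list(dict.fromkeys(population))
    objectives = {m: (score_fn(m), sum(m)) for m in unique}
    return [(m, objectives[m]) for m in pareto_front(unique, objectives)]


# Hypothetical stand-in for a classifier score: only features 0, 3 and 7
# carry signal, and the score is the fraction of them that is selected.
INFORMATIVE = {0, 3, 7}


def toy_score(mask):
    return sum(mask[i] for i in INFORMATIVE) / len(INFORMATIVE)


front = evolve(n_features=12, score_fn=toy_score)
for mask, (score, n_selected) in sorted(front, key=lambda item: item[1][1]):
    print(n_selected, round(score, 2), mask)
```

Each returned entry pairs a feature mask with its `(score, n_selected)` objectives. No entry is beaten on both objectives by another, so the list shows the range of compromises between subset size and score instead of a single "best" subset.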