EL_LSTM: Prediction of DNA-binding residue from Protein sequence by Combining Long Short-Term Memory and Ensemble Learning.

IEEE/ACM Transactions on Computational Biology and Bioinformatics (2020)

Abstract
Most previous work on DNA-binding residue prediction has not considered the relationships between residues. In this paper, we propose a novel approach for DNA-binding residue prediction, referred to as EL_LSTM, which has two main components. The first is a Long Short-Term Memory (LSTM) network, which learns pairwise relationships between residues through a bi-gram model and then learns feature vectors for all residues. The second is an ensemble-learning-based classifier introduced to tackle the data imbalance problem in binding residue prediction; it uses a variant of the bagging strategy to obtain balanced samples. Evaluations on PDNA-224 and DBP-123 show that classifiers that use feature relationships outperform those that do not by at least 0.028 on MCC, 1.18 percent on ST, and 0.012 on AUC, which indicates the usefulness of feature relationships for DNA-binding residue prediction. Adding ensemble learning improves on a single LSTM classifier by at least 0.021 on MCC, 1.32 percent on ST, and 0.018 on AUC. Comparisons with state-of-the-art predictors show that EL_LSTM outperforms them significantly. Further feature analysis validates the effectiveness of LSTM for the prediction of DNA-binding residues.
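The balanced-sampling idea behind the ensemble component can be illustrated with a minimal sketch. The abstract does not specify which bagging variant is used, so this assumes one common choice: each base classifier is trained on all binding residues plus an equally sized random draw of non-binding residues, and the per-residue scores are then averaged. The function names `balanced_bags` and `ensemble_score` are hypothetical and are not taken from the paper.

```python
import numpy as np


def balanced_bags(labels, n_bags, seed=0):
    """Build index sets for an under-sampling bagging ensemble.

    Assumption (not from the paper): label 1 marks binding residues and is
    the minority class; label 0 marks non-binding residues.
    Each bag holds every minority index plus an equal number of majority
    indices drawn without replacement.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)  # minority: binding residues
    neg = np.flatnonzero(labels == 0)  # majority: non-binding residues
    if len(pos) == 0 or len(neg) < len(pos):
        raise ValueError("expected a non-empty minority class labeled 1")
    bags = []
    for _ in range(n_bags):
        sampled_neg = rng.choice(neg, size=len(pos), replace=False)
        bags.append(np.concatenate([pos, sampled_neg]))
    return bags


def ensemble_score(per_model_scores):
    """Average the per-residue scores produced by the base classifiers."""
    return np.mean(np.asarray(per_model_scores, dtype=float), axis=0)


if __name__ == "__main__":
    toy_labels = [1, 0, 0, 0, 0, 1, 0, 0]  # toy data, 2 binding residues
    for bag in balanced_bags(toy_labels, n_bags=3):
        print(sorted(bag.tolist()))
    print(ensemble_score([[0.2, 0.8], [0.4, 0.6]]))
```

In this sketch every bag is balanced by construction, so each base classifier (an LSTM in the paper's setting) would see equal numbers of binding and non-binding residues, while different bags cover different parts of the majority class.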
Keywords
Feature extraction,Protein sequence,Neural networks,DNA,Support vector machines,Machine learning