Assessing the generalization abilities of machine-learning scoring functions for structure-based virtual screening

crossref(2022)

Abstract
In structure-based virtual screening (SBVS), it is critical for machine-learning scoring functions (MLSFs) to capture protein-ligand atomic interaction patterns. We generated a cross-target generalization ability benchmark for protein-ligand binding affinity prediction to assess whether MLSFs could capture these interactions. By focusing on the local domains in protein-ligand binding pockets, we developed a standardized pocket Pfam-based clustering (Pfam-cluster) approach for the generalization ability benchmark. Subsequently, 11 typical MLSFs were tested using random cross-validation (Random-CV), protein sequence similarity-based cross-validation (Seq-CV), and pocket Pfam-based cross-validation (Pfam-CV) methods. Surprisingly, the performance of every tested model decreased from Random-CV to Seq-CV to Pfam-CV, and none showed satisfactory generalization capacity. Interpretability analysis revealed that MLSF predictions on novel targets relied on buried solvent accessible surface area (SASA)-related features in the complex structures. Combining buried SASA-related information with ligand-specific patterns that were shared only among structurally similar compounds gave Random forest (RF)-Score higher performance in Random-CV tests. Based on these findings, we strongly advise assessing the generalization ability of MLSFs with the Pfam-cluster approach and being cautious about the features learned by MLSFs.
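The difference between Random-CV and a cluster-based split such as Pfam-CV can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster labels, the function names, and the greedy fold-balancing rule are all assumptions made for illustration. The only property it is meant to show is that a cluster-based split keeps every cluster inside a single fold, so test targets never share a cluster with training targets.

```python
import random
from collections import defaultdict


def random_folds(n_samples, n_folds, seed=0):
    """Random-CV: shuffle sample indices and deal them into folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cluster_folds(cluster_labels, n_folds):
    """Cluster-based CV: every cluster lands entirely in one fold."""
    members = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        members[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption of this sketch): place the largest
    # clusters first, each into the fold that is currently smallest.
    for label in sorted(members, key=lambda c: (-len(members[c]), c)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[label])
    return folds


def clusters_shared_across_folds(folds, cluster_labels):
    """Return the cluster labels that appear in more than one fold."""
    seen = defaultdict(set)
    for fold_id, fold in enumerate(folds):
        for idx in fold:
            seen[cluster_labels[idx]].add(fold_id)
    return {label for label, fold_ids in seen.items() if len(fold_ids) > 1}


# Hypothetical cluster labels for 12 complexes (illustrative only).
labels = ["A"] * 5 + ["B"] * 3 + ["C"] * 2 + ["D"] * 2

pfam_cv = cluster_folds(labels, n_folds=3)
random_cv = random_folds(len(labels), n_folds=3)

print("clusters split by cluster-based CV:", clusters_shared_across_folds(pfam_cv, labels))
print("clusters split by Random-CV:", clusters_shared_across_folds(random_cv, labels))
```

In a random split, members of the same cluster usually end up in both training and test folds, which is one way a model can score well without generalizing to new targets.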