Generalizability Improvement of Interpretable Symbolic Regression Models for Quantitative Structure-Activity Relationships

ACS OMEGA(2024)

引用 0|浏览0
暂无评分
摘要
In the pursuit of optimal quantitative structure-activity relationship (QSAR) models, two key factors are paramount: the robustness of predictive ability and the interpretability of the model. Symbolic regression (SR) searches for the mathematical expressions that explain a training data set. Thus, the models provided by SR are globally interpretable. We previously proposed an SR method that can generate interpretable expressions by humans. This study introduces an enhanced symbolic regression method, termed filter-induced genetic programming 2 (FIGP2), as an extension of our previously proposed SR method. FIGP2 is designed to improve the generalizability of SR models and to be applicable to data sets in which cost-intensive descriptors are employed. The FIGP2 method incorporates two major improvements: a modified domain filter to eradicate diverging expressions based on optimal calculation and the introduction of a stability metric to penalize expressions that would lead to overfitting. Our retrospective comparative analysis using 12 structure-activity relationship data sets revealed that FIGP2 surpassed the previously proposed SR method and conventional modeling methods, such as support vector regression and multivariate linear regression in terms of predictive performance. Generated mathematical expressions by FIGP2 were relatively simple and not divergent in the domain of function. Taken together, FIGP2 can be used for making interpretable regression models with predictive ability.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要