Clustering method for the construction of machine learning model with high predictive ability

Chemometrics and Intelligent Laboratory Systems (2024)

Abstract
In the design of molecules, materials, and processes, a mathematical model y = f(x) relating the explanatory variable x to the objective variable y is constructed from a dataset and then used to design x such that y achieves target values. A model with high predictive ability is preferable, but such a model cannot be obtained when the relationship between x and y is inconsistent. In such cases, additional factors must be incorporated to explain the relationship between x and y. For cases where this is difficult, this study proposes a clustering method in which the relationship between x and y remains consistent within each cluster. Using a genetic algorithm, the method selects samples for which the relationship between x and y is consistent and, with these selected samples as the core, forms clusters of samples that share a consistent relationship between x and y. Comparison with other clustering methods on six datasets, including molecules, materials, and spectra, confirmed that the proposed method forms clusters that each yield high predictive ability. Specifically, the method successfully identifies groups of samples exhibiting a consistent relationship between x and y.
Keywords
Machine learning, Clustering, High predictive ability, Genetic algorithm, Material design, Molecular design
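
The two-stage idea in the abstract (a genetic algorithm picks a core of samples with a consistent x-y relationship, then a cluster is formed around that core) might be sketched roughly as follows. The abstract does not specify the encoding, the fitness function, the regression model, or the rule for growing clusters, so everything here is an assumption made for illustration: a single explanatory variable, a straight-line model, negative RMSE as fitness, a minimum core size `min_core`, and a residual tolerance `tol` for cluster membership. The function names `fit_line`, `select_core`, and `assign_clusters` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one explanatory variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # all x identical: fall back to a constant model
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def fitness(mask, xs, ys, min_core):
    """Assumed fitness: negative RMSE of a line fitted to the selected samples."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if len(idx) < min_core:  # reject selections that are too small
        return float("-inf")
    sx, sy = [xs[i] for i in idx], [ys[i] for i in idx]
    a, b = fit_line(sx, sy)
    mse = sum((y - (a * x + b)) ** 2 for x, y in zip(sx, sy)) / len(sx)
    return -(mse ** 0.5)


def select_core(xs, ys, min_core=5, pop_size=40, generations=60, seed=0):
    """Toy genetic algorithm over boolean masks; returns indices of the core."""
    rng = random.Random(seed)
    n = len(xs)

    def score(mask):
        return fitness(mask, xs, ys, min_core)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p[:cut] + q[cut:]
            j = rng.randrange(n)  # single bit-flip mutation
            child[j] = not child[j]
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return [i for i, keep in enumerate(best) if keep]


def assign_clusters(xs, ys, core, tol):
    """Label 0 if a sample follows the core's fitted line within tol, else 1."""
    a, b = fit_line([xs[i] for i in core], [ys[i] for i in core])
    return [0 if abs(y - (a * x + b)) <= tol else 1 for x, y in zip(xs, ys)]
```

As a usage illustration, calling `core = select_core(xs, ys)` and then `assign_clusters(xs, ys, core, tol=0.5)` on a dataset that mixes two different linear trends would separate samples that follow the core's trend (label 0) from the rest (label 1). A separate model could then be fitted per label, or the procedure repeated on the label-1 samples to look for further clusters; whether the paper proceeds this way is not stated in the abstract.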