TPE-AutoClust: A Tree-based Pipeline Ensemble Framework for Automated Clustering.

ICDM (Workshops) (2022)

Abstract
Novel technologies in automated machine learning (AutoML) reduce the complexity of building well-performing machine learning pipelines. However, these technologies are usually restricted to supervised learning tasks such as classification and regression. Unsupervised learning, and clustering in particular, remains largely unexplored because clustering solutions are inherently ambiguous to evaluate. Motivated by this shortcoming, in this paper we introduce TPE-AutoClust, a genetic programming-based AutoML framework for clustering. TPE-AutoClust optimizes a series of feature preprocessors and machine learning models to maximize performance on an unsupervised clustering task. It consists of three main phases: a meta-learning phase, an optimization phase, and a clustering ensemble construction phase. The meta-learning phase suggests pipeline instantiations that are likely to perform well on a new dataset. These pipelines warm-start the optimization phase, which adopts a multi-objective optimization technique to select pipelines from the Pareto front of the trade-off between pipeline length and performance. The ensemble construction phase uses a collaborative mechanism based on a clustering ensemble that combines the optimized pipelines according to different internal cluster validity indices and constructs a well-performing solution for the new dataset. The framework is built on scikit-learn with 4 preprocessors and 6 clustering algorithms. Extensive experiments on 27 real and synthetic benchmark datasets show that TPE-AutoClust outperforms state-of-the-art techniques for building automated clustering solutions.
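The abstract says the optimization phase keeps pipelines on the Pareto front of the trade-off between pipeline length and performance. The following is a minimal sketch of that selection step in plain Python. It assumes each candidate pipeline has already been scored with a single internal cluster validity index where higher is better (for example, a silhouette score; a lower-is-better index such as Davies-Bouldin would need to be negated first). The `Candidate` class and the `dominates` and `pareto_front` functions are hypothetical names for illustration. They are not part of TPE-AutoClust or scikit-learn, and the scores in the usage example are made-up toy values, not results from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical representation of one scored clustering pipeline."""
    name: str     # human-readable description, e.g. "StandardScaler -> KMeans"
    length: int   # number of steps in the pipeline (lower is preferred)
    score: float  # internal cluster validity index, assumed higher-is-better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives (length, score)
    and strictly better on at least one of them."""
    no_worse = a.length <= b.length and a.score >= b.score
    strictly_better = a.length < b.length or a.score > b.score
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, preserving input order."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    # Toy values for illustration only; not results from the paper.
    pool = [
        Candidate("KMeans", 1, 0.41),
        Candidate("StandardScaler -> KMeans", 2, 0.55),
        Candidate("PCA -> StandardScaler -> KMeans", 3, 0.53),
        Candidate("StandardScaler -> Birch", 2, 0.48),
        Candidate("PCA -> StandardScaler -> DBSCAN", 3, 0.60),
    ]
    for c in pareto_front(pool):
        print(c.length, c.score, c.name)
```

With these toy values, the three-step KMeans pipeline and the Birch pipeline are dropped because a pipeline that is no longer and scores higher exists for each of them. The remaining pipelines trade extra steps for a higher score. Keeping the whole front, rather than a single best pipeline, gives a later ensemble step several diverse candidates to combine. This is one plausible reading of the abstract, not the authors' documented procedure.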
Keywords
clustering, meta-learning, hyperparameter optimization, evolutionary algorithms