
Selecting a representative decision tree from an ensemble of decision-tree models for fast big data classification

Journal of Big Data (2019)

Abstract
The goal of this paper is to reduce the classification (inference) complexity of tree ensembles by choosing a single representative model out of an ensemble of decision-tree models. We compute the similarity between the different models in the ensemble and choose the model that is most similar to the others as the best representative of the entire dataset. The similarity-based approach is implemented with three different similarity metrics: a syntactic one, a semantic one, and a linear combination of the two. We compare this tree selection methodology to a popular ensemble algorithm (majority voting) and to the baseline of randomly choosing one of the local models. In addition, we evaluate two alternative tree selection strategies: choosing the tree with the highest validation accuracy, and reducing the original ensemble to the five most representative trees. The comparative evaluation experiments are performed on six big datasets using two popular decision-tree algorithms (J48 and CART), splitting each dataset horizontally into six different numbers of equal-size slices (from 32 to 1024). In most experiments, the syntactic similarity approach, named SySM (Syntactic Similarity Method), provides a significantly higher testing accuracy than the semantic and combined approaches. The mean accuracy of SySM over all datasets is 0.835±0.065 for CART and 0.769±0.066 for J48. On the other hand, we find no statistically significant difference between the testing accuracy of the trees selected by SySM and the trees having the highest validation accuracy. Compared to ensemble algorithms, the representative models selected by the proposed methods provide higher classification speed on big data, along with being more compact and interpretable.
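The following is a minimal sketch of the selection idea described above: train one tree per horizontal data slice, then pick the ensemble member that is, on average, most similar to the others. It approximates the "semantic" similarity by prediction agreement on a validation set; the paper's syntactic edit-distance metric, its combined metric, and the exact SySM procedure are not reproduced here, and the synthetic dataset and scikit-learn CART-style trees are stand-ins for the paper's six big datasets and J48/CART models.

```python
# Sketch: similarity-based selection of a representative tree from an ensemble.
# Assumption: semantic similarity is measured as prediction agreement on a
# held-out validation set (a proxy, not the paper's exact metrics).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Train one tree per horizontal slice of the training data.
slices = np.array_split(np.arange(len(X_train)), 8)
trees = [DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
         for idx in slices]

# Pairwise "semantic" similarity: agreement rate of predictions on X_val.
preds = np.stack([t.predict(X_val) for t in trees])            # (n_trees, n_val)
agreement = (preds[:, None, :] == preds[None, :, :]).mean(-1)  # (n_trees, n_trees)

# The representative tree is the one most similar, on average, to the rest
# (self-similarity of 1.0 is excluded from the mean).
mean_sim = (agreement.sum(axis=1) - 1.0) / (len(trees) - 1)
best = int(np.argmax(mean_sim))
print("representative tree:", best,
      "validation accuracy:", trees[best].score(X_val, y_val))
```

At inference time only the single selected tree is evaluated, which is where the speed and interpretability advantage over majority voting across the full ensemble comes from.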
Keywords
Big data, Ensemble learning, Lazy ensemble evaluation, Decision trees, Editing distance, Tree similarity