Distributed tuning of machine learning algorithms using MapReduce Clusters

KDD(2011)

Cited by 43
Abstract
Obtaining the best accuracy in machine learning usually requires carefully tuning learning algorithm parameters for each problem. Parameter optimization is computationally challenging for learning methods with many hyperparameters. In this paper we show that MapReduce clusters are particularly well suited for parallel parameter optimization. We use MapReduce to optimize regularization parameters for boosted trees and random forests on several text problems: three retrieval ranking problems and a Wikipedia vandalism problem. We show how model accuracy improves as a function of the percent of parameter space explored, that accuracy can be hurt by exploring parameter space too aggressively, and that there can be significant interaction between parameters that appear to be independent. Our results suggest that MapReduce is a two-edged sword: it makes parameter optimization feasible on a massive scale that would have been unimaginable just a few years ago, but also creates a new opportunity for overfitting that can reduce accuracy and lead to inferior learning parameters.
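The core idea the abstract describes, evaluating many points of a hyperparameter grid in parallel as a map phase and selecting the best as a reduce phase, can be sketched as follows. This is an illustrative, hypothetical example, not the paper's code: the function names (`evaluate_params`, `best`) and the toy surrogate score standing in for real model training are assumptions.

```python
# Hypothetical sketch of MapReduce-style parallel hyperparameter tuning.
# Map step: score one parameter setting. Reduce step: keep the best one.
from itertools import product
from functools import reduce

def evaluate_params(params):
    """Map step: would train/validate a model for one parameter setting.
    A toy surrogate score (peaked at depth=6, rate=0.1) stands in here."""
    depth, rate = params
    score = -(depth - 6) ** 2 - (rate - 0.1) ** 2
    return params, score

def best(a, b):
    """Reduce step: keep the higher-scoring (params, score) pair."""
    return a if a[1] >= b[1] else b

# The parameter space: every combination of tree depth and learning rate.
grid = list(product([2, 4, 6, 8], [0.05, 0.1, 0.2]))

# In a real cluster each map call runs on a separate node; here it is serial.
mapped = map(evaluate_params, grid)
best_params, best_score = reduce(best, mapped)
print(best_params)  # -> (6, 0.1)
```

The paper's caution applies directly: enlarging `grid` makes the reduce step more likely to select a setting that merely overfits the validation data.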
Keywords
MapReduce clusters, Wikipedia vandalism problem, parameter optimization, algorithm parameters, best accuracy, regularization parameters, parallel parameter optimization, inferior learning parameters, model accuracy, parameter space, hyperparameters, tuning, machine learning, optimization, random forests