Parallel Large Scale Feature Selection for Logistic Regression

SDM(2009)

Abstract
In this paper we examine the problem of efficient feature evaluation for logistic regression on very large data sets. We present a new forward feature selection heuristic that ranks features by their estimated effect on the resulting model's performance. An approximate optimization, based on backfitting, provides a fast and accurate estimate of each new feature's coefficient in the logistic regression model. Further, the algorithm is highly scalable: it parallelizes simultaneously over both features and records, allowing us to quickly evaluate billions of potential features even for very large data sets.
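The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way such a heuristic could look, not the authors' implementation. It assumes the current model is held fixed and enters as a per-record log-odds offset, that each candidate feature gets a single coefficient fitted by Newton's method, and that candidates are ranked by the resulting gain in log-likelihood. The names `fit_single_coefficient` and `rank_candidates`, the damping constant, and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(offsets, x, y, beta):
    # Log-likelihood of labels y when the fixed base model contributes
    # `offsets` (log-odds) and the candidate feature contributes beta * x.
    total = 0.0
    for o, xi, yi in zip(offsets, x, y):
        p = sigmoid(o + beta * xi)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_single_coefficient(offsets, x, y, steps=25, damping=1e-6):
    # Newton's method on one coefficient only; the base model is not refit.
    # `damping` is an assumed small constant that avoids division by zero.
    beta = 0.0
    for _ in range(steps):
        grad = 0.0
        hess = damping
        for o, xi, yi in zip(offsets, x, y):
            p = sigmoid(o + beta * xi)
            grad += (yi - p) * xi
            hess += p * (1.0 - p) * xi * xi
        step = grad / hess
        beta += step
        if abs(step) < 1e-9:
            break
    return beta


def rank_candidates(offsets, candidates, y):
    # Score every candidate independently and sort by log-likelihood gain.
    base = log_likelihood(offsets, [0.0] * len(y), y, 0.0)
    scored = []
    for name, x in candidates.items():
        beta = fit_single_coefficient(offsets, x, y)
        gain = log_likelihood(offsets, x, y, beta) - base
        scored.append((name, beta, gain))
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Toy data: a base model that predicts 0.5 everywhere (offset 0).
y = [1, 1, 1, 0, 0, 0, 1, 0]
offsets = [0.0] * len(y)
candidates = {
    "informative": [1, 1, 1, 0, 0, 0, 0, 1],
    "noise": [1, 0, 1, 0, 1, 0, 0, 1],
}
ranked = rank_candidates(offsets, candidates, y)
for name, beta, gain in ranked:
    print(name, round(beta, 4), round(gain, 4))
```

In this sketch each candidate is scored independently of the others, and the gradient and curvature sums are additive over records, so the work could in principle be split across both features and records; how the paper actually distributes it is not stated in the abstract.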
Keywords
feature selection,logistic regression model,logistic regression