Learning Multiclassifiers with Predictive Features that Vary with Data Distribution

2018 IEEE International Conference on Big Data (Big Data)(2018)

Cited by 1
Abstract
In many real-world big data applications, the data distribution is not homogeneous across the entire dataset, but instead varies across groups/clusters of data samples. Although a model's predictive performance remains vital, there is also a need to learn succinct sets of features that evolve with, and capture smooth variations in, the data distribution. These small sets of features not only lead to high prediction accuracy, but also reveal the important underlying processes. We investigate this challenging problem by developing a novel multi-task learning paradigm that trains multiple support vector machine (SVM) classifiers over a set of related data clusters and directly imposes smoothness constraints on adjacent classifiers. We show that such patterns can be effectively learned in the dual form of the classical SVM, and further show that a parsimonious solution can be achieved in the primal form. Although the solution can be optimized effectively via gradient descent, the technical development is not straightforward and requires a relaxation of the SVM loss function. We demonstrate the performance of our algorithm in two practical application domains: team performance and road traffic prediction. Empirical results show that our model not only achieves competitive prediction accuracy, but also discovers patterns that genuinely capture, and give intuition about, the variation in the data distribution across multiple data clusters.
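The paradigm the abstract describes — one linear classifier per data cluster, with adjacent classifiers coupled by a smoothness penalty — can be sketched as follows. This is a minimal illustration, not the paper's method: it uses a plain hinge-loss subgradient step in place of the paper's relaxed loss, and all function names, hyperparameters, and the quadratic coupling term `mu * ||w_k - w_{k+1}||^2` are assumptions for the sketch.

```python
import numpy as np

def train_smooth_svms(clusters, lam=0.1, mu=1.0, lr=0.01, epochs=200, seed=0):
    """Train one linear SVM per cluster with a smoothness penalty
    mu * ||w_k - w_{k+1}||^2 coupling adjacent classifiers.

    clusters: list of (X, y) pairs in adjacency order, labels y in {-1, +1}.
    Returns an array W of shape (num_clusters, num_features).
    Illustrative sketch only; hinge loss handled via its subgradient.
    """
    rng = np.random.default_rng(seed)
    d = clusters[0][0].shape[1]
    K = len(clusters)
    W = rng.normal(scale=0.01, size=(K, d))
    for _ in range(epochs):
        G = np.zeros_like(W)
        for k, (X, y) in enumerate(clusters):
            margins = y * (X @ W[k])
            active = margins < 1  # points inside the margin drive the hinge subgradient
            G[k] = lam * W[k] - (y[active, None] * X[active]).sum(axis=0) / len(y)
            # smoothness coupling pulls w_k toward its neighbors
            if k > 0:
                G[k] += 2 * mu * (W[k] - W[k - 1])
            if k < K - 1:
                G[k] += 2 * mu * (W[k] - W[k + 1])
        W -= lr * G
    return W
```

With `mu = 0` the clusters decouple into independent SVMs; increasing `mu` shrinks the distance between neighboring weight vectors, which is the "smooth variation across clusters" pattern the abstract refers to.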
Keywords
Big Data, Feature Extraction, Supervised Learning, Multi-task Learning