Binarised regression tasks: methods and evaluation metrics

Data Min. Knowl. Discov. (2015)

Abstract
Some supervised tasks are presented with a numerical output, but decisions have to be made in a discrete, binarised way, according to a particular cutoff. This binarised regression task is a very common situation that requires its own analysis, different from regression, classification and ordinal regression. We first investigate the application cases in terms of the information available about the distribution and range of the cutoffs, and distinguish six possible scenarios, some of which are more common than others. Next, we study two basic approaches: the retraining approach, which discretises the training set whenever the cutoff becomes available and learns a new classifier from it, and the reframing approach, which learns a regression model once and applies the cutoff when it becomes available during deployment. In order to assess the binarised regression task, we introduce context plots featuring error against cutoff. Two special cases are of interest, the UCE and OCE curves: the area under the former is the mean absolute error, and the area under the latter is a new metric that lies between a ranking measure and a residual-based measure. A comprehensive evaluation of the retraining and reframing approaches is performed using a repository of binarised regression problems created for this purpose, concluding that neither method is clearly better than the other, except when the size of the training data is small.
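The two approaches and the error-against-cutoff idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic one-feature data, the helper names (`binarise`, `fit_line`, `retraining_predict`, `error_rate`), the least-squares stand-in classifier, and the unnormalised cutoff grid, which may differ from the paper's exact UCE definition. The sketch relies only on the fact that an example is misclassified exactly for the cutoffs lying between its true and its predicted value, so integrating the error rate over all cutoffs gives the mean absolute error.

```python
import random


def binarise(values, cutoff):
    """Map numeric outputs to 0/1 decisions: 1 when value >= cutoff."""
    return [1 if v >= cutoff else 0 for v in values]


def fit_line(xs, ys):
    """One-feature ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def retraining_predict(train_x, train_y, xs, cutoff):
    """Retraining: binarise the training targets at the cutoff, then learn
    a classifier (here a hypothetical stand-in: least squares on the 0/1
    labels, thresholded at 0.5)."""
    slope, intercept = fit_line(train_x, binarise(train_y, cutoff))
    return [1 if slope * x + intercept >= 0.5 else 0 for x in xs]


def error_rate(pred, truth):
    """Fraction of examples whose binary decision is wrong."""
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)


# Synthetic data (an assumption for this sketch): y = 2x + 1 + noise.
rng = random.Random(0)


def make_data(n):
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


train_x, train_y = make_data(200)
test_x, test_y = make_data(200)

# Reframing: learn one regression model, apply the cutoff at deployment.
slope, intercept = fit_line(train_x, train_y)
test_pred = [slope * x + intercept for x in test_x]

cutoff = 12.0  # an arbitrary deployment cutoff
truth = binarise(test_y, cutoff)
reframing_error = error_rate(binarise(test_pred, cutoff), truth)
retraining_error = error_rate(
    retraining_predict(train_x, train_y, test_x, cutoff), truth
)
print("reframing error:", reframing_error)
print("retraining error:", retraining_error)

# Error-against-cutoff curve for the regression model, integrated with a
# Riemann sum over a grid that covers every true and predicted value.
step = 0.01
lo = min(test_y + test_pred) - 1.0
hi = max(test_y + test_pred) + 1.0
area = 0.0
for k in range(int((hi - lo) / step) + 1):
    c = lo + k * step
    area += step * error_rate(binarise(test_pred, c), binarise(test_y, c))

mae = sum(abs(y - p) for y, p in zip(test_y, test_pred)) / len(test_y)
print("area under error-vs-cutoff curve:", area)
print("mean absolute error:", mae)
```

The area and the mean absolute error agree up to the grid resolution `step`. The single-cutoff comparison between the two approaches is only a demonstration of the mechanics and says nothing about which approach is preferable in general.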
Keywords
Regression, Classification, Reframing, Mean absolute error, Cutoff, Binarisation