Active Sampling for Text Classification with Subinstance Level Queries.

AAAI Conference on Artificial Intelligence (2022)

Abstract
Active learning algorithms are effective in identifying the salient and exemplar samples from large amounts of unlabeled data. This tremendously reduces the human annotation effort in inducing a machine learning model as only a few samples, which are identified by the algorithm, need to be labeled manually. In problem domains like text mining and video classification, human oracles peruse the data instances incrementally to derive an opinion about their class labels (such as reading a movie review progressively to assess its sentiment). In such applications, it is not necessary for the human oracles to review an unlabeled sample end-to-end in order to provide a label; it may be more efficient to identify an optimal subinstance size (percentage of the sample from the start) for each unlabeled sample, and request the human annotator to label the sample by analyzing only the subinstance, instead of the whole data sample. In this paper, we propose a novel framework to address this challenging problem, in an effort to further reduce the labeling burden on the human oracles and utilize the available labeling budget more efficiently. We pose the sample and subinstance size selection as a constrained optimization problem and derive a linear programming relaxation to select a batch of exemplar samples, together with the optimal subinstance size of each, which can potentially augment maximal information to the underlying classification model. Our extensive empirical studies on six challenging datasets from the text mining domain corroborate the practical usefulness of our framework over competing baselines.
Keywords
Machine Learning (ML)
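
To make the abstract's formulation concrete, the sketch below is a minimal, hypothetical LP relaxation of the joint sample-and-subinstance-size selection problem, not the authors' implementation: a fractional variable x[i, k] chooses sample i at subinstance fraction sizes[k], informativeness is maximized subject to a per-sample selection constraint and an annotation budget, and the fractional solution is rounded greedily. The utility matrix, the read-length cost model, the budget value, and the rounding step are all placeholder assumptions.

# Toy LP relaxation for batch active sampling with subinstance-level queries.
# Assumes scipy and numpy; all problem data below are illustrative placeholders.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

n_samples = 20                               # unlabeled pool size (hypothetical)
sizes = np.array([0.25, 0.5, 0.75, 1.0])     # candidate subinstance fractions
n_sizes = len(sizes)

# utility[i, k]: estimated informativeness of labeling sample i after the
# annotator reads only the first sizes[k] fraction of it (placeholder values).
utility = rng.random((n_samples, n_sizes)) * sizes   # longer reads assumed more reliable
cost = np.tile(sizes, n_samples)                     # annotation cost proportional to read length
budget = 5.0                                         # total labeling budget (hypothetical)

# Flatten decision variables x[i, k] in [0, 1]; this is the LP relaxation of
# the underlying 0/1 selection problem. linprog minimizes, so negate utilities.
c = -utility.ravel()

# Constraint 1: each sample is selected with at most one subinstance size.
A_pick = np.kron(np.eye(n_samples), np.ones((1, n_sizes)))
b_pick = np.ones(n_samples)

# Constraint 2: total annotation cost stays within the budget.
A_budget = cost.reshape(1, -1)
b_budget = np.array([budget])

res = linprog(
    c,
    A_ub=np.vstack([A_pick, A_budget]),
    b_ub=np.concatenate([b_pick, b_budget]),
    bounds=[(0.0, 1.0)] * (n_samples * n_sizes),
    method="highs",
)

x = res.x.reshape(n_samples, n_sizes)
# Simple rounding heuristic (a stand-in for the paper's batch-selection step):
# greedily keep the largest fractional selections.
chosen = sorted(
    ((x[i, k], i, sizes[k]) for i in range(n_samples) for k in range(n_sizes)),
    reverse=True,
)[:5]
for frac, i, size in chosen:
    print(f"sample {i:2d}: read first {int(size * 100)}% (LP weight {frac:.2f})")

In this toy setup, the solver trades off informativeness against how much of each document the annotator must read, which mirrors the abstract's goal of spending the labeling budget on the most useful sample/subinstance-size pairs rather than on full documents.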