Investigating Active Learning Sampling Strategies for Extreme Multi Label Text Classification.

International Conference on Language Resources and Evaluation (LREC), 2022

Abstract
Large-scale, multi-label text datasets with a high number of classes are expensive to annotate, even more so if they belong to specific language domains. In this work, we aim to build classifiers for these datasets using Active Learning in order to reduce the labeling effort. We outline the challenges of extreme multi-label settings and show the limitations of existing pool-based Active Learning strategies by considering both their effectiveness and their efficiency in terms of computational cost. In addition, we present five multi-label datasets, compiled from hierarchical classification tasks, to serve as benchmarks for future experiments in extreme multi-label classification. Finally, we provide insights into multi-class, multi-label evaluation and present an improved classifier architecture on top of pre-trained transformer language models.
Keywords
Active Learning, Text Classification, Multi-Label