Pearls from Pebbles: Improved Confidence Functions for Auto-labeling

Harit Vishwakarma, Reid Chen, Sui Jiet Tay, Satya Sai Srinath Namburi, Frederic Sala, Ramya Korlakai Vinayak

arXiv (2024)

Abstract
Auto-labeling is an important family of techniques that produce labeled training sets with minimal manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model's confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version of the framework to obtain a new post-hoc method (Confidence functions for Efficient and Reliable Auto-labeling) specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of our method and compare it against methods designed for calibration. Our method achieves up to a 60% improvement in coverage over the baselines while maintaining auto-labeling error below 5% and using the same amount of labeled data as the baselines.
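
To make the thresholding step concrete, the following is a minimal sketch of threshold selection in a TBAL-style pipeline. It is not the paper's method: it assumes a single global threshold, a small labeled validation set, and a plain empirical error estimate with no confidence bound. The function names `find_threshold` and `auto_label` and the `max_error` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def find_threshold(val_conf, val_correct, max_error=0.05):
    """Return the smallest confidence threshold t such that the empirical
    error on validation points with confidence >= t is at most max_error.
    Returns None when no threshold satisfies the error budget.

    Assumption: one global threshold and a raw empirical error estimate;
    a practical system would likely add a safety margin to the estimate.
    """
    val_conf = np.asarray(val_conf, dtype=float)
    val_correct = np.asarray(val_correct, dtype=bool)
    # np.unique returns sorted values, so the first qualifying threshold
    # is the lowest one and therefore gives the highest coverage.
    for t in np.unique(val_conf):
        selected = val_conf >= t
        error = 1.0 - val_correct[selected].mean()
        if error <= max_error:
            return float(t)
    return None


def auto_label(conf, preds, threshold):
    """Keep predictions whose confidence clears the threshold.
    Returns the selection mask and the auto-assigned labels."""
    mask = np.asarray(conf, dtype=float) >= threshold
    return mask, np.asarray(preds)[mask]


# Toy validation data: confidence scores and whether each prediction was correct.
val_conf = [0.55, 0.60, 0.70, 0.80, 0.90, 0.95]
val_correct = [0, 1, 0, 1, 1, 1]
threshold = find_threshold(val_conf, val_correct, max_error=0.05)

# Apply the threshold to unlabeled points; coverage is the fraction auto-labeled.
mask, labels = auto_label([0.50, 0.85, 0.99], [1, 0, 2], threshold)
coverage = mask.mean()
```

In this sketch, coverage is the fraction of unlabeled points that clear the threshold. An overconfident model pushes incorrect predictions into the high-confidence region, which forces the threshold up and the coverage down; this is the failure mode the abstract attributes to poorly suited confidence functions.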