Learning from Noisy Pairwise Similarity and Unlabeled Data.

J. Mach. Learn. Res. (2022)

Abstract
SU classification employs similar (S) data pairs (two examples belonging to the same class) and unlabeled (U) data points to build a classifier, serving as an alternative to standard supervised classifiers that require data points with class labels. SU classification is advantageous because, in the era of big data, increasing attention has been paid to data privacy. Datasets with explicit class labels are often difficult to obtain in real-world classification applications involving privacy-sensitive matters, such as politics and religion, which can be a bottleneck for supervised classification. Fortunately, similarity labels do not reveal explicit label information and inherently protect privacy, e.g., by collecting answers to "With whom do you share the same opinion on issue I?" instead of "What is your opinion on issue I?". Nevertheless, SU classification still has an obvious limitation: respondents might answer such questions in a manner that is viewed favorably by others instead of answering truthfully. Consequently, some dissimilar data pairs are labeled as similar, which significantly degrades the performance of SU classification. In this paper, we study how to learn from noisy similar (nS) data pairs and unlabeled (U) data, which we call nSU classification. Specifically, we carefully model the similarity noise and estimate the noise rate using the mixture proportion estimation technique. A clean classifier can then be learned by minimizing a denoised and unbiased classification risk estimator that involves only the noisy data. Moreover, we derive a theoretical generalization error bound for the proposed method. Experimental results demonstrate the effectiveness of the proposed algorithm on several benchmark datasets.
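The following Python snippet is an illustrative sketch only, not the paper's actual estimator: it demonstrates the kind of mixture-debiasing identity that a denoised, unbiased risk estimator relies on. If the noisy-similar pairs form a mixture of truly similar pairs and mislabeled dissimilar pairs with an estimated noise rate, the clean-pair risk can be recovered in expectation from noisy averages. All names here (rho, loss_S, loss_D, loss_nS) are assumed notation; in the actual nSU setting the dissimilar-pair term is not directly observable and is instead rewritten using only the noisy-similar and unlabeled data, which is the contribution described in the abstract.

```python
# Illustrative sketch (NOT the paper's estimator): mixture debiasing.
# If p_nS = (1 - rho) * p_S + rho * p_D with estimated noise rate rho,
# then for any loss l: E_{p_S}[l] = (E_{p_nS}[l] - rho * E_{p_D}[l]) / (1 - rho),
# so empirical averages yield an unbiased estimate of the clean-pair risk.
import numpy as np

rng = np.random.default_rng(0)

# Toy "pair losses": samples of a loss evaluated on truly similar (S) and
# truly dissimilar (D) pairs; in practice only the noisy mixture (nS) and
# unlabeled data would be observed.
loss_S = rng.normal(loc=0.3, scale=0.1, size=200_000)  # clean similar pairs
loss_D = rng.normal(loc=1.2, scale=0.2, size=200_000)  # dissimilar pairs
rho = 0.25                                             # assumed noise rate

# Observed noisy-similar losses: with probability rho, a dissimilar pair
# was mislabeled as similar.
mask = rng.random(200_000) < rho
loss_nS = np.where(mask, loss_D, loss_S)

naive = loss_nS.mean()                                           # biased by the noise
debiased = (loss_nS.mean() - rho * loss_D.mean()) / (1.0 - rho)  # corrected

print(f"true clean risk : {loss_S.mean():.4f}")
print(f"naive noisy risk: {naive:.4f}")
print(f"debiased risk   : {debiased:.4f}")
```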
Keywords
privacy concern, similarity learning, unbiased classifier