Improving Labeling Through Social Science Insights: Results and Research Agenda.

HCI (43)(2022)

Abstract
Machine learning (ML) algorithms are frequently trained on human-labeled data. Although often seen as a "gold standard," human labeling is anything but error-free. Decisions in the design of labeling tasks can distort the resulting labeled data and affect predictions. Building on insights from survey methodology, a field that studies the impact of instrument design on survey data and estimates, we examine how the structure of a hate speech labeling task affects which labels are assigned. We also examine how task ordering affects the perception of hate speech and what role annotators' background characteristics play in the classifications they provide. The study demonstrates the importance of applying design thinking at the earliest steps of ML product development. Design principles such as quick prototyping and critically assessing user interfaces are important not only in interactions with end users of artificial intelligence (AI)-driven products but also early in development, prior to training AI algorithms.
Keywords
Data quality, Labels, Training data, Survey methodology