Age and gender in language, emoji, and emoticon usage in instant messages

Computers in Human Behavior (2022)

Abstract
Text is one of the most prevalent types of digital data that people create as they go about their lives. Digital footprints of people's language usage in social media posts have been found to allow inferences of their age and gender. However, the even more prevalent and potentially more sensitive text from instant messaging services has remained largely uninvestigated. We analyze language variations in instant messages with regard to individual differences in age and gender by replicating and extending the methods used in prior research on social media posts. Using a dataset of 309,229 WhatsApp messages from 226 volunteers, we identify unique age- and gender-linked language variations. We use cross-validated machine learning algorithms to predict volunteers' age (MAE_Md = 3.95, r_Md = 0.81, R²_Md = 0.49) and gender (Accuracy_Md = 85.7%, F1_Md = 0.67, AUC_Md = 0.82) significantly above baseline levels and identify the most predictive language features. We discuss implications for psycholinguistic theory, present opportunities for application in author profiling, and suggest methodological approaches for making predictions from small text data sets. Given the recent trend towards the dominant use of private messaging and increasingly weaker user data protection, we highlight rising threats to individual privacy rights in instant messaging.
Keywords
Age, Gender, Author profiling, Instant messages, Machine learning, Digital footprints
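
The age-prediction metrics in the abstract (MAE, r, R²) carry an "Md" subscript. A minimal sketch of how such figures might be aggregated follows, assuming "Md" denotes the median over held-out cross-validation folds; the abstract does not spell this out. The function names and the toy ages are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, median


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ages."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted ages."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mt = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def median_fold_metrics(folds):
    """Summarize per-fold metrics by their median.

    folds: list of (y_true, y_pred) pairs, one per held-out fold.
    Assumption: 'Md' in the abstract means this kind of median.
    """
    return {
        "MAE_Md": median([mae(t, p) for t, p in folds]),
        "r_Md": median([pearson_r(t, p) for t, p in folds]),
        "R2_Md": median([r_squared(t, p) for t, p in folds]),
    }


# Hypothetical toy data: (true ages, predicted ages) for three folds.
toy_folds = [
    ([20, 30, 40], [22, 28, 41]),
    ([25, 35, 45], [25, 35, 45]),
    ([18, 50, 33], [24, 44, 33]),
]
summary = median_fold_metrics(toy_folds)
```

Reporting the median rather than the mean keeps a single unusually easy or hard fold from dominating the summary, which matters when each fold holds only a few volunteers.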