The Affects of Demographics Differentiations on Authorship Identification
msra(2010)
Abstract
Many previous studies have examined how language in text differs with demographic attributes. This investigation
differs by posing a new question: is male writing style more consistent than female style, or the opposite? We also study
how style differs with age. To answer these questions, we present a novel analysis that applies authorship identification
within each demographic category and compares the identification accuracy between categories. We select
personal blogs or diaries, whose textual properties differ from those of other text types such as essays, emails, or
articles. The investigation uses a couple of intuitive feature sets and studies various parameters that affect identification
performance. The results and evaluation show that the features used are compact, while their performance is highly comparable
to that of other, larger feature sets. The analysis also confirms the usefulness of the common users’ classifier, based on common
demographic attributes, in improving performance on the author identification task.
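
The comparison described in the abstract, running authorship identification separately within each demographic category and then comparing accuracies, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function-word list, the nearest-centroid classifier, and the names `features`, `identify`, and `accuracy_by_group` are hypothetical and are not the feature sets or classifier used in the paper.

```python
from collections import Counter, defaultdict
import math

# Hypothetical compact feature set: a few function words (an assumption,
# not the feature sets used in the paper).
FUNCTION_WORDS = ["the", "and", "i", "to", "of", "a", "my", "so"]


def features(text):
    """Relative frequency of each function word in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def identify(train, text):
    """Return the author in train ({author: [texts]}) whose centroid is nearest."""
    cents = {a: centroid([features(t) for t in ts]) for a, ts in train.items()}
    x = features(text)
    return min(cents, key=lambda a: distance(cents[a], x))


def accuracy_by_group(train, test, group_of):
    """Identify authors only among candidates of the same group; return accuracy per group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for author, texts in test.items():
        g = group_of[author]
        # Candidate set is restricted to authors sharing the demographic label.
        candidates = {a: ts for a, ts in train.items() if group_of[a] == g}
        for t in texts:
            totals[g] += 1
            hits[g] += int(identify(candidates, t) == author)
    return {g: hits[g] / totals[g] for g in totals}
```

Under these assumptions, a higher accuracy for one group would suggest that its authors are easier to tell apart from one another; the sketch only shows the shape of the per-category comparison, not the paper's experimental setup.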
Keywords
web mining, information extraction, psycholinguistic, machine learning, authorship identification, demographics differentiation