It Runs in the Family: Unsupervised Algorithm for Alternative Name Suggestion Using Digitized Family Trees

IEEE Transactions on Knowledge and Data Engineering (2023)

Abstract
Searching for a person’s name is a common online activity. However, Web search engines return few accurate results for queries containing names. Unlike a general word, which has only one correct spelling, a name provided as a query may have several legitimate spellings. Today, most techniques used to suggest diminutives and alternative spellings in online search are based on pattern matching and phonetic encoding; however, they often perform poorly. As a result, there is a need for an effective tool that improves alternative name suggestion for a name provided as a query. In this paper, we propose a revolutionary approach for tackling the problem of alternative name suggestion. Our novel algorithm, GRAFT, utilizes historical data collected from genealogy websites, along with network algorithms. GRAFT is a general algorithm that suggests alternatives for input names using a graph based on names derived from digitized ancestral family trees. Alternative names are extracted from this graph, which is constructed using generic ordering functions. This approach outperforms other algorithms, which suggest diminutives and alternative spellings based on a single dimension, a factor that limits their performance. We evaluated GRAFT’s performance on three ground truth datasets of forenames and surnames, including a large-scale online genealogy dataset with over 16 million profiles and more than 700,000 unique forenames and 500,000 surnames. We compared GRAFT’s performance at suggesting alternative names to the performance of 10 other algorithms, including phonetic encoding, string similarity, machine learning, and deep learning algorithms. The results show GRAFT’s superiority with regard to both forenames and surnames and demonstrate its use as a tool to improve alternative name suggestion.
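The abstract does not spell out how the name graph is built or ranked, so the following is only a minimal sketch of the general idea, not the GRAFT algorithm itself. It assumes a toy input of (descendant name, ancestor name) pairs, links two names when they co-occur in a family line and are similar as strings, and ranks a name's neighbors with a simple ordering function. The function names, the `difflib` similarity measure, the 0.6 threshold, and the ranking key are all illustrative assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; an assumed stand-in, not the paper's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_name_graph(relative_pairs, min_similarity=0.6):
    """Link two distinct names that co-occur in a family line and look alike.

    relative_pairs: iterable of (descendant_name, ancestor_name) tuples
    (a hypothetical input format for this sketch).
    Returns an undirected weighted graph: graph[name][other] = co-occurrence count.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for descendant, ancestor in relative_pairs:
        if descendant == ancestor:
            continue  # identical names carry no alternative-spelling signal
        if similarity(descendant, ancestor) >= min_similarity:
            graph[descendant][ancestor] += 1
            graph[ancestor][descendant] += 1
    return graph


def suggest(graph, name, k=3):
    """Return up to k neighbors of `name`, ranked by an assumed ordering function:
    co-occurrence count first, then string similarity, then alphabetical order."""
    neighbors = graph.get(name, {})
    ranked = sorted(
        neighbors,
        key=lambda other: (-neighbors[other], -similarity(name, other), other),
    )
    return ranked[:k]


# Toy data, invented for illustration only.
pairs = [
    ("Jon", "John"),
    ("Jon", "John"),
    ("John", "Johan"),
    ("John", "Mary"),   # relatives with unrelated names: no edge is created
    ("Maria", "Mary"),
]
graph = build_name_graph(pairs)
print(suggest(graph, "John"))
print(suggest(graph, "Mary"))
```

Combining two signals (family co-occurrence and string similarity) in the ranking key is one way to avoid relying on a single dimension, which the abstract identifies as a limitation of existing methods; the actual ordering functions used by GRAFT may differ.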
Keywords
Alternative name suggestion, digitized family trees, networks, network science, personal names, name-based graphs