A latent variable model for geographic lexical variation

EMNLP (2010)

Abstract
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as "sports" or "entertainment" are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author's geographic location from raw text, outperforming both text regression and supervised topic models.
Keywords
multi-level generative model, geographic region, new computational possibility, geographic area, latent variable model, geotagged microblogs, geographic linguistic variation, geographic lexical variation, supervised topic model, linguistic consistency, geotagged social media, geographic location, social media