Detecting languages in streetscapes using deep convolutional neural networks.

You Xuan Thung, Tom Benson, Nikita Klimenko

Big Data (2022)

Abstract
Cities are laden with visual clues: public road signs, advertising billboards, street names, place names, street art, and commercial shop signs. In the urban studies literature, it is common to use such clues, scattered along streetscapes, to understand cities better. Sociologists have tried to derive meaning from the linguistic diversity in streetscapes, but this work is limited by the labor that field studies require. Taking advantage of the availability of street view imagery (SVI) today and the advent of deep learning driven by the availability of big data, we take a machine learning approach to quantifying the linguistic diversity in cities. In this paper, we introduce a language detection tool that identifies the presence of English, Swedish, Arabic, and Chinese in streetscapes. The tool is built on a pretrained DenseNet-121 model and trained with both synthetic images and real streetscapes scraped from Google Street View (GSV). We achieve a test accuracy of 80.8% across all four languages, surpassing the best-performing optical character recognition (OCR) tools by 9 percentage points. The F1 score of 79.8% also surpasses that of present OCR tools by 11 percentage points, which suggests that our model is both accurate and reliable. We use gradient-weighted class activation mapping (Grad-CAM) to show that our model is visually interpretable. We then apply the tool to Stockholm, Sweden, to create a portrait of the city's linguistic diversity and find a chasm between linguistic diversity and population diversity, which points to the importance of measuring the two separately.
Keywords
deep learning, computer vision, language detection, convolutional neural networks
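
The abstract reports both a test accuracy and an F1 score, and the two can diverge when some languages are detected less reliably than others. The sketch below illustrates the difference on a toy four-class example. It is not the authors' evaluation code: the abstract does not say how F1 is averaged, so macro-averaging over single-label predictions is an assumption here, and the labels in `y_true` and `y_pred` are invented for illustration.

```python
# Illustrative only: toy labels, and macro-averaged F1 is an assumption.
LANGUAGES = ["English", "Swedish", "Arabic", "Chinese"]


def accuracy(y_true, y_pred):
    """Fraction of images whose predicted language equals the true one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=LANGUAGES):
    """Unweighted mean of the per-language F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical ground truth and predictions for six images.
y_true = ["English", "English", "Swedish", "Arabic", "Chinese", "Chinese"]
y_pred = ["English", "Swedish", "Swedish", "Arabic", "Chinese", "English"]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
    print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```

Because macro-averaging weights every language equally, a weak class lowers the score even if it is rare in the test set, which is one reason to report F1 alongside accuracy.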