W-TextCNN: A TextCNN model with weighted word embeddings for Chinese address pattern classification

Computers, Environment and Urban Systems (2022)

Abstract
Geocoding is crucial for supporting location-based services and has become a widely accessible technique in geographic information systems (GIS). In a geocoding system, addresses are one of the main types of geographic reference text used as input. Address patterns are the organizational rules by which address components are combined into an address. In China, intricate rules and backward address planning make address patterns unsystematic and difficult to recognize, which creates significant challenges for database construction and address standardization. Inspired by deep learning methods, this paper proposes a convolutional neural network for text with weighted word embeddings (W-TextCNN) for Chinese address pattern classification. Specifically, we define eight address patterns that represent the structures of addresses, taking the characteristics of address components into account. To make addresses computable by the neural network, word embeddings with a weighting strategy transform address texts into real-valued vectors. The vectors are fed into a convolutional neural network for text (TextCNN), which is trained to classify address patterns automatically. Furthermore, we apply W-TextCNN to an address corpus after fine-tuning the hyperparameters and compare it with several methods commonly used in text classification. We also design two tasks, address segmentation and address matching, to explore the effect of address pattern classification. The model reaches a classification accuracy of 97.45% and an F1 score of 96%, and W-TextCNN outperforms TextCNN owing to the weighted word embeddings. Additionally, the results reveal the positive impact of address pattern classification on segmentation precision and address quality. The proposed model is expected to expand the toolkit of computational address studies with deep learning methods.
Keywords
Address patterns, Address components, Address structure, Geocoding, Weighted word embeddings, Convolutional neural network
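
The pipeline described in the abstract (weighted word embeddings fed into a TextCNN that outputs one of eight address patterns) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not specify the weighting strategy, so multiplying each token embedding by a scalar weight is an assumption, and the example address, the weights, the embedding size, the filter heights, and all function names are hypothetical. Parameters are random rather than trained, so the output only shows the shape of the computation.

```python
import math
import random

NUM_PATTERNS = 8  # the abstract defines eight address patterns
EMBED_DIM = 4     # hypothetical embedding size, chosen for illustration


def weighted_embeddings(tokens, table, weights):
    """Scale each token's embedding by a per-token weight.

    Assumed scheme: the abstract does not say how weights are applied.
    Tokens without an explicit weight default to 1.0.
    """
    return [[weights.get(t, 1.0) * x for x in table[t]] for t in tokens]


def conv_max_pool(seq, kernel, bias=0.0):
    """Slide one filter spanning len(kernel) tokens over the sequence,
    apply ReLU, and keep the largest response (max-over-time pooling)."""
    height = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for i in range(len(seq) - height + 1):
        s = bias
        for j in range(height):
            s += sum(k * x for k, x in zip(kernel[j], seq[i + j]))
        best = max(best, s)
    return best


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, table, weights, filters, dense):
    """Weighted embeddings -> convolution + pooling -> dense layer -> softmax."""
    seq = weighted_embeddings(tokens, table, weights)
    features = [conv_max_pool(seq, kernel) for kernel in filters]
    scores = [sum(w * f for w, f in zip(row, features)) for row in dense]
    return softmax(scores)


rng = random.Random(0)

# Hypothetical pre-segmented address: province, city, district, road, number.
tokens = ["浙江省", "杭州市", "西湖区", "文三路", "100号"]

# Random (untrained) parameters, for illustration only.
table = {t: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for t in tokens}
weights = {"文三路": 2.0, "100号": 1.5}  # hypothetical component weights
filters = [
    [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(height)]
    for height in (2, 3, 4)  # one filter per window size
]
dense = [[rng.uniform(-1, 1) for _ in filters] for _ in range(NUM_PATTERNS)]

probs = classify(tokens, table, weights, filters, dense)
predicted_pattern = probs.index(max(probs))
```

Here `probs` is a probability distribution over the eight patterns. A trained model would learn the embedding table, filters, and dense layer from a labeled address corpus, and would typically use many filters per window size rather than one.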