Chinese address standardisation via hybrid approach combining statistical and rule-based methods.

International Journal of Internet and Enterprise Management (2019)

Abstract
This paper is derived from a research project on cleansing customer address data for the State Grid Corporation of China (SGCC), which is the largest electric utility company in the world and was ranked 2nd in the 2016 Fortune Global 500. Address standardisation involves developing a standard address format for data integration, de-duplication, and automatic address correction/completion, and is widely considered a very challenging data cleansing task. It is critical for routine business tasks, customer relationship management, business intelligence for customer-oriented corporations, and other purposes. Address standardisation is particularly difficult for the Chinese language. The underlying reasons include: 1) the address standard currently in place in China is only realised at the city/town level; 2) for a number of reasons, many hand-written addresses are incomplete or contain errors; 3) Chinese is difficult to process by machine because of the characteristics of the language. To tackle these challenges, we propose a hybrid approach combining statistical and rule-based methods, which are the two mainstream address standardisation approaches. Our hybrid approach utilises the merits of both methods and can complete the address standardisation task with little human effort and computational time, while achieving high accuracy.
Key words
Chinese address standardisation, rule-based
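
The abstract does not spell out how the two methods are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' method: a rule-based pass splits the address on common administrative suffixes, and a frequency-aware fuzzy match fills in the street name when the rule fails (for example because of a mistyped character). The regular expression, the field names, the `street_counts` table, the `standardise` function, and the 0.6 threshold are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Rule-based step (illustrative assumption): split on common suffix characters.
ADDRESS_RULE = re.compile(
    r"(?P<province>.+?省)?"
    r"(?P<city>.+?市)?"
    r"(?P<district>.+?[区县])?"
    r"(?P<street>.+?[路街道])?"
    r"(?P<number>\d+号)?"
)


def standardise(address, street_counts, threshold=0.6):
    """Parse an address with rules, then repair the street statistically.

    street_counts is a hypothetical table of known street names and how
    often each occurs in previously cleaned data.
    """
    match = ADDRESS_RULE.match(address)  # always matches; every group is optional
    fields = match.groupdict()
    rest = address[match.end():]  # text the rules could not consume

    if fields["street"] is None and rest:
        # Pull out a house number if one is present in the leftover text.
        number = re.search(r"\d+号", rest)
        if number:
            fields["number"] = number.group()
            rest = rest[:number.start()]

        # Statistical step: pick the known street most similar to the
        # leftover text, breaking ties in favour of the more frequent name.
        def score(street):
            similarity = SequenceMatcher(None, rest, street).ratio()
            return (similarity, street_counts[street])

        best = max(street_counts, key=score, default=None)
        if best is not None and score(best)[0] >= threshold:
            fields["street"] = best

    return fields


# Hypothetical frequency table built from already-clean records.
street_counts = Counter({"中山北路": 120, "中山路": 300, "汉中路": 80})

clean = standardise("江苏省南京市鼓楼区中山北路18号", street_counts)
noisy = standardise("南京市鼓楼区中山北洛18号", street_counts)  # 洛 mistyped for 路
print(clean)
print(noisy)
```

In this sketch the rules handle well-formed addresses cheaply, and the similarity score is consulted only for the residue the rules cannot parse, which is one way to keep both manual rule-writing and computation small.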