An Euclidean Distance based on the Weighted Self-information Related Data Transformation for Nominal Data Clustering.

CIKM (2017)

Abstract
Numerical data clustering is a tractable task because well-defined numerical measures, such as the traditional Euclidean distance, can be applied to it directly, whereas nominal data clustering is a very difficult problem because nominal attribute values have no natural relative ordering. This paper aims to make the Euclidean distance measure applicable to nominal data clustering; the core idea is to transform each nominal attribute value into a numerical one. The transformation consists of three steps. In the first step, the weighted self-information, which quantifies the amount of information carried by an attribute value, is calculated for each value of each nominal attribute. In the second step, the k nearest neighbors of each object are found, since an object's k nearest neighbors are highly similar to it. In the last step, the weighted self-information of each attribute value in each nominal object is modified according to the object's k nearest neighbors. To evaluate the effectiveness of the proposed method, experiments are conducted on 10 data sets. The results show that the method not only enables the Euclidean distance to be used for nominal data clustering but also achieves better clustering performance than several existing state-of-the-art approaches.
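
The three-step transformation described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not specify the weighting scheme, the neighbor measure, or the exact modification rule, so the sketch assumes: (1) self-information I(v) = -log2 p(v), where p(v) is the relative frequency of v within its attribute, scaled by a hypothetical per-attribute weight that defaults to 1; (2) k nearest neighbors found by counting mismatched nominal values; and (3) each object's vector replaced by the mean of its own vector and its neighbors' vectors. All function names are hypothetical.

```python
import math
from collections import Counter


def weighted_self_information(data, weights=None):
    """Step 1 (sketch): map value v of attribute j to w_j * -log2 p_j(v).

    `weights` is a hypothetical per-attribute weight; the paper's actual
    weighting scheme is not given in the abstract, so it defaults to 1.0.
    """
    n, m = len(data), len(data[0])
    if weights is None:
        weights = [1.0] * m
    # Frequency of each value within its own attribute (column).
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [
        [weights[j] * -math.log2(counts[j][row[j]] / n) for j in range(m)]
        for row in data
    ]


def mismatch_distance(a, b):
    """Number of attributes on which two nominal objects differ (assumed kNN measure)."""
    return sum(x != y for x, y in zip(a, b))


def k_nearest_neighbors(data, k):
    """Step 2 (sketch): indices of the k nearest other objects; ties broken by index."""
    result = []
    for i, row in enumerate(data):
        others = [j for j in range(len(data)) if j != i]
        others.sort(key=lambda j: (mismatch_distance(row, data[j]), j))
        result.append(others[:k])
    return result


def transform(data, k, weights=None):
    """Step 3 (sketch): average each object's vector with its neighbors' vectors.

    The averaging rule is an assumption standing in for the paper's
    modification step, whose exact form the abstract does not state.
    """
    info = weighted_self_information(data, weights)
    neighbors = k_nearest_neighbors(data, k)
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [info[i]] + [info[j] for j in nbrs]
        # Column-wise mean over the object and its neighbors.
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def euclidean(a, b):
    """Plain Euclidean distance between two transformed objects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y")]
    vectors = transform(data, k=1)
    print(vectors)
    print(euclidean(vectors[0], vectors[3]))
```

On this toy data the two identical objects ("a", "x") map to the same vector, while the object containing the rare value "b" receives a larger self-information and therefore lies farther from the others under the Euclidean distance. Any Euclidean-based clustering algorithm, such as k-means, could then be run on the transformed vectors.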
Keywords
nominal data clustering, Euclidean distance, self-information