Soft Set Based Clustering and Its Comparison on Categorical Data

Iwan Tri Riyadi Yanto, Cheah WaiShiang, Rahmat Hidayat, Rofiul Wahyudi, Suprihatin, Ani Apriani

2023 IEEE 9th Information Technology International Seminar (ITIS) (2023)

Clustering categorical data is difficult because similarity between categorical objects is hard to define and measure. Several methods, most recently centroid-based strategies, have been developed to reduce the complexity of computing similarity for categorical data. These methods nevertheless incur long processing times. This article proposes an alternative, soft set-based clustering (SSC), built on the probability function of the multivariate multinomial distribution. The data are represented as soft sets, and each soft set carries a probability for each object. The probability of each object is then determined by the joint cluster distribution function, following the multivariate multinomial distribution function. Each object is assigned to the cluster for which its likelihood is highest. Benchmark data sets from the UCI Machine Learning Repository are used to compare the approach against baseline techniques. The results show that the proposed approach outperforms the baselines in purity, Rand index, and computation time.
Soft set, categorical data, multinomial distribution
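
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' SSC algorithm: it assumes that the soft-set representation amounts to a 0/1 indicator for every (attribute, value) pair, that attributes are independent within a cluster, and that parameters are refit after each hard assignment. The function names (`soft_set_blocks`, `multinomial_cluster`, `purity`) and the smoothing parameter `alpha` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def soft_set_blocks(data):
    """Assumed soft-set encoding: one 0/1 indicator column per (attribute, value) pair."""
    data = np.asarray(data, dtype=object)
    blocks = []
    for j in range(data.shape[1]):
        values = sorted(set(data[:, j]))
        block = np.array([[1.0 if x == v else 0.0 for v in values] for x in data[:, j]])
        blocks.append(block)  # shape: (n_objects, n_values_of_attribute_j)
    return blocks

def multinomial_cluster(data, k, n_iter=20, seed=0, alpha=1.0):
    """Hard-assignment clustering under a per-attribute multinomial model (illustrative only)."""
    blocks = soft_set_blocks(data)
    n = blocks[0].shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)  # random initial assignment
    for _ in range(n_iter):
        log_lik = np.zeros((n, k))
        for c in range(k):
            members = labels == c
            # Smoothed log prior of cluster c.
            log_lik[:, c] += np.log((members.sum() + alpha) / (n + alpha * k))
            for block in blocks:
                # Smoothed value frequencies of this attribute inside cluster c.
                counts = block[members].sum(axis=0) + alpha
                log_lik[:, c] += block @ np.log(counts / counts.sum())
        new_labels = log_lik.argmax(axis=1)  # highest-likelihood cluster per object
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def purity(pred, truth):
    """Fraction of objects that belong to the majority true class of their cluster."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    total = 0
    for c in np.unique(pred):
        _, counts = np.unique(truth[pred == c], return_counts=True)
        total += counts.max()
    return total / len(pred)
```

Calling `multinomial_cluster(rows, k=2)` on a list of categorical rows returns one integer label per row, and `purity(labels, true_classes)` scores those labels against known classes. Because the likelihood depends only on a row's attribute values, identical rows always receive the same label in this sketch.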