Optimizing Clustering Algorithms for Anti-Microbial Evaluation Data: A Majority Score-Based Evaluation of K-Means, Gaussian Mixture Model, and Multivariate T-Distribution Mixtures.

Hira Mahmood, Tahir Mehmood, Laila A. Al-Essa

IEEE Access (2023)

Abstract
This study presents a detailed analysis of the performance of the majority score clustering algorithm on three anti-microbial evaluation datasets: the minimum inhibitory concentration (MIC) of bacteria, and the antibacterial and antifungal activity of chemical compounds against 4 bacteria (E. coli, P. aeruginosa, S. aureus, S. pyogenes) and 2 fungi (C. albicans, A. fumigatus). Clustering is an unsupervised machine learning method, used here to group chemical compounds based on their similarity. We apply k-means clustering, the Gaussian mixture model (GMM), and mixtures of multivariate t-distributions to these anti-microbial datasets. To determine the optimal number of clusters and the best-performing clustering algorithm, we use several clustering validation indices (CVIs): within sum of squares (to be minimized), connectivity (to be minimized), silhouette width (to be maximized), and the Dunn index (to be maximized). Based on the majority score clustering algorithm, we conclude that k-means and the mixture of multivariate t-distributions perform best on the CVIs to be maximized, while GMM performs best on the CVIs to be minimized. K-means clustering and the mixture of multivariate t-distributions give 3 optimal clusters for the antibacterial activity dataset and 5 optimal clusters for the MIC bacteria dataset. K-means clustering, the mixture of multivariate t-distributions, and GMM give 3 optimal clusters for both the antibacterial and antifungal activity datasets. Overall, the k-means clustering algorithm performs best under the majority score. This study may be useful for the pharmaceutical industry, chemists, and medical professionals in the future.
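The majority-score idea described in the abstract can be sketched as a simple vote: each validation index picks the candidate it rates best, in its own direction of optimization, and the candidate chosen by the most indices wins. The abstract does not spell out the exact scoring rule, so the function name `majority_score`, the table layout, and the tie-breaking behavior below are assumptions, and the numbers are made-up placeholders rather than results from the paper.

```python
from collections import Counter

# Direction in which each validation index is optimized, as stated in the abstract.
DIRECTIONS = {
    "within_sum_square": min,
    "connectivity": min,
    "silhouette_width": max,
    "dunn_index": max,
}


def majority_score(cvi_table):
    """Return (winner, votes) where winner is the candidate picked by most indices.

    cvi_table maps an index name to a dict {candidate: index value}.
    Assumption: one vote per index; ties go to the candidate that received
    its first vote earliest (Counter.most_common keeps insertion order).
    """
    votes = Counter()
    for index_name, scores in cvi_table.items():
        pick = DIRECTIONS[index_name]          # min or max for this index
        best = pick(scores, key=scores.get)    # candidate with the best value
        votes[best] += 1
    winner, _ = votes.most_common(1)[0]
    return winner, dict(votes)


# Hypothetical index values for k = 2, 3, 4; illustration only, NOT from the paper.
example = {
    "within_sum_square": {2: 410.0, 3: 250.0, 4: 190.0},
    "connectivity": {2: 12.5, 3: 9.1, 4: 20.3},
    "silhouette_width": {2: 0.41, 3: 0.55, 4: 0.38},
    "dunn_index": {2: 0.08, 3: 0.12, 4: 0.10},
}

best_k, votes = majority_score(example)
print(best_k, votes)
```

The candidates here are numbers of clusters, but the same function works unchanged if the keys are (algorithm, k) pairs, which would let the vote compare k-means, GMM, and the t-mixture at once.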
Keywords
Clustering, K-means, GMM, multivariate t distribution, Silhouette width, within sum square, Dunn index