A new validity index for fuzzy-possibilistic c-means clustering

Document Type: Article

Authors

1 Department of Industrial Engineering and Management Systems, Amirkabir University of Technology, Tehran, Iran

2 Tijuana Institute of Technology, Tijuana, Mexico

Abstract

In some complicated datasets, noisy data points and outliers can cause cluster validity indices to give conflicting results when determining the optimal number of clusters. This paper presents a new validity index for fuzzy-possibilistic c-means (FPCM) clustering, called the Fuzzy-Possibilistic (FP) index, which works well in the presence of clusters that vary in shape and density. Moreover, like most clustering algorithms, FPCM is sensitive to its initial parameters: in addition to the number of clusters, it requires a priori selection of the degree of fuzziness (m) and the degree of typicality (η). Therefore, we present an efficient procedure for determining optimal values of m and η. The proposed approach has been evaluated on several synthetic and real-world datasets. The computational results demonstrate the capability and reliability of the proposed approach compared with several well-known fuzzy validity indices in the literature. Furthermore, to demonstrate the applicability of the proposed method to real problems, it is applied to microarray gene expression data clustering and medical image segmentation.
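To make concrete why FPCM needs the number of clusters c, the fuzzifier m and the typicality exponent η fixed in advance, here is a minimal pure-Python sketch of the standard FPCM iteration (alternating updates of memberships, typicalities and prototypes, as in the formulation of Pal, Pal and Bezdek). It assumes Euclidean distance; the function name `fpcm`, the farthest-first initialisation and the center-shift stopping rule are hypothetical choices made for illustration. It is not the authors' code, and neither the FP index nor the paper's procedure for choosing m and η is implemented here.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _farthest_first(data, c):
    """Deterministic initialisation (hypothetical choice): start at data[0],
    then repeatedly add the point farthest from the centers chosen so far."""
    centers = [list(data[0])]
    while len(centers) < c:
        far = max(data, key=lambda p: min(_sq_dist(p, v) for v in centers))
        centers.append(list(far))
    return centers


def fpcm(data, c, m=2.0, eta=2.0, max_iter=100, tol=1e-6, eps=1e-12):
    """Sketch of fuzzy-possibilistic c-means.

    Minimises sum_i sum_k (u_ik**m + t_ik**eta) * ||x_k - v_i||**2 subject to
    sum_i u_ik = 1 for every point k and sum_k t_ik = 1 for every cluster i.
    Returns (centers, U, T); U[i][k] is the membership and T[i][k] the
    typicality of point k in cluster i.
    """
    n, dim = len(data), len(data[0])
    centers = _farthest_first(data, c)
    U, T = [], []
    for _ in range(max_iter):
        # Squared distances, clamped by eps so a point lying on a center is safe.
        d2 = [[max(_sq_dist(data[k], centers[i]), eps) for k in range(n)]
              for i in range(c)]
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)); with squared distances the
        # exponent becomes 1/(m-1). Normalised over clusters for each point.
        U = [[1.0 / sum((d2[i][k] / d2[j][k]) ** (1.0 / (m - 1))
                        for j in range(c))
              for k in range(n)] for i in range(c)]
        # t_ik = 1 / sum_l (d_ik / d_il)^(2/(eta-1)); normalised over points
        # for each cluster.
        T = [[1.0 / sum((d2[i][k] / d2[i][l]) ** (1.0 / (eta - 1))
                        for l in range(n))
              for k in range(n)] for i in range(c)]
        # v_i = weighted mean of the data with weights u_ik^m + t_ik^eta.
        new_centers = []
        for i in range(c):
            w = [U[i][k] ** m + T[i][k] ** eta for k in range(n)]
            total = sum(w)
            new_centers.append([sum(w[k] * data[k][d] for k in range(n)) / total
                                for d in range(dim)])
        # Stop when no prototype moves more than tol.
        shift = max(math.sqrt(_sq_dist(a, b))
                    for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U, T


# Usage: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers, U, T = fpcm(points, c=2, m=2.0, eta=2.0)
print([[round(x, 2) for x in v] for v in centers])
```

In this sketch each column of memberships sums to one over the clusters, while each row of typicalities sums to one over the data points. Raising m makes the memberships fuzzier, and raising η flattens how quickly typicality decays with distance, which is why these two parameters have to be chosen alongside c.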


