A novel selective clustering framework for appropriate labeling of clusters based on K-means algorithm

Document Type: Article

Authors

School of Industrial Engineering, Iran University of Science & Technology, Tehran, Iran.

Abstract

Clustering is one of the main methods of data mining, and K-means is among the most widely used clustering algorithms because of its efficiency and ease of use. One challenge in clustering is identifying an appropriate label for each cluster, that is, a label that properly describes the records the cluster contains. In some cases, the results and structure of the clusters make such a label difficult to choose. This study presents an algorithm based on K-means clustering that facilitates the assignment of labels to clusters. Moreover, in many data mining problems the data set contains a large number of fields, so identifying the relevant fields and extracting a subset of them is an important issue. The proposed algorithm also identifies the important and influential variables of the data set and selects the required subset of fields.
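To make the labeling problem concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It runs plain K-means (Lloyd's iteration) and then applies an assumed heuristic: each cluster is labeled by the field whose centroid value deviates most from the overall mean, measured in overall standard deviations. The function names (`kmeans`, `describe_clusters`), the field names, and the toy records are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, pstdev


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k records as initial centroids
    assignment = None
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return assignment, centroids


def describe_clusters(points, fields, centroids):
    """Assumed labeling heuristic (not from the paper): name each cluster
    after the field whose centroid value is furthest from the overall mean,
    in units of the overall standard deviation."""
    columns = list(zip(*points))
    overall_mean = [mean(col) for col in columns]
    overall_std = [pstdev(col) or 1.0 for col in columns]  # guard constant fields
    labels = {}
    for c, centroid in enumerate(centroids):
        z = [(v - m) / s for v, m, s in zip(centroid, overall_mean, overall_std)]
        j = max(range(len(fields)), key=lambda i: abs(z[i]))
        labels[c] = f"{'high' if z[j] > 0 else 'low'} {fields[j]}"
    return labels


# Hypothetical toy records: the two groups differ only in "income".
fields = ("age", "income", "visits")
points = [(25, 30, 5), (27, 32, 6), (26, 31, 4),
          (26, 90, 5), (25, 88, 6), (27, 92, 4)]

assignment, centroids = kmeans(points, k=2)
labels = describe_clusters(points, fields, centroids)
print(labels)
```

In this sketch the same per-field deviation scores could also serve as a rough ranking of influential fields, which is one simple way to think about the field-subset selection the abstract mentions; the paper's own selection procedure may differ.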

F. Moslehi et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 2621-2634