A novel selective clustering framework for appropriate labeling of clusters based on K-means algorithm

Document Type: Article


School of Industrial Engineering, Iran University of Science & Technology, Tehran, Iran.


Clustering is one of the main methods of data mining, and the K-means algorithm is among the most common clustering algorithms because of its efficiency and ease of use. One challenge in clustering is identifying an appropriate label for each cluster, that is, a label that properly describes the records in that cluster. In some cases, the results and structure of a cluster make choosing such a label difficult. The aim of this study is to present an algorithm based on K-means clustering that facilitates the allocation of a label to each cluster. Moreover, in many data mining problems the data set contains a large number of fields, so identifying the relevant fields and extracting a subset of the required fields is an important issue. With the help of the proposed algorithm, the important and influential variables of the data set are identified and the subset of required fields is selected.
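As a rough illustration of the two ideas in the abstract (clustering with K-means, then finding the fields that best distinguish the clusters so that labels can be chosen), the following is a minimal sketch and not the algorithm proposed in the article, whose details are not given here. It assumes plain K-means with a deterministic farthest-first initialization, and it ranks each field by the spread of the cluster centroids on that field divided by the field's overall standard deviation. The function names (`kmeans`, `rank_fields`), the field names (`f0`, `f1`, `f2`), the scoring rule, and the toy data are all hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_farthest_first(points, k):
    """Assumed initialization: start at the first record, then repeatedly
    pick the record farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain K-means: alternate assignment and centroid update until stable."""
    centroids = init_farthest_first(points, k)
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no record changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

def rank_fields(points, centroids, names):
    """Hypothetical score: centroid spread on a field divided by the field's
    overall (population) standard deviation; higher means more distinguishing."""
    scores = []
    for j, name in enumerate(names):
        column = [p[j] for p in points]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        spread = max(c[j] for c in centroids) - min(c[j] for c in centroids)
        scores.append((name, spread / std if std > 0 else 0.0))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy data (made up): only the first field separates the two groups.
points = [
    (1.0, 5.0, 3.0), (1.2, 5.2, 2.8), (0.8, 4.8, 3.2),
    (9.0, 5.1, 3.1), (9.2, 4.9, 2.9), (8.8, 5.0, 3.0),
]
names = ["f0", "f1", "f2"]

centroids, assignments = kmeans(points, k=2)
ranking = rank_fields(points, centroids, names)
print(assignments)
print(ranking)
```

In this sketch, fields at the top of the ranking are candidates both for describing each cluster (for example, "high f0" versus "low f0") and for a reduced subset of fields; a different scoring rule or initialization could reasonably be substituted.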


