A semi-supervised clustering approach using labeled data

Document Type : Article

Authors

1 Department of Computer Engineering, Karaj Branch, Islamic Azad University, Karaj, Iran

2 Computer and Electrical Engineering Department, University of Tabriz, Tabriz, Iran

Abstract

Interest in semi-supervised clustering has grown over recent decades. The reviewed literature shows that, across a range of real-life problems, semi-supervised clustering methods are more powerful than purely supervised or unsupervised ones, and that even a small amount of supervised information can significantly improve the results of unsupervised methods. One popular way of incorporating partial supervision is through labeled data. In this study, we propose a semi-supervised clustering algorithm called ConvexClust. The proposed method improves data clustering by combining a geometric view borrowed from the Lune concept of the connectivity index with 10% of the data labeled. Clustering starts from the labeled data, from which a convex hull is formed. It then proceeds iteratively, labeling the unlabeled data and updating the convex hull at each step. Evaluations on three UCI datasets and sixteen artificial datasets show that the proposed method outperforms the semi-supervised and traditional clustering techniques it was compared against.
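The iterative idea described in the abstract (seed each cluster with its labeled points, build a convex hull, then repeatedly label an unlabeled point and update the hull) can be illustrated with a minimal sketch. This is not the ConvexClust algorithm itself: the abstract does not give the Lune-based assignment rule, so the sketch substitutes a simple assumption, namely that the next point to label is the one closest to some hull (distance 0 if it lies inside, otherwise the distance to the nearest hull vertex). The function names `convex_hull`, `inside`, `hull_distance`, and `grow_clusters` are hypothetical, and the code handles 2-D points only.

```python
from math import dist


def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, p):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hull (point or segment) encloses nothing
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def hull_distance(hull, p):
    """Assumed proximity measure: 0 inside the hull, else nearest-vertex distance."""
    if inside(hull, p):
        return 0.0
    return min(dist(v, p) for v in hull)


def grow_clusters(labeled, unlabeled):
    """Label unlabeled 2-D points one at a time, updating the winning hull each step.

    labeled:   dict mapping a cluster label to a list of (x, y) seed points
    unlabeled: list of (x, y) points
    Returns a dict mapping each unlabeled point to its assigned label.
    """
    members = {lab: list(pts) for lab, pts in labeled.items()}
    hulls = {lab: convex_hull(pts) for lab, pts in members.items()}
    remaining = list(unlabeled)
    assignment = {}
    while remaining:
        # Pick the (point, cluster) pair with the smallest assumed distance.
        _, p, lab = min(
            ((hull_distance(hulls[l], q), q, l) for q in remaining for l in hulls),
            key=lambda t: t[0],
        )
        assignment[p] = lab
        members[lab].append(p)
        hulls[lab] = convex_hull(members[lab])  # hull update after each new label
        remaining.remove(p)
    return assignment


seeds = {"a": [(0, 0), (1, 0), (0, 1)], "b": [(10, 10), (11, 10), (10, 11)]}
points = [(0.2, 0.2), (1, 1), (10.5, 10.2), (9, 9)]
result = grow_clusters(seeds, points)
print(result)
```

Labeling the closest point first means each hull grows outward gradually, so later decisions are made against hulls that already reflect earlier assignments. A faithful implementation would replace `hull_distance` with the paper's Lune-based criterion.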
