A semi-supervised clustering approach using labeled data

Document Type: Article


1 Department of Computer Engineering, Karaj Branch, Islamic Azad University, Karaj, Iran

2 Computer and Electrical Engineering Department, University of Tabriz, Tabriz, Iran


Interest in semi-supervised clustering has grown over recent decades. The literature reviewed indicates that, across a range of real-life problems, semi-supervised clustering methods are more powerful than purely supervised or unsupervised ones, and that even a small amount of supervised information can significantly improve the results of unsupervised methods. One popular way of incorporating partial supervision is through labeled data. In this study, we propose a semi-supervised clustering algorithm called ConvexClust. The proposed method improves data clustering by combining a geometric view, borrowed from the Lune concept in the connectivity index, with labels for 10% of the data. Clustering starts from the labeled data, which are used to form a convex hull. It then proceeds iteratively, labeling the unlabeled data and updating the convex hull at each step. Evaluations on three UCI datasets and sixteen artificial datasets show that the proposed method outperforms other semi-supervised and traditional clustering techniques.
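The iterative idea described above (seed with labeled points, build a convex hull, label further points, update the hull) can be illustrated with a minimal 2D sketch. This is not the ConvexClust algorithm itself: the abstract does not give its assignment rule, so the sketch assumes one hull per cluster and substitutes a simple rule (a point inside a hull has distance zero, otherwise its distance is that to the nearest hull vertex) in place of the Lune-based criterion. The function names `convex_hull`, `inside`, `dist_to_hull`, and `hull_label` are hypothetical.

```python
from math import hypot

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside(hull, p):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))

def dist_to_hull(hull, p):
    """Assumed proximity rule: 0 if inside, else distance to nearest vertex."""
    if inside(hull, p):
        return 0.0
    return min(hypot(p[0] - v[0], p[1] - v[1]) for v in hull)

def hull_label(labeled, unlabeled):
    """labeled: {cluster: [(x, y), ...]}; unlabeled: [(x, y), ...].

    Repeatedly assigns the unlabeled point closest to any cluster hull,
    then rebuilds that cluster's hull. Returns {point: cluster}.
    """
    members = {c: list(pts) for c, pts in labeled.items()}
    hulls = {c: convex_hull(pts) for c, pts in members.items()}
    remaining = list(unlabeled)
    assigned = {}
    while remaining:
        candidates = [(p, c) for p in remaining for c in hulls]
        p, c = min(candidates, key=lambda pc: dist_to_hull(hulls[pc[1]], pc[0]))
        assigned[p] = c
        members[c].append(p)
        hulls[c] = convex_hull(members[c])  # update the hull after each label
        remaining.remove(p)
    return assigned
```

Points are given as tuples so that they can serve as dictionary keys. For instance, calling `hull_label({"A": [(0, 0), (1, 0), (0, 1)], "B": [(10, 10), (11, 10), (10, 11)]}, [(2, 2), (9, 9)])` grows each seed hull toward its nearby points. The quadratic search over candidates is chosen for readability rather than speed.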


