A MapReduce-based Big Data Clustering using Swarm-inspired Meta-heuristic Algorithms

Document Type : Article

Authors

1 Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran

2 Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran; Big Data Research Center, Najafabad Branch, Islamic Azad University, Najafabad, Iran

Abstract

Clustering is one of the most important methods in data analysis. For big data, clustering is difficult because of the volume of data and the complexity of clustering algorithms. Methods that can cluster large amounts of data in a reasonable time are therefore required. MapReduce is a powerful programming model that allows parallel algorithms to run in distributed computing environments. In this study, an improved artificial bee colony algorithm based on a MapReduce clustering model (MR-CWABC) is proposed. Using a weighted average of the results, without greedy selection, improves both the local and the global search of ABC. The improved algorithm is implemented according to the MapReduce model on the Hadoop framework to assign samples to clusters optimally while preserving the compactness and separation of the clusters. The proposed method is compared with well-known bio-inspired algorithms, namely particle swarm optimization (PSO), artificial bee colony (ABC), and gravitational search algorithm (GSA), each implemented according to the MapReduce model on the Hadoop framework. The results show that MR-CWABC is well suited to big data while maintaining clustering quality. In terms of average F-measure, MR-CWABC improves on MR-CABC, MR-CPSO, and MR-CGSA by 7.13%, 7.71%, and 6.77%, respectively.
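
To make the MapReduce formulation concrete, the following is a minimal single-machine sketch of how the fitness of one candidate solution (a set of cluster centroids) could be evaluated in a map step and a reduce step. It is an illustration only: the abstract does not specify the fitness function, the key/value layout, or the weighted-average update of MR-CWABC, so the sum of squared Euclidean distances used here and the function names `map_partition`, `reduce_by_cluster`, and `fitness` are assumptions, and plain Python lists stand in for Hadoop input splits.

```python
from collections import defaultdict


def map_partition(points, centroids):
    """Map step (assumed layout): for each point in one data split,
    emit (index of nearest centroid, squared Euclidean distance)."""
    pairs = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        k = min(range(len(centroids)), key=dists.__getitem__)
        pairs.append((k, dists[k]))
    return pairs


def reduce_by_cluster(pairs):
    """Reduce step: sum the emitted distances per cluster index."""
    totals = defaultdict(float)
    for k, d in pairs:
        totals[k] += d
    return dict(totals)


def fitness(partitions, centroids):
    """Total intra-cluster squared distance of one candidate solution.
    Lower is better under this assumed objective."""
    pairs = [pair for part in partitions
             for pair in map_partition(part, centroids)]
    return sum(reduce_by_cluster(pairs).values())


if __name__ == "__main__":
    # Two hypothetical data splits and one candidate set of two centroids.
    splits = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0)]]
    candidate = [(0.0, 0.0), (10.0, 0.0)]
    print(fitness(splits, candidate))
```

In a swarm-based search, a routine of this kind would be called once per candidate solution per iteration, which is the part that benefits from being distributed across data splits.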

Keywords

Main Subjects