An Extensive Overview, Study of Variants in the Big Data Age: K-means Clustering Algorithms

Page: 55-59

Amit Khirbat and Sandeep (Department of Computer Science & Engineering, Om Sterling Global University, Hisar, Haryana)

Description

Clustering large volumes of data is a major challenge because the datasets are enormous and varied (structured, unstructured, and semi-structured) and keep expanding at an exponential rate. The volume, velocity, and variety of these datasets are so large and complex that they cannot be stored, processed, or analysed by conventional data management systems. Of the many clustering algorithms in use, K-means remains among the most popular because of its simplicity and low computational complexity; it is one of the best-known unsupervised machine learning algorithms. However, the K-means method has several issues that make it less effective at clustering. The initial cluster centres are chosen at random during the algorithm’s initialisation phase, and users are required to define the number of clusters in a given dataset a priori. The choice of starting centres can affect the algorithm’s performance, and choosing the ideal number of clusters in advance becomes difficult for huge datasets. Furthermore, because the algorithm is greedy, a random selection of initial cluster centres can occasionally lead to convergence to a poor local optimum. Another drawback is that the Euclidean distance metric is used to measure the similarity between data objects. This makes it difficult to detect overlapping clusters and reduces the algorithm’s robustness in detecting clusters of other shapes. The inability of the K-means algorithm to handle different data types is another of its main issues. To address these issues, this paper offers a systematic and concise summary of the research done on the K-means algorithm. Variants of the K-means algorithm, including their most recent advancements, are reviewed, and their efficacy is evaluated through experimental analysis on several datasets.
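
The sensitivity to random initialisation described above can be illustrated with a minimal sketch. It is not taken from the paper: the function names (`sq_dist`, `kmeans`), the toy two-dimensional data, and the choice of ten seeds are all assumptions made for illustration. It runs a plain Lloyd-style K-means with Euclidean distance from several random starts and reports the within-cluster sum of squared distances for each, so that any difference between starts is visible.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iter=100):
    """Plain Lloyd-style K-means with randomly chosen initial centres.

    Returns (centres, labels, inertia), where inertia is the sum of squared
    distances from each point to its assigned centre.
    """
    rng = random.Random(seed)
    # Random initialisation: pick k of the input points as starting centres.
    centres = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, inertia


# Hypothetical toy data: three Gaussian blobs of 30 points each.
gen = random.Random(0)
data = [
    (gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(30)
]

# The number of clusters (k=3) must be supplied up front; only the seed varies.
inertias = {seed: kmeans(data, 3, seed)[2] for seed in range(10)}
for seed, value in inertias.items():
    print(f"seed={seed}  inertia={value:.2f}")
print("best of 10 restarts:", round(min(inertias.values()), 2))
```

Keeping the lowest-inertia result over several restarts is a common mitigation for poor starting centres, though it does not remove the need to fix the number of clusters in advance.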