A REVIEW ON K-MODE CLUSTERING ALGORITHM

Manisha Goyal; Shruti Aggarwal

doi:10.26483/ijarcs.v8i7.4301

Published: 2017-08-28

DOI: https://doi.org/10.26483/ijarcs.v8i7.4301

Keywords:

Data Mining, Clustering, K-Means Algorithm, K-Mode Algorithm

Manisha Goyal

Sri Guru Granth Sahib World University, Fatehgarh Sahib

Shruti Aggarwal

Abstract

The main purpose of data mining is to extract useful information from large datasets. Clustering, one of the most important tasks in data mining, is the process of grouping data objects by their attributes and features so that the objects in one group are more similar to one another than to objects in other groups. It is a form of unsupervised learning, meaning that how the data objects should be grouped is not known in advance. The algorithms used for clustering include the k-means algorithm, the k-medoid algorithm, the k-nearest neighbour algorithm, the k-mode algorithm, etc. The K-Mode Algorithm is a well-known extension of the K-Means Algorithm for clustering data sets with categorical attributes, and it is known for its simplicity and speed. It uses the ‘Simple Matching Dissimilarity’ measure instead of Euclidean distance and the ‘Mode’ of each cluster instead of the ‘Mean’. This paper reviews the K-Mode Algorithm.
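The two substitutions named in the abstract (simple matching dissimilarity in place of Euclidean distance, and per-attribute modes in place of means) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, and the choice to pass the initial modes in by hand are assumptions made for illustration, and ties between equally frequent values are resolved by whichever value was seen first.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: count of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_modes, max_iter=100):
    """Illustrative k-modes loop; initial_modes is supplied by the caller (assumption)."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to the nearest mode.
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
        # Update step: recompute each cluster's mode attribute by attribute.
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data: (colour, size, shape)
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "large", "square"),
]
labels, modes = k_modes(data, initial_modes=[data[0], data[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

As with k-means, the result of such a loop depends on the starting modes, so choosing them by hand here is purely a simplification for the example.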

Issue

Vol. 8 No. 7 (2017): July-August 2017

Section

Articles

COPYRIGHT

Submission of a manuscript implies: that the work described has not been published before, that it is not under consideration for publication elsewhere; that if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
The journal allows the author(s) to retain publishing rights without restrictions.
The journal allows the author(s) to hold the copyright without restrictions.

