Study on Text Clustering For Topic Identification

Sindhu Antony, Rupali Wagh


With the advent of web technologies, the amount of available data has grown enormously, and information retrieval from this data has become one of the most important operations. A large portion of this data is available as text, which poses new challenges for information retrieval and search. Large text collections can be grouped into clusters, which makes the text easier to process. Grouping text based on the similarity of its content is called text clustering. Topic identification refers to recognizing the topics or ideas conveyed in a text. When no categorization of the text exists, its key ideas must be extracted. Topics can be identified by clustering the text and then extracting keywords from each cluster. There are many algorithms for text clustering; hierarchical clustering and K-means clustering are two important techniques. This survey paper discusses various clustering algorithms and the application of clustering to topic identification. It also focuses on the challenges and issues of text clustering and topic identification.
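As a rough illustration of the pipeline described above (cluster the texts, then extract keywords from each cluster), the following standard-library Python sketch may help. It assumes TF-IDF weighting with cosine similarity, a spherical (cosine-based) variant of K-means with deterministic farthest-first seeding instead of the more common random initialization, and a simple cluster-internal labeling heuristic that picks the terms occurring in the most member documents. The function names (`tfidf_vectors`, `kmeans`, `cluster_keywords`) and the sample documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and strip basic punctuation; no stemming or stop-word removal.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def normalize(vec):
    # Scale a sparse vector (dict of term -> weight) to unit length.
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf_vectors(docs):
    # TF-IDF with tf = count / doc length and idf = ln(N / df).
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(normalize({t: (c / len(toks)) * math.log(n / df[t])
                                  for t, c in tf.items()}))
    return vectors


def dot(a, b):
    # Cosine similarity of two unit-length sparse vectors.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def centroid(members):
    # Mean of the member vectors, renormalized to unit length (spherical K-means).
    total = Counter()
    for vec in members:
        total.update(vec)
    return normalize({t: w / len(members) for t, w in total.items()})


def kmeans(vectors, k, iterations=20):
    # Farthest-first seeding: start from document 0, then repeatedly add the
    # document least similar to every seed chosen so far (an assumption).
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(len(vectors)),
                         key=lambda i: max(dot(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    labels = None
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = centroid(members)
    return labels, centroids


def cluster_keywords(docs, labels, k, n=3):
    # Label each cluster with the n terms found in the most member documents
    # (ties broken alphabetically).
    keywords = []
    for c in range(k):
        df = Counter()
        for doc, lab in zip(docs, labels):
            if lab == c:
                df.update(set(tokenize(doc)))
        ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords.append([term for term, _ in ranked[:n]])
    return keywords


if __name__ == "__main__":
    docs = [
        "football match goal team score",
        "team wins football league match",
        "stock market shares price fall",
        "investors sell shares as market price drops",
    ]
    labels, _ = kmeans(tfidf_vectors(docs), k=2)
    print(labels)  # [0, 0, 1, 1]
    print(cluster_keywords(docs, labels, k=2))
    # [['football', 'match', 'team'], ['market', 'price', 'shares']]
```

On this toy input the two sports sentences end up in one cluster and the two finance sentences in the other, and each cluster is labeled by the terms its members share. Because the labeling rule has no stop-word removal, a very common word could dominate on real text; differential labeling schemes, which compare a term's frequency inside a cluster with its frequency in other clusters, are one way to address this.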

Keywords: Topic identification, Text clustering, K-means clustering, Hierarchical clustering, Differential cluster labeling, Cluster-internal labeling


Copyright (c) 2017 International Journal of Advanced Research in Computer Science