A Comparison-Based Soft Clustering Algorithm for Documents
Ganesh Yadav1, Vipul Kumar Verma2
Citation: Ganesh Yadav, Vipul Kumar Verma, "A Comparison-Based Soft Clustering Algorithm for Documents", International Journal of Research Studies in Computer Science and Engineering, 2019, 6(1): 6-15.
Document clustering is an important tool for document search applications such as Web search engines. Clustering documents gives the user a good overall view of the information contained in a collection. However, existing clustering algorithms suffer from various shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose CSCA (Comparison-based Soft Clustering Algorithm), an efficient soft clustering algorithm that requires only a given similarity measure and uses randomization to make the clustering efficient. Comparison with existing hard clustering algorithms such as K-means and its variants shows that CSCA is both effective and efficient.