Chinese Literature Clustering Research Based on Python K-means Algorithm

ZHAO Qian-yi; School of Information, Guizhou University of Finance and Economics

Clustering is an important means of organizing, summarizing, and navigating text information effectively. K-means is a typical distance-based clustering algorithm. Applied to Chinese document clustering, it divides a set of documents into several categories according to content similarity and helps uncover hidden knowledge. This paper summarizes the process of clustering Chinese literature with a K-means implementation in Python. The initial number of clusters k is selected using three evaluation indexes: the Calinski-Harabasz (CH) index, the silhouette coefficient, and the sum of squared errors (SSE). Using k values in the resulting optimal range, the documents are clustered both by keywords and by abstracts, and the two sets of results are compared and analyzed; the comparison shows that clustering Chinese documents by abstract gives better results. Finally, documents within the same category can be clustered again by keywords to further explore hidden knowledge.
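The selection of k by the three indexes can be illustrated with a minimal sketch. This is not the paper's code: the 2-D toy vectors stand in for document feature vectors (an assumption, since the abstract does not describe the vectorization), the helper names are hypothetical, and a deterministic farthest-first initialization is used only to keep the example reproducible. It uses the Python standard library only and reports SSE, the CH index, and the silhouette coefficient for each candidate k.

```python
import math

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def farthest_first(points, k):
    """Deterministic initialization (an assumption made for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(dist2(p, centroids[l]) for p, l in zip(points, labels))

def calinski_harabasz(points, labels, centroids):
    """Between-cluster over within-cluster dispersion; higher is better."""
    n, k = len(points), len(centroids)
    overall = mean(points)
    between = sum(labels.count(j) * dist2(centroids[j], overall) for j in range(k))
    return (between / (k - 1)) / (sse(points, labels, centroids) / (n - k))

def silhouette(points, labels):
    """Mean silhouette coefficient; a singleton cluster contributes 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.sqrt(dist2(p, q)) for q in own) / (len(own) - 1)
        b = min(sum(math.sqrt(dist2(p, q)) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: three tight groups of five "documents" each.
offsets = [(0, 0), (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
docs = [[cx + dx, cy + dy] for cx, cy in [(0, 0), (10, 10), (20, 0)] for dx, dy in offsets]

scores = {}
for k in range(2, 6):
    labels, centroids = kmeans(docs, k)
    scores[k] = {"sse": sse(docs, labels, centroids),
                 "ch": calinski_harabasz(docs, labels, centroids),
                 "silhouette": silhouette(docs, labels)}
    print(k, scores[k])

# Pick the k with the highest silhouette coefficient.
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print("best k by silhouette:", best_k)
```

SSE tends to decrease as k grows, so it is usually read by looking for an "elbow" rather than a minimum, while the CH index and the silhouette coefficient are read as "higher is better". In practice, a library such as scikit-learn provides `KMeans`, `silhouette_score`, and `calinski_harabasz_score` for the same purpose.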

Tags: Literature, Clustering, Chinese, Python, Documents