K-means clustering algorithm

Updated: 2024-10-27 22:34:48
Question

I want to run a k-means clustering analysis on a set of 10 data points, each of which has an array of 4 numeric values associated with it. I'm using the Pearson correlation coefficient as the distance metric. I have done the first two steps of the k-means clustering algorithm, which were:

1) Select a set of initial centres for the k clusters. [I selected two initial centres at random]

2) Assign each object to the cluster with the closest centre. [I used the Pearson correlation coefficient as the distance metric; see below]

Now I need help understanding the third step of the algorithm:

3) Compute the new centres of the clusters:

$$C(S) = \frac{1}{n} \sum_{i=1}^{n} X_i$$

where each X_i, in this case, is a 4-dimensional vector and n is the number of data points in the cluster.

How would I go about calculating C(S) for, say, the following data?

# Cluster 1
A 10 15 20 25 # randomly chosen centre
B 21 33 21 23
C 43 14 23 23
D 37 45 43 49
E 40 43 32 32
# Cluster 2
F 100 102 143 212 # randomly chosen centre
G 303 213 212 302
H 102 329 203 212
I 32 201 430 48
J 60 99 87 34

The last step of the k-means algorithm is to repeat steps 2 and 3 until no object changes cluster, which is simple enough.

I need help with step 3: computing the new centres of the clusters. If someone could walk through how to compute the new centre of just one of the clusters, that would help me immensely.

Recommended answer

Step 3 corresponds to calculating the mean of each cluster. For cluster 1, the new cluster centre is (B+C+D+E) / 4, which is (35.25 33.75 29.75 31.75): sum each component separately over all the points in the cluster, then divide by the number of points in the cluster.
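As a quick check of that arithmetic, here is a minimal Python sketch of the component-wise mean for cluster 1. The helper name `centroid` is hypothetical, and the old centre A is left out of the point list, as in the answer.

```python
# Points assigned to cluster 1 in the question. The old centre A is left out,
# following the answer; add it to this dict if you treat it as a data point.
cluster_1 = {
    "B": [21, 33, 21, 23],
    "C": [43, 14, 23, 23],
    "D": [37, 45, 43, 49],
    "E": [40, 43, 32, 32],
}


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    # zip(*points) groups all first components, then all second components, ...
    return [sum(component) / n for component in zip(*points)]


new_centre_1 = centroid(list(cluster_1.values()))
print(new_centre_1)  # [35.25, 33.75, 29.75, 31.75]
```

The same function applied to the points of cluster 2 gives that cluster's new centre.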

The previous cluster centre (A for cluster 1) is usually not part of the calculation of the new cluster centre.
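To show how steps 2 and 3 fit together, the following Python sketch repeats assignment and centre update until no point changes cluster. Several things here are assumptions rather than part of the question or answer: the distance is taken as 1 - r (the question only says Pearson correlation is used as the metric), a correlation with a constant vector is treated as 0, an empty cluster keeps its old centre, and all function names are hypothetical. Following the answer, A and F serve only as initial centres and are not included among the points.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    # Assumption: correlation is undefined for a constant vector; use 0 here.
    return cov / denom if denom else 0.0


def distance(x, y):
    # Assumption: 1 - r as the distance (0 when perfectly correlated).
    return 1.0 - pearson(x, y)


def assign(points, centres):
    """Step 2: index of the closest centre for every point."""
    return {
        name: min(range(len(centres)), key=lambda k: distance(vec, centres[k]))
        for name, vec in points.items()
    }


def update(points, labels, centres):
    """Step 3: component-wise mean of the members of each cluster."""
    new_centres = []
    for k, old in enumerate(centres):
        members = [points[name] for name, lab in labels.items() if lab == k]
        if members:
            new_centres.append([sum(c) / len(members) for c in zip(*members)])
        else:
            new_centres.append(old)  # assumption: empty cluster keeps its centre
    return new_centres


def k_means(points, centres, max_iter=100):
    """Repeat steps 2 and 3 until no point changes cluster (or max_iter)."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centres)
        if new_labels == labels:
            break
        labels = new_labels
        centres = update(points, labels, centres)
    return labels, centres


points = {
    "B": [21, 33, 21, 23], "C": [43, 14, 23, 23],
    "D": [37, 45, 43, 49], "E": [40, 43, 32, 32],
    "G": [303, 213, 212, 302], "H": [102, 329, 203, 212],
    "I": [32, 201, 430, 48], "J": [60, 99, 87, 34],
}
initial_centres = [[10, 15, 20, 25], [100, 102, 143, 212]]  # A and F

labels, centres = k_means(points, initial_centres)
print(labels)
print(centres)
```

The `max_iter` cap is there because the mean is the natural centre for Euclidean distance; with a correlation-based distance the loop is not guaranteed to settle, so a hard stop is a reasonable safeguard.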

Published: 2023-11-29 19:12:41
Article link: https://www.elefans.com/category/jswz/34/1647364.html