Clustering based on connectivity of points

Updated: 2024-10-19 02:27:12

I have 1 million records of lat/long coordinates [5 digits of precision] and Route. I want to cluster those data points.

I don't want to use standard k-means clustering because I am not sure how many clusters there are [I tried the Elbow method but was not convinced].

Here is my logic:

1) I want to reduce the precision of the lat/long values from 5 digits to 3 digits.

2) Lat/longs that are within +/- 0.001 of each other are then grouped into one cluster, and the centroid of each cluster is calculated.

However, I am unable to find a good algorithm and R script to implement this idea.

Can anyone please help me with the above problem?

Thanks,

Best answer

Clustering can be done based on connected components.

All points that are within +/- 0.001 of each other can be connected, which gives a graph made up of subgraphs, each of which is either a single point or a set of connected points (a connected component). The connected components can then be found and their centroids calculated. Two packages are required for this task:

1. deldir, to form a triangulation of the points, which specifies which points are adjacent to each other, and to calculate the distances between them.

2. igraph, to find the connected components.
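As a small illustration of the first step, the sketch below shows the edge table that deldir returns for a handful of points. The sample coordinates and the demo threshold of 1.0 are made up for illustration and are not part of the original answer; the only thing relied on is that `deldir()` returns a `delsgs` data frame with one row per triangulation edge.

```r
library(deldir)

# Hypothetical sample points: four corners plus one interior point.
x <- c(0, 1, 1,   0, 0.4)
y <- c(0, 0, 1.1, 1, 0.5)

tri  <- deldir(x, y)
segs <- tri$delsgs   # one row per Delaunay edge: x1, y1, x2, y2, ind1, ind2

# Manhattan length of every edge (|dx| + |dy|).
len <- abs(segs$x1 - segs$x2) + abs(segs$y1 - segs$y2)

# Keep only the edges shorter than an arbitrary demo threshold.
short <- segs[len < 1.0, c("ind1", "ind2")]
print(short)
```

The `ind1` and `ind2` columns hold the positions of the two end points in the input vectors, so the rows that survive the length filter form an edge list linking nearby points.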

    library(deldir)
    library(igraph)

    coords <- data.frame(lat = runif(1000000), long = runif(1000000))

    # round to 3 digits
    coords.r <- round(coords, 3)
    # remove duplicates
    coords.u <- unique(coords.r)

    # create the triangulation of the points;
    # depending on the data this may take a while and consume a lot of memory
    triangulation <- deldir(coords.u$long, coords.u$lat)

    # compute the (Manhattan) distance between adjacent points
    distances <- abs(triangulation$delsgs$x1 - triangulation$delsgs$x2) +
                 abs(triangulation$delsgs$y1 - triangulation$delsgs$y2)

    # keep only the edges that are not longer than .001 (columns 5:6 are ind1, ind2)
    edge.list <- as.matrix(triangulation$delsgs[distances < .0011, 5:6])

    if (length(edge.list) == 0) {
      # there is no edge whose length is less than .0011
      coords.clustered <- coords.u
    } else {
      # find connected components
      # reformat the list of edges so that if the list is
      #   9 5
      #   5 7
      # it is reformatted to
      #   3 1
      #   1 2
      sorted <- sort(c(edge.list), index.return = TRUE)
      run.length <- rle(sorted$x)
      indices <- rep(1:length(run.length$lengths), times = run.length$lengths)
      edge.list.reformatted <- edge.list
      edge.list.reformatted[sorted$ix] <- indices

      # create the graph from the list of edges
      graph.struct <- graph_from_edgelist(edge.list.reformatted, directed = FALSE)

      # cluster based on connected components
      clust <- components(graph.struct)

      # compute the centroids
      coords.connected <- coords.u[run.length$values, ]
      centroids <- data.frame(
        lat  = tapply(coords.connected$lat,  factor(clust$membership), mean),
        long = tapply(coords.connected$long, factor(clust$membership), mean))

      # combine the clustered points with the unclustered points
      coords.clustered <- rbind(coords.u[-run.length$values, ], centroids)

      # round the data and remove possible duplicates
      coords.clustered <- round(coords.clustered, 3)
      coords.clustered <- unique(coords.clustered)
    }
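The index remapping and the connected-component step in the script above may be easier to follow on a toy edge list. The sketch below is only an illustration under assumptions: the points and edges are invented, and it remaps the indices with `match()` rather than the `sort()`/`rle()` approach used in the answer, which should give the same consecutive ids. The remapping matters because `graph_from_edgelist()` treats a numeric matrix as vertex ids and creates vertices for every id up to the largest one.

```r
library(igraph)

# Hypothetical rounded points; rows 1, 4, 6, 8 and 10 have no close neighbour.
coords <- data.frame(lat = (1:10) / 1000, long = rep(0.5, 10))

# Hypothetical edges between row numbers of `coords`.
edge.list <- matrix(c(9, 5,
                      5, 7,
                      2, 3), ncol = 2, byrow = TRUE)

# Remap the row numbers to consecutive vertex ids 1..k.
ids      <- sort(unique(c(edge.list)))   # rows that appear in at least one edge
edge.ids <- matrix(match(edge.list, ids), ncol = 2)

g    <- graph_from_edgelist(edge.ids, directed = FALSE)
comp <- components(g)                    # membership, csize, no

# One centroid per connected component.
centroids <- data.frame(
  lat  = as.numeric(tapply(coords$lat[ids],  comp$membership, mean)),
  long = as.numeric(tapply(coords$long[ids], comp$membership, mean))
)

# Points that are in no edge stay as their own single-point clusters.
clustered <- rbind(coords[-ids, ], centroids)
print(clustered)
```

Here rows 2 and 3 form one component and rows 5, 7 and 9 form another, so the result holds the five isolated points plus two centroids.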

Published: 2023-04-29 04:09:00
Article link: https://www.elefans.com/category/jswz/34/1334516.html