Divergence between cutree and cluster branches

Updated: 2024-06-14 17:02:17

I have a data set consisting of the daily water intake for some mice belonging to 4 different genotypes. I am trying to write a script in order to classify these animals according to their pattern of water intake using a hierarchical cluster analysis and then create a longitudinal graph plotting the average water intake per cluster across days.

To do that, I first build the hierarchical clustering as follows:

library("dendextend")
library("ggplot2")
library("reshape2")

# One row per mouse, one column per day; the last column is the genotype
data <- read.csv("data.csv", header = TRUE, row.names = 1)
trimmed <- data[, -ncol(data)]   # daily intake only (genotype column dropped)
hc <- as.dendrogram(hclust(dist(trimmed)))

# Genotype as a factor, so it can index the pch/colour vectors by its codes
labels.drk <- factor(data[, ncol(data)])
groups.drk <- labels.drk[order.dendrogram(hc)]   # genotypes in leaf order
genotypes <- levels(labels.drk)                  # legend order = factor codes

k <- 4
cluster_cols <- rainbow(k)
hc <- hc %>%
  color_branches(k = k, col = cluster_cols) %>%
  set("branches_lwd", 1) %>%
  set("leaves_pch", rep(c(21, 19), length(genotypes))[groups.drk]) %>%
  set("leaves_col", palette()[groups.drk])

plot(hc, main = "Total animals", horiz = TRUE)
legend("topleft", legend = genotypes, col = palette(),
       pch = rep(c(21, 19), length(genotypes)), title = "Genotypes")
legend("bottomleft", legend = 1:k, col = cluster_cols, lty = 1, lwd = 2,
       title = "Drinking group")

Then I use the cutree function to find out which group each animal belongs to, so that I can plot the average water intake per group:

# Cluster membership of each mouse (dendextend's cutree, since hc is a dendrogram)
groups <- cutree(hc, k = k, order_clusters_as_data = FALSE)
x <- cbind(data, groups)
# Mean intake per group and day
intake_avg <- aggregate(data[, -ncol(data)], list(x$groups), mean)
df <- melt(intake_avg, id.vars = "Group.1")
ggplot(df, aes(variable, value, group = factor(Group.1))) +
  geom_line(aes(color = factor(Group.1)))

The problem is that the cluster numbers shown in the dendrogram do not match the numbers assigned by the cutree function. The dendrogram numbers the branches 1 to 4 from the bottom up, whereas cutree seems to use some other ordering that I am not familiar with. As a result, the group labels in the dendrogram and in the intake graph don't match.
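The two numbering schemes can be reproduced with base R alone. The sketch below uses a made-up six-mouse matrix (`m`, `g_data`, `g_dend` and the values are hypothetical, not from the original data). `stats::cutree` numbers clusters in order of first appearance among the data rows, so renumbering by first appearance along `hc$order` gives the left-to-right leaf numbering (bottom-up in a horizontal plot), which should correspond to the numbering dendextend's `cutree()` uses with `order_clusters_as_data = FALSE`.

```r
# Made-up intake matrix: 6 mice x 3 days, forming three obvious groups
m <- rbind(
  a = c(1, 1, 1),    b = c(10, 10, 10), c = c(1.2, 1, 1.1),
  d = c(20, 20, 20), e = c(10.5, 10, 10), f = c(20.5, 20, 20)
)
hc <- hclust(dist(m))
k <- 3

# stats::cutree: cluster 1 is the cluster of the first data row, and so on
g_data <- cutree(hc, k = k)

# Renumber so cluster 1 is the first cluster met along the dendrogram leaves
leaf_order <- hc$labels[hc$order]
first_seen <- unique(g_data[leaf_order])
g_dend <- match(g_data, first_seen)
names(g_dend) <- names(g_data)

# Same partition, two different numberings
print(rbind(data_order = g_data, dendrogram_order = g_dend))
```

Both vectors describe the same groups; only the labels differ, which is why a plot keyed on one numbering and a legend keyed on the other disagree.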

I am a beginner at coding, so my code surely contains redundant lines and could be shortened, but I would be very glad if you could help me figure out this specific issue.

Data set: (not included)

Cluster: (image not included)

Intake graph: (image not included)

Best answer

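To get the same cluster numbers as in the dendrogram, use dendextend's cutree with order_clusters_as_data = FALSE, then reorder its result to follow the rows of data before binding it: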

# Clusters numbered as in the dendrogram; the vector comes back in leaf
# order, named by mouse ID
groups <- dendextend::cutree(hc, k = k, order_clusters_as_data = FALSE)
# Reorder the cluster labels so they follow the rows of data
idx <- match(rownames(data), names(groups))
x <- cbind(data, groups = groups[idx])
intake_avg <- aggregate(data[, -ncol(data)], list(x$groups), mean)
df <- melt(intake_avg, id.vars = "Group.1")
ggplot(df, aes(variable, value, group = factor(Group.1))) +
  geom_line(aes(color = factor(Group.1)), lwd = 1)

Here is the resulting intake graph: (image not included)
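A minimal base-R sketch of why the `match()` step matters, on a made-up four-mouse table (`intake`, `groups`, `wrong`, `right` and `avg` are hypothetical names, and the cluster labels are invented rather than computed). `groups` mimics a cluster vector returned in leaf order and named by mouse ID: binding it by position pairs rows with the wrong labels, while matching on names realigns them before averaging per cluster.

```r
# Made-up intake table: rows = mice, columns = days
intake <- data.frame(day1 = c(5, 9, 5.5, 9.5),
                     day2 = c(6, 10, 6.5, 10.5),
                     row.names = c("m1", "m2", "m3", "m4"))

# Hypothetical cluster labels in leaf order, named by mouse ID:
# m2 and m4 form cluster 1, m1 and m3 form cluster 2
groups <- c(m2 = 1L, m4 = 1L, m1 = 2L, m3 = 2L)

# Wrong: binding by position gives m1 the label that belongs to m2
wrong <- cbind(intake, group = unname(groups))

# Right: look up each row's label by name first
idx <- match(rownames(intake), names(groups))
right <- cbind(intake, group = unname(groups[idx]))

# Mean intake per cluster and day, using the aligned labels
avg <- aggregate(intake, by = list(group = right$group), FUN = mean)
print(avg)
```

In the real script the same pattern applies, with `data` and the dendextend `cutree()` output in place of the toy objects.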

Published: 2023-04-21 18:28:00
Article link: https://www.elefans.com/category/dzcp/492681aefd5cb8049f738568f0f01789.html