A few days ago I opened this thread:

Group rows based on column value

in which we obtained this result:
    df <- data.frame(
      ID = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
      Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
      Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
      ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)
    )

using:

    df <- df %>%
      group_by(ID) %>%
      mutate_at(vars(Obs1),
                funs(ClusterObs1 = with(rle(.), rep(cumsum(values == 1), lengths))))

Now I have to make some modifications:
If the value of 'Control' is higher than 12, and the current 'Obs1' value is equal to 1 and also equal to the previous 'Obs1' value, then the 'DesiredResultClusterObs1' value should increase by 1:
    df <- data.frame(
      ID = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
      Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
      Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
      ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5),
      DesiredResultClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,6,6,6,7)
    )

I have considered adding an if_else condition with lag inside funs, but without success. Any ideas?
How would this work for many columns?
Recommended answer

This seems to work:
    df %>%
      mutate(DesiredResultClusterOrbs1 =
               with(rle(Control > 12 & Obs1 == 1 & lag(Obs1) == 1),
                    rep(cumsum(values == 1), lengths)) + ClusterObs1)

       ID Obs1 Control ClusterObs1 DesiredResultClusterOrbs1
     1  1    1       0           1                         1
     2  1    1       3           1                         1
     3  1    0       3           1                         1
     4  1    1       1           2                         2
     5  1    0      12           2                         2
     6  1    1       1           3                         3
     7  1    1       1           3                         3
     8  1    0       1           3                         3
     9  1    1      36           4                         4
    10  1    0      13           4                         4
    11  1    0       1           4                         4
    12  1    0       1           4                         4
    13  1    1       2           5                         5
    14  1    1      24           5                         6
    15  1    1       2           5                         6
    16  1    1       2           5                         6
    17  1    1      48           5                         7

Basically, we reuse the rle + rep mechanic from your previous thread: the TRUE/FALSE result of your conditions is turned into a cumulative counter that increases at the start of each TRUE run, and that counter is added to the existing ClusterObs1.
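To see what each piece of that expression contributes, the one-liner can be unpacked step by step. The sketch below uses only base R and the question's data. The intermediate names (prev_obs, cond, runs, bump) are made up for illustration, and c(NA, head(Obs1, -1)) is a stand-in for dplyr's lag().

```r
# Data from the question (a single ID, so no grouping is needed here)
Obs1        <- c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1)
Control     <- c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48)
ClusterObs1 <- c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)

# Base R stand-in for dplyr::lag(): shift down by one row, pad with NA
prev_obs <- c(NA, head(Obs1, -1))

# Step 1: TRUE on the rows where an extra +1 should start
cond <- Control > 12 & Obs1 == 1 & prev_obs == 1

# Step 2: collapse the logical vector into runs of equal values
runs <- rle(cond)

# Step 3: count the TRUE runs seen so far, then expand back to one value per row
bump <- rep(cumsum(runs$values == 1), runs$lengths)

# Step 4: add the running counter to the existing cluster id
DesiredResultClusterObs1 <- ClusterObs1 + bump

which(cond)               # 14 17
DesiredResultClusterObs1  # 1 1 1 2 2 3 3 3 4 4 4 4 5 6 6 6 7
```

Row 9 has Control = 36 and Obs1 = 1 but does not trigger a bump, because the previous Obs1 is 0. Only rows 14 and 17 satisfy all three conditions.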
If you want to create several DesiredResultClusterOrbs columns, you can use mapply. There may be a dplyr solution for this, but this one uses base R's mapply.
Data:

    df <- data.frame(
      ID = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
      Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
      Obs2 = rbinom(17, 1, .5),   # random, so values differ between runs
      Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
      ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)
    )

    df <- df %>%
      mutate_at(vars(Obs2),
                funs(ClusterObs2 = with(rle(.), rep(cumsum(values == 1), lengths))))

Loop:

    # x iterates over the Obs columns, y over the matching Cluster columns.
    # dplyr::lag is spelled out because stats::lag does not shift a plain vector.
    newcols <- mapply(function(x, y) {
      with(rle(df$Control > 12 & x == 1 & dplyr::lag(x) == 1),
           rep(cumsum(values == 1), lengths)) + y
    }, df[2:3], df[5:6])

This produces a matrix with the new columns, which you can then rename and cbind to your data:
    colnames(newcols) <- paste0("DesiredResultClusterOrbs", 1:2)
    cbind.data.frame(df, newcols)

       ID Obs1 Obs2 Control ClusterObs1 ClusterObs2 DesiredResultClusterOrbs1 DesiredResultClusterOrbs2
     1  1    1    1       0           1           1                         1                         1
     2  1    1    1       3           1           1                         1                         1
     3  1    0    0       3           1           1                         1                         1
     4  1    1    0       1           2           1                         2                         1
     5  1    0    0      12           2           1                         2                         1
     6  1    1    0       1           3           1                         3                         1
     7  1    1    1       1           3           2                         3                         2
     8  1    0    0       1           3           2                         3                         2
     9  1    1    1      36           4           3                         4                         3
    10  1    0    1      13           4           3                         4                         4
    11  1    0    0       1           4           3                         4                         4
    12  1    0    1       1           4           4                         4                         5
    13  1    1    1       2           5           4                         5                         5
    14  1    1    0      24           5           4                         6                         5
    15  1    1    1       2           5           5                         6                         6
    16  1    1    1       2           5           5                         6                         6
    17  1    1    1      48           5           5                         7                         7
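As an alternative to selecting columns by position, the same idea can be sketched as a small helper plus a loop over column names, using base R only. This is an illustration under assumptions, not the answer's method: add_control_bump is a hypothetical helper name, each ObsN column is assumed to have a matching ClusterObsN column, and the Obs2 and ClusterObs2 values are copied from the printed output above so the result is reproducible (rbinom would give different values on each run).

```r
# Hypothetical helper (not part of the original answer):
# adds +1 to `cluster` from every run where control > 12 and
# both the current and the previous observation equal 1.
add_control_bump <- function(obs, control, cluster) {
  prev_obs <- c(NA, head(obs, -1))        # base R stand-in for dplyr::lag()
  cond <- control > 12 & obs == 1 & prev_obs == 1
  cond[is.na(cond)] <- FALSE              # assumption: the first row never triggers a bump
  runs <- rle(cond)
  cluster + rep(cumsum(runs$values), runs$lengths)
}

df <- data.frame(
  ID          = rep(1, 17),
  Obs1        = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
  Obs2        = c(1,1,0,0,0,0,1,0,1,1,0,1,1,0,1,1,1),  # fixed copy of the random draw shown above
  Control     = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
  ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5),
  ClusterObs2 = c(1,1,1,1,1,1,2,2,3,3,3,4,4,4,5,5,5)
)

# Pair each observation column with its cluster column by name
obs_cols <- c("Obs1", "Obs2")
for (col in obs_cols) {
  df[[paste0("DesiredResultCluster", col)]] <-
    add_control_bump(df[[col]], df$Control, df[[paste0("Cluster", col)]])
}

df$DesiredResultClusterObs1  # 1 1 1 2 2 3 3 3 4 4 4 4 5 6 6 6 7
df$DesiredResultClusterObs2  # 1 1 1 1 1 1 2 2 3 4 4 5 5 5 6 6 7
```

Looking columns up by name avoids breaking the pairing if the column order of df changes, which the positional df[2:3] / df[5:6] selection would not survive. With several IDs, the helper would need to be applied within each ID group.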