Groups of equal length with dplyr
I have the following data frame:

    df <- data.frame(group = c(rep("G1", 18), rep("G2", 10)),
                     X = c(rep("a", 10), rep("b", 8), rep("c", 4), rep("d", 6)),
                     Y = c(rep(1:10), rep(1:8), rep(1:4), rep(1:6)))

Using dplyr or tidyr if possible, I would like to trim all subgroups (X) within each group to the same length, namely the length of the smallest subgroup in that group. In other words, the resulting data frame should be:

    df_r <- data.frame(group = c(rep("G1", 16), rep("G2", 8)),
                       X = c(rep("a", 8), rep("b", 8), rep("c", 4), rep("d", 4)),
                       Y = c(rep(1:8), rep(1:8), rep(1:4), rep(1:4)))

I can't figure out how to achieve this. Any help would be greatly appreciated.
Best answer
This might be what you want?

    library(dplyr)

    df_r <- df %>%
      group_by(group, X) %>%
      mutate(maxY = max(Y)) %>%    # largest Y in each subgroup (here, its length)
      group_by(group) %>%
      filter(Y <= min(maxY)) %>%   # keep rows up to the smallest subgroup's maximum
      select(group, X, Y)

    > df_r
       group X Y
    1     G1 a 1
    2     G1 a 2
    3     G1 a 3
    4     G1 a 4
    5     G1 a 5
    6     G1 a 6
    7     G1 a 7
    8     G1 a 8
    9     G1 b 1
    10    G1 b 2
    11    G1 b 3
    12    G1 b 4
    13    G1 b 5
    14    G1 b 6
    15    G1 b 7
    16    G1 b 8
    17    G2 c 1
    18    G2 c 2
    19    G2 c 3
    20    G2 c 4
    21    G2 d 1
    22    G2 d 2
    23    G2 d 3
    24    G2 d 4

    > df_r1 <- data.frame(group = c(rep("G1", 16), rep("G2", 8)),
                          X = c(rep("a", 8), rep("b", 8), rep("c", 4), rep("d", 4)),
                          Y = c(rep(1:8), rep(1:8), rep(1:4), rep(1:4)))
    > identical(df_r, df_r1)
    [1] TRUE
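The pipeline above relies on Y running from 1 to n within each subgroup, which holds for the example data. If Y held arbitrary values instead, one possible variant is to count rows rather than compare Y. This is only a sketch: the names `n_sub`, `idx` and `df_trim` are made up for illustration, and it ends with `ungroup()` and `as.data.frame()` so that the result is a plain data.frame rather than a grouped tibble.

```r
library(dplyr)

df <- data.frame(group = c(rep("G1", 18), rep("G2", 10)),
                 X = c(rep("a", 10), rep("b", 8), rep("c", 4), rep("d", 6)),
                 Y = c(1:10, 1:8, 1:4, 1:6))

df_trim <- df %>%
  group_by(group, X) %>%
  mutate(n_sub = n(),              # size of each subgroup (hypothetical helper column)
         idx   = row_number()) %>% # position of the row within its subgroup
  group_by(group) %>%
  filter(idx <= min(n_sub)) %>%    # keep only the first min(n_sub) rows per subgroup
  ungroup() %>%
  select(group, X, Y) %>%
  as.data.frame()

# Rows left per subgroup
table(df_trim$group, df_trim$X)
```

Since the filter uses the row position, this keeps the first rows of each subgroup in their existing order; sort beforehand with `arrange()` if a different subset should survive.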