I am trying to select the maximum value in a dataframe's third column based on the combinations of the values in the first two columns.
My problem is similar to this one but I can't find a way to implement what I need.
Sample data changed to make the column names more obvious.
Here is some sample data:

    library(tidyr)
    set.seed(1234)
    df <- data.frame(group1 = letters[1:4], group2 = letters[1:4])
    df <- df %>% expand(group1, group2)
    df <- subset(df, subset = group1 != group2)
    df$score <- runif(n = 12, min = 0, max = 1)
    df
    # A tibble: 12 × 3
       group1 group2       score
       <fctr> <fctr>       <dbl>
    1       a      b 0.113703411
    2       a      c 0.622299405
    3       a      d 0.609274733
    4       b      a 0.623379442
    5       b      c 0.860915384
    6       b      d 0.640310605
    7       c      a 0.009495756
    8       c      b 0.232550506
    9       c      d 0.666083758
    10      d      a 0.514251141
    11      d      b 0.693591292
    12      d      c 0.544974836
In this example rows 1 and 4 are 'duplicates'. I would like to select row 4 as the value in the score column is larger than in row 1. Ultimately I would like a dataframe to be returned with the group1 and group2 columns and the maximum value in the score column. So in this example, I expect there to be 6 rows returned.
How can I do this in R?
Recommended answer
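I'd prefer to handle this problem in two steps: first compute an order-independent ID for each pair of groups, then keep the row with the highest score within each ID: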
    library(dplyr)

    # Create function for computing group IDs from data frame of groups (per column)
    get_group_id <- function(groups) {
      apply(groups, 1, function(row) {
        paste0(sort(row), collapse = "_")
      })
    }
    group_id <- get_group_id(select(df, -score))

    # Perform the computation
    df %>%
      mutate(groupId = group_id) %>%
      group_by(groupId) %>%
      slice(which.max(score)) %>%
      ungroup() %>%
      select(-groupId)
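For comparison, the same two-step idea can be sketched in base R without a row-wise `apply()`. This is a variant, not the answer's own method: it assumes the group columns are character vectors (not factors), and the names `pair_id`, `df_sorted` and `result` are made up for illustration. The sample data is rebuilt with `rep()` so that the block is self-contained and keeps the same row order as the question.

```r
# Rebuild the question's sample data in base R (same row order as tidyr::expand)
set.seed(1234)
df <- data.frame(
  group1 = rep(letters[1:4], each = 4),
  group2 = rep(letters[1:4], times = 4),
  stringsAsFactors = FALSE  # pmin()/pmax() below need character, not factor
)
df <- df[df$group1 != df$group2, ]
df$score <- runif(n = nrow(df), min = 0, max = 1)

# Step 1: order-independent key, e.g. both (a, b) and (b, a) become "a_b"
df$pair_id <- paste(pmin(df$group1, df$group2),
                    pmax(df$group1, df$group2),
                    sep = "_")

# Step 2: sort by key and descending score, then keep the first row per key
df_sorted <- df[order(df$pair_id, -df$score), ]
result <- df_sorted[!duplicated(df_sorted$pair_id),
                    c("group1", "group2", "score")]
rownames(result) <- NULL
result
```

With the sample data this should leave one row per unordered pair, six rows in total. If the group columns are factors, as in the tibble above, they would need `as.character()` first.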