Select rows based on non-directed combinations of columns

Updated: 2024-10-26 22:17:20

Problem description

I am trying to select the maximum value in a data frame's third column based on the combinations of the values in the first two columns, where the order of the two values does not matter.

My problem is similar to another question, but I can't find a way to adapt it to what I need.

Edit: the sample data has been changed to make the column names more obvious.

Here is some sample data:

    library(tidyr)

    set.seed(1234)
    df <- data.frame(group1 = letters[1:4], group2 = letters[1:4])
    df <- df %>% expand(group1, group2)        # all ordered pairs of groups
    df <- subset(df, subset = group1 != group2) # drop self-pairs
    df$score <- runif(n = 12, min = 0, max = 1)
    df

    # A tibble: 12 × 3
       group1 group2       score
       <fctr> <fctr>       <dbl>
    1       a      b 0.113703411
    2       a      c 0.622299405
    3       a      d 0.609274733
    4       b      a 0.623379442
    5       b      c 0.860915384
    6       b      d 0.640310605
    7       c      a 0.009495756
    8       c      b 0.232550506
    9       c      d 0.666083758
    10      d      a 0.514251141
    11      d      b 0.693591292
    12      d      c 0.544974836

In this example, rows 1 and 4 are "duplicates": both describe the same unordered pair of groups (a and b). I would like to select row 4, because its value in the score column is larger than the value in row 1. Ultimately, I would like a data frame to be returned with the group1 and group2 columns and the maximum value in the score column for each unordered pair. Four groups give six unordered pairs, so in this example I expect 6 rows to be returned.
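To make the notion of a "duplicate" concrete, here is a minimal base R sketch that builds an order-independent key for each row and counts the distinct keys. The names `pairs` and `key` are made up for this illustration and do not appear in the question; the pairs are rebuilt with `expand.grid` rather than `tidyr::expand` so the snippet has no package dependencies.

```r
# Hypothetical illustration: rebuild the 12 ordered pairs from the question
pairs <- expand.grid(group1 = letters[1:4], group2 = letters[1:4],
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$group1 != pairs$group2, ]

# pmin()/pmax() work element-wise on character vectors, so the key puts the
# alphabetically smaller group first regardless of which column it came from
key <- paste(pmin(pairs$group1, pairs$group2),
             pmax(pairs$group1, pairs$group2), sep = "_")

nrow(pairs)          # 12 ordered pairs
length(unique(key))  # 6 unordered pairs
```

Rows ("a", "b") and ("b", "a") both get the key "a_b", which is why they count as the same combination.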

How can I do this in R?

Recommended answer

I'd handle this problem in two steps: first compute an order-independent group ID for each row, then keep the row with the highest score within each group ID:

    library(dplyr)

    # Create function for computing group IDs from data frame of groups (per column)
    get_group_id <- function(groups) {
      apply(groups, 1, function(row) {
        # Sorting makes the ID independent of column order: (a, b) and (b, a) both give "a_b"
        paste0(sort(row), collapse = "_")
      })
    }
    group_id <- get_group_id(select(df, -score))

    # Perform the computation
    df %>%
      mutate(groupId = group_id) %>%
      group_by(groupId) %>%
      slice(which.max(score)) %>%  # keep the highest-scoring row per group ID
      ungroup() %>%
      select(-groupId)
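As a point of comparison, the same two steps can be sketched in base R without any packages. This is an alternative to the answer's code, not the answer's own method: it swaps the row-wise `apply` for vectorised `pmin`/`pmax`, and it only handles exactly two grouping columns. The data frame is typed in from the output printed in the question so the result does not depend on the random number generator; `pair_id`, `ordered` and `result` are names made up for this sketch.

```r
# Sample data copied from the output shown in the question
df <- data.frame(
  group1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
  group2 = c("b", "c", "d", "a", "c", "d", "a", "b", "d", "a", "b", "c"),
  score  = c(0.113703411, 0.622299405, 0.609274733, 0.623379442,
             0.860915384, 0.640310605, 0.009495756, 0.232550506,
             0.666083758, 0.514251141, 0.693591292, 0.544974836),
  stringsAsFactors = FALSE
)

# Step 1: order-independent ID for each pair (smaller label first)
df$pair_id <- paste(pmin(df$group1, df$group2),
                    pmax(df$group1, df$group2), sep = "_")

# Step 2: sort by ID, then by descending score, and keep the first row per ID
ordered <- df[order(df$pair_id, -df$score), ]
result  <- ordered[!duplicated(ordered$pair_id), c("group1", "group2", "score")]
result
```

With this data, `result` has six rows, and the a/b pair is represented by the row with group1 = "b", group2 = "a" (row 4 of the original data). The `apply`-based version in the answer generalises more easily to more than two grouping columns, whereas the `pmin`/`pmax` form avoids a row-by-row loop.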

Published: 2023-10-23 16:05:56

Article link: https://www.elefans.com/category/jswz/34/1521330.html