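Detect rows with the same combination of numbers in the first two columns and select the one with the highest number in the third column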

Updated: 2024-10-27 02:25:05

Problem description

I have a data.frame with only three columns but many thousands of rows. The first and second columns hold numeric IDs, and each pair of IDs represents a link (e.g. A-B is the same link as B-A).

Now I'd like to delete all rows that duplicate a link, keeping only the row with the highest value in the third column.

Here is a simple example.

My input data.frame:

V1   V2   V3
1    2    100
102  100  20000
100  102  23131
10   19   124444
10   15   1244
19   10   1242
10   19   5635
2    1    666
1    2    33
100  110  23
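To experiment with this input in R, it can be read straight from text. This is a minimal sketch: the name DF is the one the answer below assumes, and the column names V1, V2, V3 are simply the defaults that read.table assigns when header = FALSE.

```r
# Build the example input; with header = FALSE, read.table names the
# columns V1, V2, V3 automatically (blank lines in the text are skipped)
DF <- read.table(text = "
1   2   100
102 100 20000
100 102 23131
10  19  124444
10  15  1244
19  10  1242
10  19  5635
2   1   666
1   2   33
100 110 23
", header = FALSE)

# Inspect the structure: 10 rows, 3 integer columns
str(DF)
```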

My goal is to obtain:

V1   V2   V3
100  102  23131
10   19   124444
10   15   1244
2    1    666
100  110  23

I'm trying to find a solution in R; otherwise, PostgreSQL would be fine too. Thanks a lot!

Recommended answer

The idea is similar to this one. You can create two additional columns with pmin and pmax and then group by them.
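To illustrate why this grouping works, here is a small base-R sketch (not part of the original answer): pmin and pmax take element-wise minima and maxima, so a row stored as 2-1 and one stored as 1-2 get the same (id1, id2) key. On the example input, the 10 rows collapse to 5 distinct links, matching the 5 rows of the desired output.

```r
# V1 and V2 from the example input
DF <- data.frame(
  V1 = c(1, 102, 100, 10, 10, 19, 10, 2, 1, 100),
  V2 = c(2, 100, 102, 19, 15, 10, 19, 1, 2, 110)
)

# Order-independent key: smaller ID in id1, larger ID in id2
id1 <- pmin(DF$V1, DF$V2)
id2 <- pmax(DF$V1, DF$V2)
print(cbind(DF, id1, id2))

# Count the distinct links once direction is ignored
n_links <- nrow(unique(data.frame(id1, id2)))
print(n_links)
```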

Here is a data.table solution. If you don't want to use data.table, you can still apply the same idea; however, a solution in plain R code is very unlikely to be faster than the data.table one.

# assuming your data.frame is DF, with columns V1, V2, V3
library(data.table)
DT <- data.table(DF)
# put the min of V1, V2 in one column and the max in another (for grouping)
DT[, `:=`(id1 = pmin(V1, V2), id2 = pmax(V1, V2))]
# in each (id1, id2) group, keep the row with the maximum V3
DT.OUT <- DT[, .SD[which.max(V3), ], by = list(id1, id2)]
# remove the helper columns id1 and id2
DT.OUT[, c("id1", "id2") := NULL]
DT.OUT
#     V1  V2     V3
# 1:   2   1    666
# 2: 100 102  23131
# 3:  10  19 124444
# 4:  10  15   1244
# 5: 100 110     23
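Since the pmin/pmax idea does not depend on data.table, it can also be expressed in base R. The following is a sketch of an alternative technique, not the answer's code, assuming a data.frame with columns V1, V2, V3 as in the question: sort the rows by V3 in decreasing order, then keep the first occurrence of each order-independent (min, max) pair with duplicated().

```r
# Example input (columns V1, V2, V3 as in the question)
DF <- data.frame(
  V1 = c(1, 102, 100, 10, 10, 19, 10, 2, 1, 100),
  V2 = c(2, 100, 102, 19, 15, 10, 19, 1, 2, 110),
  V3 = c(100, 20000, 23131, 124444, 1244, 1242, 5635, 666, 33, 23)
)

# Sort so that, within every link, the row with the largest V3 comes first
sorted <- DF[order(DF$V3, decreasing = TRUE), ]

# Order-independent link key: smaller ID, a space, then the larger ID
key <- paste(pmin(sorted$V1, sorted$V2), pmax(sorted$V1, sorted$V2))

# duplicated() flags every repeat after the first, so this keeps the
# highest-V3 row for each link
result <- sorted[!duplicated(key), ]
rownames(result) <- NULL
print(result)
```

Unlike the data.table version, which returns groups in order of first appearance, this sketch returns the rows sorted by V3 in decreasing order.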

Published: 2023-10-23 16:05:12

Article link: https://www.elefans.com/category/jswz/34/1521328.html