Group by one column and, for each pair of columns in R, select the row with the minimum value in one column

Last updated: 2024-10-14 22:13:16

Question

Difficult question to phrase. Here is an example of what I would like to do. An example of what I am starting with:

    library(data.table)

    set.seed(0)
    dt <- data.table(dr1.d=rnorm(5), dr1.p=abs(rnorm(5, sd=0.08)),
                     dr2.d=rnorm(5), dr2.p=abs(rnorm(5, sd=0.08)),
                     dr3.d=rnorm(5), dr3.p=abs(rnorm(5, sd=0.08)),
                     dr4.d=rnorm(5), dr4.p=abs(rnorm(5, sd=0.08)),
                     sym = paste("sym", c(1,1,1,2,2)))
    dt

            dr1.d        dr1.p      dr2.d      dr2.p       dr3.d       dr3.p      dr4.d      dr4.p   sym
    1:  1.2629543 0.1231960034  0.7635935 0.03292087 -0.22426789 0.040288638 -0.2357066 0.09215294 sym 1
    2: -0.3262334 0.0742853628 -0.7990092 0.02017788  0.37739565 0.086861549 -0.5428883 0.07937283 sym 1
    3:  1.3297993 0.0235776357 -1.1476570 0.07135369  0.13333636 0.055276307 -0.4333103 0.03436105 sym 1
    4:  1.2724293 0.0004613738 -0.2894616 0.03485466  0.80418951 0.102767948 -0.6494716 0.09906433 sym 2
    5:  0.4146414 0.1923722711 -0.2992151 0.09900307 -0.05710677 0.003738094  0.7267507 0.02234770 sym 2

For all pairs of columns that share a drug (e.g. 'dr1') I want to group rows by 'sym', then select the row with the smallest p-value (ends in '.p') within each group. The final result of the above data.table would be this:

           dr1.d        dr1.p      dr2.d      dr2.p       dr3.d       dr3.p      dr4.d      dr4.p   sym
    1: 1.3297993 0.0235776357 -0.7990092 0.02017788 -0.22426789 0.040288638 -0.4333103 0.03436105 sym 1
    2: 1.2724293 0.0004613738 -0.2894616 0.03485466 -0.05710677 0.003738094  0.7267507 0.02234770 sym 2

I have tried using .SD and lapply to accomplish this, but I can't wrap my head around it. Thank you!

Recommended answer

The most important (and powerful) thing to understand about data.table is that, as long as j returns a list, each element of the list will become a column in the result.
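To make that point concrete, here is a minimal sketch on a small made-up table. The table `toy` and the column names `min_x` and `y_at_min_x` are hypothetical and not part of the question. It shows `j` returning a named list, evaluated once per group, with each list element turning into one result column.

```r
library(data.table)

# Hypothetical toy table, not the asker's data
toy <- data.table(g = c("a", "a", "b"),
                  x = c(3, 1, 2),
                  y = c(10, 20, 30))

# j returns a named list; each element becomes a column of the result,
# and the expression is evaluated separately for each value of g
res <- toy[, list(min_x = min(x),
                  y_at_min_x = y[which.min(x)]),
           by = g]
print(res)
```

The second list element already has the shape the question asks for: it picks the value of one column at the position where another column is smallest.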

With that knowledge and some base R fun, we can get this result directly by doing:

    # I'm on v1.9.7, see github/Rdatatable/data.table/wiki/Installation
    cols1 = grep("d$", names(dt), value=TRUE)
    cols2 = grep("p$", names(dt), value=TRUE)
    dt[, Map(`[`, mget(c(cols1,cols2)), lapply(mget(cols2), which.min)), by=sym]
    #      sym    dr1.d      dr2.d       dr3.d      dr4.d        dr1.p      dr2.p
    # 1: sym 1 1.329799 -0.7990092 -0.22426789 -0.4333103 0.0235776357 0.02017788
    # 2: sym 2 1.272429 -0.2894616 -0.05710677  0.7267507 0.0004613738 0.03485466
    #          dr3.p      dr4.p
    # 1: 0.040288638 0.03436105
    # 2: 0.003738094 0.02234770

See the vignettes for details.
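The `Map` call in the answer can be hard to read at first, so here is a base R sketch of what it does within a single group. The list `cols` and its values are made up for illustration. `Map` pairs each column with an index, and because there are half as many indices as columns, the indices are recycled. This is why the answer lists the `.d` columns first and the `.p` columns second, in the same drug order.

```r
# Hypothetical values standing in for the columns of one group
cols <- list(a.d = c(10, 20, 30), b.d = c(1, 2, 3),
             a.p = c(0.5, 0.1, 0.9), b.p = c(0.3, 0.7, 0.2))
p_cols <- cols[c("a.p", "b.p")]

# Position of the smallest p-value within each p column
idx <- lapply(p_cols, which.min)

# Map() applies `[` to each (column, index) pair; the two indices are
# recycled over the four columns, so a.d and a.p share a.p's index,
# and b.d and b.p share b.p's index
picked <- Map(`[`, cols, idx)
print(picked)
```

If the columns were passed in a different order, for example interleaved as `a.d, a.p, b.d, b.p`, the recycled indices would no longer line up with the matching drug, so the ordering is part of what makes the one-liner correct.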

Published: 2023-10-18 14:58:22

Article link: https://www.elefans.com/category/jswz/34/1504564.html