
R: Selecting specific rows in a data.table

Updated: 2024-10-18 18:19:29

Problem description


I have a fairly specific problem with selecting rows in a data.table, and so far I have not managed to solve it. I have a dataset storing simulation results over a range of parameters. Each column in the dataset contains either a parameter or a result value; see the code below ("p" for parameter columns and "v" for value columns).

    library(data.table)

    # create dataset for demonstration
    params <- expand.grid(seq(0, 0.5, by = 0.1), seq(1, 10), seq(100, 105),
                          letters[1:4], letters[10:14])
    colnames(params) <- paste("p", 1:5, sep = "")
    data <- data.table(cbind(params, runif(nrow(params)), rnorm(nrow(params))))
    setnames(data, c(colnames(params), "v1", "v2"))

I would now like to extract, for each p1, for given values of p2 and p3, and for arbitrary values of p4 and p5, the row where the value of v1 is minimal. Let np4 and np5 be the numbers of unique values of p4 and p5. For each unique p1 and given p2 and p3, I would like to select, among the np4*np5 rows where p1, p2 and p3 match, the one row where v1 is minimal. The desired output should then be a table with np1 rows (np1 being the number of unique values of p1) selected from the original table, i.e. containing all the variables the original did. I know how to select rows from a data.table and how to use expressions and "by", but I have not managed to put it all together to produce the desired result.

UPDATE: I found the answer. The trick was how to select the optimal row within the subset created by "by" (of course, there was already a built-in solution):

    np4 <- c("a", "b")
    np5 <- c("m", "n")
    ss2 <- data[p4 %in% np4 & p5 %in% np5, .SD[which(v1 == min(v1)), ], by = "p1"]
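One detail of the expression above is worth illustrating: which(v1 == min(v1)) returns every row tied for the minimum, while which.min(v1) returns only the first. With continuous random values such as runif() ties are unlikely, so both normally give one row per group. The following is a minimal sketch on a made-up table; `dt` and its columns `g` and `id` are hypothetical and are not part of the dataset in the question.

```r
library(data.table)

# hypothetical toy table with a tie for the minimum in group "a"
dt <- data.table(g  = c("a", "a", "a", "b"),
                 v1 = c(1, 1, 2, 5),
                 id = 1:4)

# keeps every row tied for the group minimum
all_ties <- dt[, .SD[which(v1 == min(v1))], by = g]

# keeps only the first minimal row of each group
first_min <- dt[, .SD[which.min(v1)], by = g]

print(all_ties)
print(first_min)
```

If exactly one row per group is required, which.min is the safer choice of the two.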

From the data.table documentation:

.SD is a data.table containing the Subset of x's Data for each group, excluding any columns used in by (or keyby).
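To make the quoted sentence concrete, here is a small sketch on a made-up table (the names `dt`, `g`, `x` and `y` are hypothetical). It lists the columns that .SD exposes inside each group and then indexes .SD by row number to pull out a complete row.

```r
library(data.table)

# hypothetical toy table, for illustration only
dt <- data.table(g = c("a", "a", "b", "b"),
                 x = c(3, 1, 2, 5),
                 y = c("w", "x", "y", "z"))

# columns visible in .SD per group: the grouping column g is not among them
sd_cols <- dt[, .(cols = paste(names(.SD), collapse = ",")), by = g]

# row with the smallest x in each group; g is added back by `by`
min_rows <- dt[, .SD[which.min(x)], by = g]

print(sd_cols)
print(min_rows)
```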

Solution

This should work:

    np4 <- c("a", "b")
    np5 <- c("m", "n")
    data[p4 %in% np4 & p5 %in% np5,
         list(v1 = min(v1), v2 = v2[which.min(v1)]),
         by = c("p1", "p2", "p3", "p4", "p5")]
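As a possible variation that follows the wording of the question more literally, the sketch below fixes p2 and p3 in the filter, groups by p1 only, and returns the whole row with the smallest v1. It is an illustration under stated assumptions rather than the answerer's method: the values `p2_val` and `p3_val`, the fixed seed, and the slightly different way of building the table are all choices made here for the example.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# same parameter grid as in the question, built with named columns
data <- as.data.table(expand.grid(p1 = seq(0, 0.5, by = 0.1), p2 = 1:10,
                                  p3 = 100:105, p4 = letters[1:4],
                                  p5 = letters[10:14]))
data[, `:=`(v1 = runif(.N), v2 = rnorm(.N))]

# hypothetical fixed values for p2 and p3
p2_val <- 3
p3_val <- 101

# within each p1, keep the complete row where v1 is smallest
best <- data[p2 == p2_val & p3 == p3_val, .SD[which.min(v1)], by = p1]
print(best)
```

The result has one row per unique p1 and still carries p2 to p5, v1 and v2, because every column except the grouping column is part of .SD.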


Published: 2023-11-22 06:28:15

Article link: https://www.elefans.com/category/jswz/34/1616319.html
