Consider

    target <- "vs"
    value <- 1
    library(data.table)
    dt <- as.data.table(head(mtcars))

I'm trying to pass both a column name and a value as variables into the j expression of a data.table, something that would be equivalent to

    dt[, vs == 1]
    # [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE

If only the value is a variable, it works nicely:

    dt[, vs == value]
    # [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE

We can also select a column within the data.table scope when its name is held in a variable:

    dt[, target, with = FALSE]
    #    vs
    # 1:  0
    # 2:  0
    # 3:  1
    # 4:  1
    # 5:  0
    # 6:  1

But I can't figure out how to combine the two in a simple manner.
Note: I'm well aware that I can simply do:

    dt[[target]] == value
    # [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE

But I need it within the data.table scope so that I can modify other columns by reference, something like

    dt[, NEWCOL := sum(vs == 1), by = am]

So here are my attempts when both the column name and the value are variables:
    dt[, target == value, with = FALSE]
    # Null data.table (0 rows and 0 cols)
    dt[, target == value]
    # [1] FALSE
    dt[, (target) == value]
    # [1] FALSE
    dt[, .(target == value)]
    #       V1
    # 1: FALSE
    dt[, eval(target) == value]
    # [1] FALSE
    dt[target %in% value]
    # Empty data.table (0 rows) of 11 cols: mpg,cyl,disp,hp,drat,wt...

Eventually I came up with

    dt[, .SD[[target]] == value]
    # [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE

but it is very inefficient. Here's a simple benchmark:

    set.seed(123)
    n <- 1e6
    dt <- data.table(vs = sample(1L:30L, n, replace = TRUE), am = seq_len(n))
    system.time(dt[, NEWCOL := sum(.SD[[target]] == value), by = am])
    #    user  system elapsed
    #   13.00    0.02   13.12
    system.time(dt[, NEWCOL2 := sum(vs == value), by = am])
    #    user  system elapsed
    #    0.82    0.00    0.83

Question: Is there a better way of doing this that I'm missing here? Something either more idiomatic or much more efficient.
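For context, here is a minimal sketch in plain base R (no data.table needed) of why the naive attempts above collapse to a single FALSE. Since target is not a column, the name resolves to the character string "vs", so the comparison is between that string and the number 1, not between the column and 1. The variable names naive and by_name are only illustrative.

```r
target <- "vs"
value <- 1
df <- head(mtcars)

# The string "vs" is compared with 1 (coerced to "1"), giving one FALSE.
naive <- target == value

# Looking the column up by name compares its values instead.
by_name <- df[[target]] == value

# as.name() turns the string into a symbol that eval() can resolve
# against the data, which is what the j expression needs.
via_symbol <- eval(as.name(target), envir = df) == value

print(naive)
print(by_name)
print(identical(by_name, via_symbol))
```

This suggests that any working solution has to turn the string into a symbol, or look it up by name, before the comparison is evaluated.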
Edit
Originally I was looking for something idiomatic, so I thought @GGrothendieck's simple solution using get was the one. Surprisingly, though, all of @Richard's versions beat even the version that doesn't evaluate the column name at all:

    set.seed(123)
    n <- 1e7
    dt <- data.table(vs = sample(1L:30L, n, replace = TRUE), am = seq_len(n))

    cl <- substitute(
        x == y,
        list(x = as.name(target), y = value)
    )
    cl2 <- call("==", as.name(target), value)

    system.time(dt[, NEWCOL := sum(vs == value), by = am])
    #    user  system elapsed
    #    0.83    0.00    0.82
    system.time(dt[, NEWCOL1 := sum(.SD[[target]] == value), by = am])
    #    user  system elapsed
    #    8.97    0.00    8.97
    system.time(dt[, NEWCOL2 := sum(get(target) == value), by = am])
    #    user  system elapsed
    #    2.35    0.00    2.37
    system.time(dt[, NEWCOL3 := sum(eval(cl)), by = am])
    #    user  system elapsed
    #    0.69    0.02    0.71
    system.time(dt[, NEWCOL4 := sum(eval(cl2)), by = am])
    #    user  system elapsed
    #    0.76    0.00    0.77
    system.time(dt[, NEWCOL5 := sum(eval(as.name(target)) == value), by = am])
    #    user  system elapsed
    #    0.78    0.00    0.78

Solution
Here is one possible alternative.

    target <- "vs"
    value <- 1
    dt <- as.data.table(head(mtcars))

In terms of code it's not necessarily simpler, but we can set up an unevaluated call cl, defined outside the scope of dt, which is to be evaluated inside the data table's environment.

    cl <- substitute(
        x == y,
        list(x = as.name(target), y = value)
    )

substitute() might be necessary for longer expressions. But in this case, call() would shorten the code and create the same cl result. And so cl could also be

    cl <- call("==", as.name(target), value)

Now we can evaluate cl inside dt. On your example this seems to work fine.

    dt[, NEWCOL := sum(eval(cl)), by = am][]
    #     mpg cyl disp  hp drat    wt  qsec vs am gear carb NEWCOL
    # 1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4      1
    # 2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4      1
    # 3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1      1
    # 4: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1      2
    # 5: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2      2
    # 6: 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1      2

After thinking about this for a minute, I'm not sure value needed to be substituted, and hence the following also works. But as David notes, the first approach is more time efficient.

    dt[, eval(as.name(target)) == value]
    # [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE
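If this pattern is needed in several places, the call-building step can be wrapped in a small function. The following is only a sketch: add_match_count and its arguments by_col and new_col are hypothetical names, not part of data.table, and it assumes the grouping column is passed as a character string.

```r
library(data.table)

# Hypothetical helper: builds the unevaluated comparison once, then
# evaluates it per group inside j and assigns the count by reference.
add_match_count <- function(dt, target, value, by_col, new_col = "NEWCOL") {
  stopifnot(is.data.table(dt), target %in% names(dt), by_col %in% names(dt))
  cl <- call("==", as.name(target), value)   # e.g. vs == 1
  dt[, (new_col) := sum(eval(cl)), by = by_col]
  invisible(dt)
}

dt <- as.data.table(head(mtcars))
add_match_count(dt, target = "vs", value = 1, by_col = "am")
print(dt[, .(vs, am, NEWCOL)])
```

Building the call outside the grouped j expression keeps the per-group work to a single eval(), which is consistent with the timings reported in the question's edit.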