我有两个包含许多字段的 data.tables.
I have two data.tables with many fields.
我想连接这两个表,添加一些计算字段并附加来自第一个、第二个或两个表的所有其他字段(类似于 SQL 的 select a+b AS sum, DT1.*, DT2.* FROM...) 无需输入所有字段名称.
I want to join the two tables, add some calculated fields and append all other fields from the first, second or both tables (similar to SQL's select a+b AS sum, DT1.*, DT2.* FROM...) without typing all the field names.
我该怎么做(关于最简单的语法和最佳性能)?
How can I do this (regarding easiest syntax and best performance)?
简化示例数据:
library(data.table) DT1 = data.table(x=c("c", "a", "b", "a", "b"), a=1:5) DT2 = data.table(x=c("d", "c", "b"), b=6:8)现在我想加入表格并添加计算字段:
Now I want to join the tables and add a calculated field:
DT1[DT2, .(sum=a + b, <<< how to say DT1.*, DT2.* here? >>> ), on="x"]2016 年 5 月 4 日更新:受用户 jangorecki 的启发,我找到了一个功能请求:
Update May 4, 2016: Inspired by user jangorecki I have found a feature request for this:
在加入时应该可以参考我的.SDp>
推荐答案这应该可以准确地满足您的需求.它使用非常强大的 R 功能,称为计算语言(或元编程),在官方 R 语言定义 手册.这是 R 语言的一个特殊特性,IMO 不应忘记.
This should precisely answer your need. It uses very powerful R feature called computing on the language (or meta programming) well described in official R Language Definition manual. This is an exceptional feature of R language and should not be forgotten IMO.
library(data.table) DT1 = data.table(x=c("c", "a", "b", "a", "b"), a=1:5) DT2 = data.table(x=c("d", "c", "b"), b=6:8) jj = as.call(c( list(as.name(".")), list(sum = quote(a+b)), lapply(unique(c(names(DT1), names(DT2))), as.name) )) print(jj) #.(sum = a + b, x, a, b) DT1[DT2, eval(jj), on="x"] # sum x a b #1: NA d NA 6 #2: 8 c 1 7 #3: 11 b 3 8 #4: 13 b 5 8更多推荐
R data.table 连接:SQL“选择 *"连接表中的类似语法?
发布评论