I have several data frames that I want to combine by row. In the resulting single data frame, I want to create a new variable identifying which data set the observation came from.
    # original data frames
    df1 <- data.frame(x = c(1, 3), y = c(2, 4))
    df2 <- data.frame(x = c(5, 7), y = c(6, 8))

    # desired, combined data frame
    df3 <- data.frame(x = c(1, 3, 5, 7), y = c(2, 4, 6, 8),
                      source = c("df1", "df1", "df2", "df2"))
    # x y source
    # 1 2    df1
    # 3 4    df1
    # 5 6    df2
    # 7 8    df2
How can I achieve this? Thanks in advance!
Recommended answer
It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind...)
    > do.call(rbind, list(df1 = df1, df2 = df2))
          x y
    df1.1 1 2
    df1.2 3 4
    df2.1 5 6
    df2.2 7 8
Notice that the row names now reflect the source data.frames.
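If you want an actual column rather than row names, the same named list can supply it. The following is only a minimal base R sketch of that idea, not part of the original answer; the names `dfs` and `combined` are made up for illustration.

```r
# Example data from the question
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# Hypothetical helper names: `dfs` holds the named list, `combined` the result
dfs <- list(df1 = df1, df2 = df2)
combined <- do.call(rbind, dfs)

# Repeat each list name once per row of the corresponding data frame
combined$source <- rep(names(dfs), times = vapply(dfs, nrow, integer(1)))

# Drop the "df1.1"-style row names, since the column now carries that information
rownames(combined) <- NULL

print(combined)
```

This relies on `rbind` stacking the list elements in order, so the repeated names line up with the rows.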
Another option is to make a basic function like the following:
    AppendMe <- function(dfNames) {
      do.call(rbind, lapply(dfNames, function(x) {
        cbind(get(x), source = x)
      }))
    }
This function then takes a character vector of the data.frame names that you want to "stack", as follows:
    > AppendMe(c("df1", "df2"))
      x y source
    1 1 2    df1
    2 3 4    df1
    3 5 6    df2
    4 7 8    df2

Update 2: Use combine from the "gdata" package
    > library(gdata)
    > combine(df1, df2)
      x y source
    1 1 2    df1
    2 3 4    df1
    3 5 6    df2
    4 7 8    df2
Update 3: Use rbindlist from "data.table"
Another approach available now is rbindlist from "data.table" with its idcol argument. With that, the approach could be:
    > rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
       .id x y
    1: df1 1 2
    2: df1 3 4
    3: df2 5 6
    4: df2 7 8
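The `mget(ls(pattern = ...))` call picks up every object in the workspace whose name matches the pattern, which may be more than you intend. As a hedged variation, you can pass an explicit named list and give `idcol` a string so the identifier column gets that name instead of `.id`. The name `combined` below is just illustrative.

```r
library(data.table)

# Example data from the question
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# Explicit named list: the list names become the values of the id column.
# Passing a string to idcol names that column "source" rather than ".id".
combined <- rbindlist(list(df1 = df1, df2 = df2), idcol = "source")

print(combined)
```

Note that the result is a data.table; wrap it in `as.data.frame()` if you need a plain data frame.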
Update 4: use map_df from "purrr"
Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.
    > mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
    Source: local data frame [4 x 3]

        src     x     y
      (chr) (int) (int)
    1   df1     1     2
    2   df1     3     4
    3   df2     5     6
    4   df2     7     8
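A closely related option, not mentioned in the original answer, is `bind_rows` from "dplyr", which takes a named list and an `.id` argument directly. Recent purrr releases mark `map_df` as superseded, so this may be the more durable spelling. This is a minimal sketch; `combined` is an illustrative name.

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# The list names are written into a new first column called "source"
combined <- bind_rows(list(df1 = df1, df2 = df2), .id = "source")

print(combined)
```

Unlike `rbind`, `bind_rows` fills columns that are missing from some inputs with `NA`, which can be convenient or surprising depending on your data.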