Consider a fragmented dataset like this:
```
   ID       Date Value
1   1 2012-01-01  5065
4   1 2012-01-04  1508
5   1 2012-01-05  9489
6   1 2012-01-06  7613
7   2 2012-01-07  6896
8   2 2012-01-08  2643
11  3 2012-01-02  7294
12  3 2012-01-03  8726
13  3 2012-01-04  6262
14  3 2012-01-05  2999
15  3 2012-01-06 10000
16  3 2012-01-07  1405
18  3 2012-01-09  8372
```
Notice that observations (rows) 2, 3, 9, 10 and 17 are missing. What I would like is to fill some of these gaps in the dataset with Value = 0, like so:
```
   ID       Date Value
1   1 2012-01-01  5920
2   1 2012-01-02     0
3   1 2012-01-03     0
4   1 2012-01-04  8377
5   1 2012-01-05  7810
6   1 2012-01-06  6452
7   2 2012-01-07  3483
8   2 2012-01-08  5426
9   2 2012-01-09     0
11  3 2012-01-02  7854
12  3 2012-01-03  1948
13  3 2012-01-04  7141
14  3 2012-01-05  5402
15  3 2012-01-06  6412
16  3 2012-01-07  7043
17  3 2012-01-08     0
18  3 2012-01-09  3270
```
The point is that a zero should only be inserted if there is an earlier observation for the same ID. I would like to avoid loops, as the full dataset is quite large.
Any suggestions? To reproduce the dataframe:
```r
df <- data.frame(matrix(0, nrow = 18, ncol = 3,
                        dimnames = list(NULL, c("ID", "Date", "Value"))))
df[, 1] <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3)
# 9 dates, recycled over the 18 rows
df[, 2] <- seq(as.Date("2012-01-01"), as.Date("2012-01-09"), by = 1)
df[, 3] <- sample(1000:10000, 18, replace = TRUE)
df <- df[-c(2, 3, 9, 10, 17), ]
```

Answer:
There are already some solid answers here, but I would recommend checking out the package padr.
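```r
library(dplyr)
library(padr)

df %>%
  # add the missing dates within each ID, between start_val and end_val
  pad(start_val = as.Date("2012-01-01"), end_val = as.Date("2012-01-09"),
      group = "ID") %>%
  # replace the NA values created by padding with 0 (fill_by_value's default)
  fill_by_value(Value)
```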
The package gives some pretty intuitive functions for summarizing Date columns as well.
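If the zeros should follow the question's rule strictly (a zero only where the same ID already has an earlier observation), the following base-R sketch is one loop-free way to express it. It assumes the full grid of possible (ID, Date) slots is known; here it is rebuilt the same way as in the reproduction code before rows were dropped. The names `grid`, `first_obs` and `has_past` are illustrative, and the values are copied from the first table in the question.

```r
# Observed (fragmented) data, copied from the first table in the question
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3),
  Date  = as.Date(c("2012-01-01", "2012-01-04", "2012-01-05", "2012-01-06",
                    "2012-01-07", "2012-01-08",
                    "2012-01-02", "2012-01-03", "2012-01-04", "2012-01-05",
                    "2012-01-06", "2012-01-07", "2012-01-09")),
  Value = c(5065, 1508, 9489, 7613, 6896, 2643,
            7294, 8726, 6262, 2999, 10000, 1405, 8372)
)

# Assumed full grid of (ID, Date) slots, as built by the reproduction code
grid <- data.frame(
  ID   = rep(c(1, 2, 3), times = c(6, 3, 9)),
  Date = rep(seq(as.Date("2012-01-01"), as.Date("2012-01-09"), by = 1), 2)
)

# Left-join the observed values onto the grid; empty slots get NA
m <- merge(grid, df, by = c("ID", "Date"), all.x = TRUE)

# First observed date per ID (numeric days, since tapply drops the Date class)
first_obs <- tapply(as.numeric(df$Date), df$ID, min)

# Keep an empty slot only if the same ID has an earlier observation
has_past <- as.numeric(m$Date) > first_obs[as.character(m$ID)]
m <- m[!is.na(m$Value) | has_past, ]
m$Value[is.na(m$Value)] <- 0
rownames(m) <- NULL
print(m)
```

The gap filling is a single join, and the first-observation date per ID is computed once with `tapply()`, so no explicit loop over rows is needed. Without a known grid, you would first have to decide how far each ID should be padded.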