Problem description
Suppose I have a data frame:
df <- data.frame(group = c('A','A','A','B','B','B','C','C','C'),
time = c(1,2,4,1,2,3,5,7,8),
data = c(5,6,7,8,9,10,1,2,3))
What I want to do is insert rows into the data frame wherever a time value is missing from a group's sequence. In the example above, group A is missing time = 3 and group C is missing time = 6 (group B, which runs from 1 to 3, has no gap). I want NA in the data column for those added rows. How would I go about adding them? I need a general solution. NOTE: I edited the question because of an earlier error: we cannot assume that each group has only 4 observations.
The desired result is:
df <- data.frame(group = c('A','A','A','A','B','B','B','C','C','C','C'),
time = c(1,2,3,4,1,2,3,5,6,7,8),
data = c(5,6,NA,7,8,9,10,1,NA,2,3))
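The target adds a row wherever a group skips a time between its first and last observation. As a quick check, here is a base-R sketch (not part of the original question; it assumes times are whole numbers with a step of 1) that lists those missing (group, time) pairs:

```r
# Input data from the question
df <- data.frame(group = c('A','A','A','B','B','B','C','C','C'),
                 time  = c(1,2,4,1,2,3,5,7,8),
                 data  = c(5,6,7,8,9,10,1,2,3))

# For each group, every time from its min(time) to its max(time), step 1
full <- do.call(rbind, lapply(split(df, df$group), function(g)
  data.frame(group = g$group[1], time = seq(min(g$time), max(g$time)))))

# Keep only the (group, time) pairs that have no row in df
gaps <- full[!paste(full$group, full$time) %in% paste(df$group, df$time), ]
print(gaps)
```

For this data it should report group A at time 3 and group C at time 6.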
Recommended answer
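Here is one option using data.table. Convert the data.frame to a data.table with setDT(df), build a table that expands each group's 'time' from its min to its max (grouped by 'group'), and join that table to df on the 'group' and 'time' columns. Any (group, time) pair with no matching row in df comes back with NA in 'data'.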
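library(data.table)
# Convert df to a data.table in place, then join it onto a table holding
# every time from min(time) to max(time) within each group; times absent
# from df get NA in the 'data' column.
setDT(df)[df[, .(time = min(time):max(time)), by = group], on = c("group", "time")]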
# group time data
# 1: A 1 5
# 2: A 2 6
# 3: A 3 NA
# 4: A 4 7
# 5: B 1 8
# 6: B 2 9
# 7: B 3 10
# 8: C 5 1
# 9: C 6 NA
#10: C 7 2
#11: C 8 3
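If data.table is not available, the same expand-then-join idea can be sketched in base R with split() and merge(). This is an alternative to the answer above, not the answerer's code, and like the answer it assumes a time step of 1:

```r
# Input data from the question
df <- data.frame(group = c('A','A','A','B','B','B','C','C','C'),
                 time  = c(1,2,4,1,2,3,5,7,8),
                 data  = c(5,6,7,8,9,10,1,2,3))

# Build every (group, time) pair from each group's min(time) to max(time)
full <- do.call(rbind, lapply(split(df, df$group), function(g)
  data.frame(group = g$group[1], time = seq(min(g$time), max(g$time)))))

# Left-join the observed data onto the full grid; unmatched rows get NA
res <- merge(full, df, by = c("group", "time"), all.x = TRUE)
res <- res[order(res$group, res$time), ]
rownames(res) <- NULL
res
```

merge(..., all.x = TRUE) keeps every row of the expanded grid, so the times that were absent from df get NA in data, and the rows line up with the desired result from the question.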