Question
I am looking to calculate a 3 month rolling sum of values in one column of a data frame (Input) based upon the dates in another column as per this reprex:
library(lubridate)  # provides ymd()
CusID <- c(1,1,1,1,1,2,2,2)
Date <- c(ymd("2019-01-01"), ymd("2019-02-01"), ymd("2019-03-01"), ymd("2019-04-01"), ymd("2019-05-01"),
          ymd("2019-01-06"), ymd("2019-04-07"), ymd("2019-06-15"))
Amount <- c(50,50,100,50,100,200,180,150)
Roll_3Mth <- c(50,100,200,200,250,200,180,330)
Input <- data.frame(CusID, Date, Amount)
Output <- data.frame(CusID, Date, Amount, Roll_3Mth)
In this example, I wish to calculate the rolling sum by Group (CusID), over the preceding 3 months (inclusive of the Date value for the row being evaluated). In the Output data frame I give the expected values.
How best to achieve this in R / Tidyverse without expanding the data to one record per date (which would create a very large data frame for the periods being evaluated)? That is, I want to use the Date column to define the period rather than counting back a fixed number of rows. In my example the gaps between dates within each group are not consistent.
Would a package such as RcppRoll or zoo be able to handle this?
Recommended answer
1) The zoo package handles this using rollapplyr with a vector width. Each element of the width is the number of rows to sum for that row, and it can be readily computed with findInterval. (Given a vector of dates as its first argument, findInterval returns, for each such date, the number of dates in the second argument, which must be sorted, that are less than or equal to it.) Subtracting that count from the row number within the group leaves the number of rows that fall inside the 3 month window.
library(dplyr)
library(lubridate)
library(zoo)
Input %>%
  group_by(CusID) %>%
  mutate(Roll_3Mth =
    rollapplyr(Amount, width = 1:n() - findInterval(Date %m-% months(3), Date), sum)) %>%
  ungroup()
giving:
# A tibble: 8 x 4
  CusID Date       Amount Roll_3Mth
  <dbl> <date>      <dbl>     <dbl>
1     1 2019-01-01     50        50
2     1 2019-02-01     50       100
3     1 2019-03-01    100       200
4     1 2019-04-01     50       200
5     1 2019-05-01    100       250
6     2 2019-01-06    200       200
7     2 2019-04-07    180       180
8     2 2019-06-15    150       330
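To see where the widths come from, here is a minimal sketch that works through customer 1 by hand. It is only an illustration, not part of the answer's code: the names cutoff, n_outside, width and roll are made up for this example, and the explicit sapply loop stands in for what rollapplyr is assumed to do with a vector width.

```r
library(lubridate)

# Dates and amounts for customer 1 from the question (dates must be sorted)
dates  <- ymd(c("2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01", "2019-05-01"))
amount <- c(50, 50, 100, 50, 100)

# Each row's date shifted back three months
cutoff <- dates %m-% months(3)

# Number of dates at or before each cutoff; those rows fall outside the window
n_outside <- findInterval(cutoff, dates)

# Window width = row position minus the rows that fall outside the window
width <- seq_along(dates) - n_outside

# The same sums written as an explicit loop (illustrative stand-in for rollapplyr)
roll <- sapply(seq_along(dates), function(i) sum(amount[(i - width[i] + 1):i]))

print(data.frame(dates, amount, width, roll))
```

Because a date exactly three months back counts as "at or before" the cutoff, the row for 2019-01-01 is excluded from the window ending 2019-04-01, which is what the expected output in the question requires.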
2) Another approach is to convert Input to a wide form zoo object in which case we don't need the grouping.
z <- read.zoo(Input, split = "CusID", index = "Date")
tt <- time(z)
w <- 1:nrow(z) - findInterval(tt %m-% months(3), tt)
# rollapplyr (not rollsumr) accepts a function and a vector of widths
rollapplyr(z, w, sum, na.rm = TRUE)
giving:
             1   2
2019-01-01  50   0
2019-01-06  50 200
2019-02-01 100 200
2019-03-01 200 200
2019-04-01 200 200
2019-04-07 200 180
2019-05-01 250 180
2019-06-15 150 330