Aggregate/merge over date ranges using data.table

Updated: 2024-10-24 10:15:23
Problem description

Suppose I have two data.tables:

    library(data.table)
    library(lubridate)  # provides ymd()

    summary <- data.table(
      period    = c("A", "B", "C", "D"),
      from_date = ymd(c("2017-01-01", "2017-01-03", "2017-02-08", "2017-03-07")),
      to_date   = ymd(c("2017-01-31", "2017-04-01", "2017-03-08", "2017-05-01"))
    )

    log <- data.table(
      date   = ymd(c("2017-01-03", "2017-01-20", "2017-02-01", "2017-03-03",
                     "2017-03-15", "2017-03-28", "2017-04-03", "2017-04-23")),
      event1 = c(4, 8, 8, 4, 3, 4, 7, 3),
      event2 = c(1, 8, 7, 3, 8, 4, 6, 3)
    )

which look like this:

    > summary
       period  from_date    to_date
    1:      A 2017-01-01 2017-01-31
    2:      B 2017-01-03 2017-04-01
    3:      C 2017-02-08 2017-03-08
    4:      D 2017-03-07 2017-05-01
    > log
             date event1 event2
    1: 2017-01-03      4      1
    2: 2017-01-20      8      8
    3: 2017-02-01      8      7
    4: 2017-03-03      4      3
    5: 2017-03-15      3      8
    6: 2017-03-28      4      4
    7: 2017-04-03      7      6
    8: 2017-04-23      3      3

I would like to get the sum of event1 and event2 for each time period in the table summary.

I know I can do it like this:

    summary[, c("event1", "event2") := .(
      sum(log[date >= from_date & date <= to_date, event1]),
      sum(log[date >= from_date & date <= to_date, event2])
    ), by = period][]

to get the desired result:

       period  from_date    to_date event1 event2
    1:      A 2017-01-01 2017-01-31     12      9
    2:      B 2017-01-03 2017-04-01     31     31
    3:      C 2017-02-08 2017-03-08      4      3
    4:      D 2017-03-07 2017-05-01     17     21

Now, in my real-life problem, I have about 30 columns to be summed, which I may want to change later; summary has ~35,000 rows and log has ~40,000,000 rows. Is there an efficient way to achieve this?

Note: This is my first post here. I hope my question is clear and specific enough; please do make suggestions if there is anything I should do to improve the question. Thanks!

Recommended answer

Yes, you can perform a non-equi join.

(Note that I've changed log and summary to Log and Summary, as the originals are already functions in R.)

    # In the joined result, `date` carries from_date and `date.1` carries to_date
    Log[Summary,
        on = c("date>=from_date", "date<=to_date"),
        nomatch = 0L,
        allow.cartesian = TRUE
    ][, .(from_date = date[1],
          to_date   = date.1[1],
          event1    = sum(event1),
          event2    = sum(event2)),
      keyby = "period"]
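To see what the join matches before anything is summed, the sketch below rebuilds the example tables and inspects the matched rows. It is only an illustration under a few assumptions: `as.Date()` stands in for `ymd()` to avoid the lubridate dependency, the columns of interest are selected directly in `j`, and the names `matches`, `counts`, and `totals` are introduced here and are not part of the original answer.

```r
library(data.table)

# Example data from the question; as.Date() replaces ymd() (assumption: no lubridate)
Summary <- data.table(
  period    = c("A", "B", "C", "D"),
  from_date = as.Date(c("2017-01-01", "2017-01-03", "2017-02-08", "2017-03-07")),
  to_date   = as.Date(c("2017-01-31", "2017-04-01", "2017-03-08", "2017-05-01"))
)
Log <- data.table(
  date   = as.Date(c("2017-01-03", "2017-01-20", "2017-02-01", "2017-03-03",
                     "2017-03-15", "2017-03-28", "2017-04-03", "2017-04-23")),
  event1 = c(4, 8, 8, 4, 3, 4, 7, 3),
  event2 = c(1, 8, 7, 3, 8, 4, 6, 3)
)

# One output row per (log row, period) pair with from_date <= date <= to_date
matches <- Log[Summary,
               .(period, event1, event2),
               on = c("date>=from_date", "date<=to_date"),
               nomatch = 0L,
               allow.cartesian = TRUE]

# How many log rows fall into each period
counts <- matches[, .N, keyby = period]
print(counts)

# Summing the matched rows per period
totals <- matches[, .(event1 = sum(event1), event2 = sum(event2)), keyby = period]
print(totals)
```

Because the periods overlap, a single log row can match several of them (the 2017-01-03 entry falls in both A and B), so the joined result here has 13 rows although Log has only 8.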

To sum over a pattern of columns, use lapply with .SD:

    joined_result <- Log[Summary,
                         on = c("date>=from_date", "date<=to_date"),
                         nomatch = 0L,
                         allow.cartesian = TRUE]
    cols <- grep("event[a-z]?[0-9]", names(joined_result), value = TRUE)
    joined_result[, lapply(.SD, sum),
                  .SDcols = cols,
                  keyby = .(period, from_date = date, to_date = date.1)]
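With a log of tens of millions of rows, the intermediate joined table can itself become large. One variant that may be worth trying is to aggregate during the join with `by = .EACHI`, which evaluates `j` once per row of Summary instead of first storing the joined table. The following is a minimal sketch on the example data, not a benchmarked solution: it again assumes `as.Date()` in place of `ymd()`, spells out only two event columns, and the names `res` and `out` are illustrative.

```r
library(data.table)

# Example data from the question; as.Date() replaces ymd() (assumption: no lubridate)
Summary <- data.table(
  period    = c("A", "B", "C", "D"),
  from_date = as.Date(c("2017-01-01", "2017-01-03", "2017-02-08", "2017-03-07")),
  to_date   = as.Date(c("2017-01-31", "2017-04-01", "2017-03-08", "2017-05-01"))
)
Log <- data.table(
  date   = as.Date(c("2017-01-03", "2017-01-20", "2017-02-01", "2017-03-03",
                     "2017-03-15", "2017-03-28", "2017-04-03", "2017-04-23")),
  event1 = c(4, 8, 8, 4, 3, 4, 7, 3),
  event2 = c(1, 8, 7, 3, 8, 4, 6, 3)
)

# j is evaluated for each row of Summary, over the Log rows matching that row
res <- Log[Summary,
           .(event1 = sum(event1), event2 = sum(event2)),
           on = c("date>=from_date", "date<=to_date"),
           by = .EACHI,
           allow.cartesian = TRUE]

# res has one row per row of Summary, in the same order, so the sums are
# attached by position to a copy of Summary
out <- copy(Summary)
out[, `:=`(event1 = res$event1, event2 = res$event2)]
print(out)
```

Whether this is faster or lighter on memory than the two-step version depends on the data, so it is best timed on a realistic subset before being adopted.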

Published: 2023-10-24 19:38:51
Link: https://www.elefans.com/category/jswz/34/1524817.html
