In R, how do I split & aggregate timestamped interval data with IDs into regular time slots?

Updated: 2024-10-25 15:36:18
Problem description

I'm working on the next step of my data aggregation, following my previous question. There, Jon Spring pointed me to a solution for counting the number of active events in a given time interval.

As the next step, I'd like to aggregate this data and obtain, for each fixed time interval, the number of IDs that were active at any point during that interval, counting observations that share the same ID only once.
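Put as a formula (my reading of the goal, chosen to agree with the brute-force loop below), with slot width $\Delta$ = 15 minutes, slot start times $t_k = k\Delta$ (times measured from midnight), and records $i$ with ID $\mathrm{id}_i$, start $s_i$ and end $e_i$, the count for slot $k$ would be:

$$
N_k = \left|\left\{\, \mathrm{id}_i \;:\; \Delta \left\lfloor s_i / \Delta \right\rfloor \le t_k \le e_i \,\right\}\right|
$$

Taking a set of IDs rather than of records is what makes several events of the same ID in one slot count once: for example, the three `c` events in the toy data all satisfy the condition for the 14:45 slot but contribute only 1 to it.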

Starting with a toy dataset of seven events with five IDs:

library(tidyverse)
library(lubridate)

df1 <- tibble::tibble(
  id = c("a", "b", "c", "c", "c", "d", "e"),
  start = c(ymd_hms("2018-12-10 13:01:00"),
            ymd_hms("2018-12-10 13:07:00"),
            ymd_hms("2018-12-10 14:45:00"),
            ymd_hms("2018-12-10 14:48:00"),
            ymd_hms("2018-12-10 14:52:00"),
            ymd_hms("2018-12-10 14:45:00"),
            ymd_hms("2018-12-10 14:45:00")),
  end = c(ymd_hms("2018-12-10 13:05:00"),
          ymd_hms("2018-12-10 13:17:00"),
          ymd_hms("2018-12-10 14:46:00"),
          ymd_hms("2018-12-10 14:50:00"),
          ymd_hms("2018-12-10 15:01:00"),
          ymd_hms("2018-12-10 14:51:00"),
          ymd_hms("2018-12-10 15:59:00")))

I could brute-force loop over each row of the data frame and 'expand' each record into the fixed intervals that cover the period from its start to its end, here using 15 minutes:

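# Brute force: expand each record into the 15-minute slots it touches
for (i in 1:nrow(df1)) {
  right <- df1 %>%
    slice(i) %>%
    mutate(start_floor = floor_date(start, "15 mins"))
  left <- tibble::tibble(
    timestamp = seq.POSIXt(right$start_floor, right$end, by = "15 mins"),
    id = right$id)
  if (i == 1) {
    result <- left
  } else {
    # Append, then drop repeated (timestamp, id) pairs;
    # this rebuilds `result` on every iteration
    result <- bind_rows(result, left) %>% distinct()
  }
}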
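For comparison, the same expansion can be vectorized. The sketch below is base R only (my choice, so it runs without extra packages), assumes UTC POSIXct columns and 15-minute slots, and `expand_slots()` is a hypothetical helper name. Instead of growing `result` with `bind_rows()` one record at a time, it computes how many slots each record touches and enumerates all (record, slot) pairs in a single `rep()` + `sequence()` step:

```r
# Vectorized sketch of the brute-force expansion (base R, no per-row loop).
# Assumptions: start/end are POSIXct in UTC; slots are 15 minutes (900 s).
expand_slots <- function(id, start, end, width = 900) {
  # Index of each record's first slot (floored start) and last slot (<= end)
  first <- floor(as.numeric(start) / width)
  last  <- floor(as.numeric(end) / width)
  n <- last - first + 1                   # slots touched by each record
  # rep() + sequence() enumerate every (record, slot) pair at once
  k <- rep(first, n) + sequence(n) - 1
  out <- data.frame(
    timestamp = as.POSIXct(k * width, origin = "1970-01-01", tz = "UTC"),
    id = rep(id, n)
  )
  unique(out)                             # plays the role of distinct()
}

# The toy data from the question, rebuilt with base R only
df1 <- data.frame(
  id = c("a", "b", "c", "c", "c", "d", "e"),
  start = as.POSIXct(c("2018-12-10 13:01:00", "2018-12-10 13:07:00",
                       "2018-12-10 14:45:00", "2018-12-10 14:48:00",
                       "2018-12-10 14:52:00", "2018-12-10 14:45:00",
                       "2018-12-10 14:45:00"), tz = "UTC"),
  end = as.POSIXct(c("2018-12-10 13:05:00", "2018-12-10 13:17:00",
                     "2018-12-10 14:46:00", "2018-12-10 14:50:00",
                     "2018-12-10 15:01:00", "2018-12-10 14:51:00",
                     "2018-12-10 15:59:00"), tz = "UTC")
)

result <- expand_slots(df1$id, df1$start, df1$end)
nrow(result)   # 11 distinct (timestamp, id) pairs, as with the loop
```

Because the work is a handful of vectorized calls, its cost follows the total number of (record, slot) pairs rather than the repeated rebinding of a growing table.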

Then a simple aggregation gives the final result:

result_agg <- result %>%
  group_by(timestamp) %>%
  summarise(users_mac = n())
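If the expanded table might still contain repeated (slot, ID) pairs, counting unique IDs per slot instead of rows makes the separate deduplication step unnecessary. A small base-R sketch of that idea (my substitution for `group_by()` + `summarise()`); the `expanded` table is hand-made for illustration only:

```r
# Count distinct IDs per slot with base R; counting unique IDs (not rows)
# means leftover duplicate (timestamp, id) pairs do not inflate the result.
# `expanded` is a small hand-made example, not real data.
expanded <- data.frame(
  timestamp = as.POSIXct(c("2018-12-10 14:45:00", "2018-12-10 14:45:00",
                           "2018-12-10 14:45:00", "2018-12-10 14:45:00",
                           "2018-12-10 15:00:00", "2018-12-10 15:00:00"),
                         tz = "UTC"),
  id = c("c", "c", "d", "e", "c", "e")   # "c" appears twice at 14:45
)

users_mac <- tapply(expanded$id,
                    format(expanded$timestamp, "%H:%M", tz = "UTC"),
                    function(ids) length(unique(ids)))
users_mac   # 14:45 -> 3, 15:00 -> 2
```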

This gives the desired result, but it probably won't scale well to the dataset I need to use it with (~7 million records at the moment, and growing).
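Whether the expansion fits in memory depends less on the number of records than on the total number of 15-minute slots they span, since each record becomes one row per slot before deduplication. That total can be computed up front without building anything. A base-R sketch, assuming 15-minute slots; `expanded_rows()` is a hypothetical helper:

```r
# Estimate how many (record, slot) rows the expansion would create,
# without materializing them. Assumes POSIXct start/end, 900-second slots.
expanded_rows <- function(start, end, width = 900) {
  first <- floor(as.numeric(start) / width)  # slot of the floored start
  last  <- floor(as.numeric(end) / width)    # last slot starting <= end
  sum(last - first + 1)
}

# Records "b" and "e" from the toy data
starts <- as.POSIXct(c("2018-12-10 13:07:00", "2018-12-10 14:45:00"), tz = "UTC")
ends   <- as.POSIXct(c("2018-12-10 13:17:00", "2018-12-10 15:59:00"), tz = "UTC")
expanded_rows(starts, ends)   # 7: 2 slots + 5 slots
```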

Is there a better solution to this problem?

Recommended answer

A tidy solution could be achieved using the tsibble package.

library(tidyverse)
library(lubridate)
library(tsibble, warn.conflicts = FALSE)

df1 <- tibble(
  id = c("a", "b", "c", "c", "c", "d", "e"),
  start = c(ymd_hms("2018-12-10 13:01:00"),
            ymd_hms("2018-12-10 13:07:00"),
            ymd_hms("2018-12-10 14:45:00"),
            ymd_hms("2018-12-10 14:48:00"),
            ymd_hms("2018-12-10 14:52:00"),
            ymd_hms("2018-12-10 14:45:00"),
            ymd_hms("2018-12-10 14:45:00")),
  end = c(ymd_hms("2018-12-10 13:05:00"),
          ymd_hms("2018-12-10 13:17:00"),
          ymd_hms("2018-12-10 14:46:00"),
          ymd_hms("2018-12-10 14:50:00"),
          ymd_hms("2018-12-10 15:01:00"),
          ymd_hms("2018-12-10 14:51:00"),
          ymd_hms("2018-12-10 15:59:00")))

df1 %>%
  # Map both ends of each event to the start of its 15-minute slot
  mutate(
    start = floor_date(start, "15 mins"),
    end = floor_date(end, "15 mins")
  ) %>%
  # Long format: one row per (id, slot endpoint)
  gather("label", "index", start:end) %>%
  distinct(id, index) %>%
  # The calendar date is used together with id as the tsibble key
  mutate(date = as_date(index)) %>%
  as_tsibble(key = c(id, date), index = index) %>%
  # Make the missing 15-minute slots explicit within each (id, date) key
  fill_gaps() %>%
  # Count the rows (one per id) in each slot
  index_by(index) %>%
  summarise(users_mac = n())
#> # A tsibble: 7 x 2 [15m] <UTC>
#>   index               users_mac
#>   <dttm>                  <int>
#> 1 2018-12-10 13:00:00         2
#> 2 2018-12-10 13:15:00         1
#> 3 2018-12-10 14:45:00         3
#> 4 2018-12-10 15:00:00         2
#> 5 2018-12-10 15:15:00         1
#> 6 2018-12-10 15:30:00         1
#> 7 2018-12-10 15:45:00         1

Created on 2019-05-17 by the reprex package (v0.2.1)
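A different technique can avoid materializing one row per (ID, slot) at all, which may matter at 7 million records and beyond. The sketch below is my substitution, not part of the answer above: it merges each ID's overlapping slot ranges, then counts active IDs with a difference array and a cumulative sum (a sweep line), so memory grows with the number of records plus the number of slots in the covered time range. It assumes UTC POSIXct columns and 15-minute slots, uses base R only, and `count_active_ids()` is a hypothetical name. One behavioural difference worth checking: as I understand it, `fill_gaps()` fills every missing slot between a key's first and last index, so an ID with two separate events on the same date would also be counted in the slots between them, whereas this sketch only counts slots an event actually touches.

```r
# Sketch: count distinct active IDs per 15-minute slot without creating one
# row per (ID, slot). Base R only; assumes POSIXct columns in UTC.
count_active_ids <- function(id, start, end, width = 900) {
  s <- floor(as.numeric(start) / width)   # first slot index of each record
  e <- floor(as.numeric(end) / width)     # last slot index of each record

  # 1. Merge overlapping slot ranges of the same ID, so each ID counts once
  o <- order(id, s)
  id <- id[o]; s <- s[o]; e <- e[o]
  run_max <- ave(e, id, FUN = cummax)     # running max of e within each ID
  prev_max <- c(-Inf, head(run_max, -1))
  new_id <- c(TRUE, id[-1] != id[-length(id)])
  run <- cumsum(new_id | s > prev_max)    # new run at an ID change or a gap
  rs <- tapply(s, run, min)
  re <- tapply(e, run, max)

  # 2. Difference array: +1 at each run's first slot, -1 just after its last
  lo <- min(rs); hi <- max(re)
  nb <- hi - lo + 2
  delta <- tabulate(rs - lo + 1, nbins = nb) - tabulate(re - lo + 2, nbins = nb)
  n <- cumsum(delta)[seq_len(hi - lo + 1)]

  slot <- as.POSIXct((lo:hi) * width, origin = "1970-01-01", tz = "UTC")
  keep <- n > 0                           # drop empty slots
  data.frame(index = slot[keep], users_mac = n[keep])
}

# The toy data from the question, rebuilt with base R only
df1 <- data.frame(
  id = c("a", "b", "c", "c", "c", "d", "e"),
  start = as.POSIXct(c("2018-12-10 13:01:00", "2018-12-10 13:07:00",
                       "2018-12-10 14:45:00", "2018-12-10 14:48:00",
                       "2018-12-10 14:52:00", "2018-12-10 14:45:00",
                       "2018-12-10 14:45:00"), tz = "UTC"),
  end = as.POSIXct(c("2018-12-10 13:05:00", "2018-12-10 13:17:00",
                     "2018-12-10 14:46:00", "2018-12-10 14:50:00",
                     "2018-12-10 15:01:00", "2018-12-10 14:51:00",
                     "2018-12-10 15:59:00"), tz = "UTC")
)

res <- count_active_ids(df1$id, df1$start, df1$end)
res
```

On the toy data, `res` has the same seven slots and `users_mac` counts (2, 1, 3, 2, 1, 1, 1) as the tsibble output.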

Published: 2023-10-13 09:43:50
Article link: https://www.elefans.com/category/jswz/34/1487600.html