Splitting a data table into hourly totals in R

Updated: 2024-10-12 05:45:27

Problem description

I have the following data, where each row corresponds to a member of a household taking a certain trip. Because the rows describe members of the same household, their times can overlap, as rows 1 and 2 show. The duration of each trip is given in minutes. IDX is simply an index that makes the transformation traceable.

IDX | ID | Trip | StartDateTime    | Duration (in minutes)
1   | 1  | 1    | 2015-01-21 13:00 | 100
2   | 1  | 1    | 2015-01-21 13:00 | 184
3   | 1  | 1    | 2015-01-21 10:00 | 91
4   | 1  | 2    | 2015-01-22 13:00 | 30
5   | 2  | 2    | 2015-01-30 23:00 | 100

Now I would like to split this data into hourly data per ID, trip and day. For the first row that gives:

IDX | ID | Trip | StartDateTime    | Duration (in minutes)
1   | 1  | 1    | 2015-01-21 13:00 | 60
1   | 1  | 1    | 2015-01-21 14:00 | 40

Notice that the total duration of this group is still 100, the same as in the original first row, and that IDX is carried over from that row. The fourth row, however, lasts no more than 60 minutes, so it is not split, resulting in:

IDX | ID | Trip | StartDateTime    | Duration (in minutes)
4   | 1  | 2    | 2015-01-22 13:00 | 30

The hardest case is the fifth row, because it crosses into the next day. That one becomes:

IDX | ID | Trip | StartDateTime    | Duration (in minutes)
5   | 2  | 2    | 2015-01-30 23:00 | 60
5   | 2  | 2    | 2015-01-31 00:00 | 40

Is it possible to expand a table like that?

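Code to construct the table:

    library(data.table)

    # as.POSIXct rather than strptime: strptime returns POSIXlt,
    # which data.table does not store as a column type
    DT <- data.table(
      IDX  = 1:5,
      ID   = c(1, 1, 1, 2, 2),
      Trip = c(1, 1, 1, 1, 2),
      StartDateTime = as.POSIXct(
        c("2015-01-21 13:00", "2015-01-21 13:00", "2015-01-21 10:00",
          "2015-01-22 13:00", "2015-01-30 23:00"),
        format = "%Y-%m-%d %H:%M"),
      Duration = c(100, 184, 91, 30, 100)
    )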

Update: The starting times can be anything, such as 13:12, but I am not interested in the exact starting time, only in the hour it falls in.

So if the starting time does not fall on a whole hour, such as this:

IDX | ID | Trip | StartDateTime    | Duration (in minutes)
6   | 3  | 1    | 2015-01-30 23:14 | 67

Then we get:

IDX | ID | Trip | StartDateTime    | Duration (in minutes)
6   | 3  | 1    | 2015-01-30 23:00 | 46
6   | 3  | 1    | 2015-01-31 00:00 | 21

I am sorry for not clarifying this part earlier, but I think this is a simple post-processing step on top of eddi's solution.

Thanks

Recommended answer

This is very similar to @eddi's answer, but uses base difftime instead of lubridate functions:

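    # modifying the example so that row 1 starts off the hour:
    DT[1, StartDateTime := as.POSIXct("2015-01-21 13:12")]

    DT[, {
      t0 = StartDateTime                  # trip start
      t1 = StartDateTime + Duration*60    # trip end (Duration is in minutes)

      h0 = trunc(t0, units="hours")       # hour containing the start
      h1 = trunc(t1, units="hours")       # hour containing the end

      h  = seq(h0, h1, by="hour")         # every hour the trip touches
      nh = length(h)

      # start from a full 60 minutes per hour, then trim the first and last hour;
      # units="mins" is explicit so difftime never auto-selects another unit
      dur = as.difftime(rep("1", nh), format="%H", units="mins")
      if (h0 < t0)  dur[1] = difftime(h0 + as.difftime("1", format="%H", units="mins"), t0, units="mins")
      dur[nh] = difftime(t1, h1, units="mins")   # 0 if the trip ends exactly on the hour
      if (h0 == h1) dur = difftime(t1, t0, units="mins")

      keep = dur > 0                      # drop an empty final hour
      list(h = h[keep], dur = dur[keep])
    }, by=.(IDX, ID, Trip)]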

which gives

        IDX ID Trip                   h     dur
     1:   1  1    1 2015-01-21 13:00:00 48 mins
     2:   1  1    1 2015-01-21 14:00:00 52 mins
     3:   2  1    1 2015-01-21 13:00:00 60 mins
     4:   2  1    1 2015-01-21 14:00:00 60 mins
     5:   2  1    1 2015-01-21 15:00:00 60 mins
     6:   2  1    1 2015-01-21 16:00:00  4 mins
     7:   3  1    1 2015-01-21 10:00:00 60 mins
     8:   3  1    1 2015-01-21 11:00:00 31 mins
     9:   4  2    1 2015-01-22 13:00:00 30 mins
    10:   5  2    2 2015-01-30 23:00:00 60 mins
    11:   5  2    2 2015-01-31 00:00:00 40 mins
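The same split can also be read as an interval-overlap problem: for every hour a trip touches, take the overlap between the trip's interval and that hour. The following is a minimal sketch of that idea, not part of the original answer. It assumes all timestamps are in UTC (so that hours align with multiples of 3600 epoch seconds), and the names `hourly`, `first`, `last` and `mins` are made up for illustration. The data are the question's five rows plus the 23:14 row from its update.

```r
library(data.table)

# Question data plus the off-the-hour row (IDX 6) from the update.
# tz = "UTC" is an assumption to keep daylight-saving shifts out of the sketch.
DT <- data.table(
  IDX  = 1:6,
  ID   = c(1, 1, 1, 2, 2, 3),
  Trip = c(1, 1, 1, 1, 2, 1),
  StartDateTime = as.POSIXct(
    c("2015-01-21 13:00", "2015-01-21 13:00", "2015-01-21 10:00",
      "2015-01-22 13:00", "2015-01-30 23:00", "2015-01-30 23:14"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Duration = c(100, 184, 91, 30, 100, 67)
)

hourly <- DT[, {
  t0 <- as.numeric(StartDateTime)   # trip start, seconds since the epoch
  t1 <- t0 + Duration * 60          # trip end
  # first and last hour touched; a trip ending exactly on the hour
  # does not open a new, empty hour
  first <- floor(t0 / 3600) * 3600
  last  <- max(first, ceiling(t1 / 3600) * 3600 - 3600)
  h     <- seq(first, last, by = 3600)
  # minutes of overlap between [t0, t1] and each hour [h, h + 3600]
  mins  <- (pmin(t1, h + 3600) - pmax(t0, h)) / 60
  list(h = as.POSIXct(h, origin = "1970-01-01", tz = "UTC"), dur = mins)
}, by = .(IDX, ID, Trip)]

# The pieces of each trip should add up to its original Duration
totals <- hourly[, .(total = sum(dur)), by = IDX]
stopifnot(isTRUE(all.equal(totals$total, DT$Duration)))
```

Here `dur` is a plain numeric number of minutes rather than a difftime. Because the overlap is clipped at both ends, trips that cross midnight (IDX 5 and 6) need no special handling.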


Published: 2023-10-18 17:30:31

Link: https://www.elefans.com/category/jswz/34/1504880.html