根据第一列的日期和时间添加第二列的值

编程入门 行业动态 更新时间:2024-10-11 19:16:24
本文介绍了根据第一列的日期和时间添加第二列的值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

我有一个带有2个变量的数据框。第一列 X表示日期和时间,格式为dd / mm / yyyy hh:mm,第二列 Y中的值是电表读数,每5分钟后获取一次。现在,我想添加每个半小时的值。例如

I have a data frame with 2 variables. the first column "X" represents date and time with format dd/mm/yyyy hh:mm, the values in the second column "Y" are the electricity meter reading which are taken each after 5 minutes. Now I want to add the values of each half an hour. For instance

X Y 13/12/2014 12:00 1 13/12/2014 12:05 2 13/12/2014 12:10 1 13/12/2014 12:15 2 13/12/2014 12:20 2 13/12/2014 12:25 1

最后,我想将结果表示为:

At the end i want to present a result as:

13/12/2014 12:00 9 13/12/2014 12:30 12

等等...

推荐答案

这是另一种方法,实际上需要计算 X (根据 OP评论)。

Here's an alternative approach which actually takes X in count (as per OP comment).

首先,我们将确保 X 是正确的 POSIXct 格式,以便我们可以正确地操作它(为了方便起见,我在这里使用 data.table 包)

First, we will make sure X is of proper POSIXct format so we could manipulate it correctly (I'm using the data.table package here for convenience)

library(data.table) setDT(df)[, X := as.POSIXct(X, format = "%d/%m/%Y %R")]

然后,我们将汇总 00 或 30 ,同时对 Y 求和并提取每个组的 X 的第一个值。我制作了一个更复杂的数据集,以说明更复杂的情况(见下文)

Then, we will aggregate per cumulative minutes instances of 00 or 30 within X while summing Y and extracting the first value of X per each group. I've made a more complicated data set in order illustrate more complicated scenarios (see below)

df[order(X), .(X = X[1L], Y = sum(Y)), by = cumsum(format(X, "%M") %in% c("00", "30"))] # cumsum X Y # 1: 0 2014-12-13 12:10:00 6 # 2: 1 2014-12-13 12:30:00 6 # 3: 2 2014-12-13 13:00:00 3

数据

df <- read.table(text = "X Y '13/12/2014 12:10' 1 '13/12/2014 12:15' 2 '13/12/2014 12:20' 2 '13/12/2014 12:25' 1 '13/12/2014 12:30' 1 '13/12/2014 12:35' 1 '13/12/2014 12:40' 1 '13/12/2014 12:45' 1 '13/12/2014 12:50' 1 '13/12/2014 12:55' 1 '13/12/2014 13:00' 1 '13/12/2014 13:05' 1 '13/12/2014 13:10' 1", header = TRUE)

一些解释

  • by expr ession:
    • format(X,%M)获取分钟数$ c> X (请参阅?strptime )
    • 下一步是检查它们是否与 00 或 30 (使用%in%)
    • cumsum 将这些匹配的值分成单独的组,我们通过将该表达式放入 by 语句(请参见?data.table )
    • The by expression:
      • format(X, "%M") gets the minutes out of X (see ?strptime)
      • Next step is check if they match 00 or 30 (using %in%)
      • cumsum separates these matched values into separate groups which we aggregate by by putting this expression into the by statement (see ?data.table)
      • (X = X [1L],Y = sum(Y) )只是获得每个组的 X 的第一个值和 Y 的总和每个组。
      • (X = X[1L], Y = sum(Y)) is simply getting the first value of X per each group and the sum of Y per each group.
      • 我添加了 order(X),以确保数据集按日期正确排序(我将 X 转换为正确的 POSIXct 格式的主要原因)
      • I've added order(X) in order to make sure the data set is properly ordered by date (one of the main reasons I've converted X to proper POSIXct format)

      Fo若要更好地了解 data.table 的工作原理,请参见一些教程此处

      For a better understanding on how data.table works, see some tutorials here

更多推荐

根据第一列的日期和时间添加第二列的值

本文发布于:2023-10-31 09:18:16,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1545705.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:日期   时间

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!