Merging records over time intervals

Updated: 2024-10-24 21:34:40
Problem description

Let me begin by saying this question pertains to R (the statistical programming language), but I'm open to straightforward suggestions for other environments.

The goal is to merge outcomes from data frame (df) A onto sub-elements in df B. This is a one-to-many relationship, but here's the twist: once the records are matched by key, they also have to fall within a specific time window given by a start time and a duration.
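Written as a predicate over a row a of A and a row b of B (taking both ends of the window as inclusive, which the question does not state explicitly), the matching rule is:

```latex
\text{keep } (a, b) \iff a.\mathrm{ID} = b.\mathrm{ID} \;\wedge\; a.\mathrm{StartTime} \le b.\mathrm{Time} \le a.\mathrm{StartTime} + a.\mathrm{Duration}
```

A row of B with no partner a satisfying this condition is dropped from the result.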

For example, a few records in df A:

OBS  ID  StartTime  Duration  Outcome
1    01  10:12:06   00:00:10  Normal
2    02  10:12:30   00:00:30  Weird
3    01  10:15:12   00:01:15  Normal
4    02  10:45:00   00:00:02  Normal

From df B:

OBS  ID  Time
1    01  10:12:10
2    01  10:12:17
3    02  10:12:45
4    01  10:13:00

The desired result of the merge would be:

OBS  ID  Time      Outcome
1    01  10:12:10  Normal
3    02  10:12:45  Weird

Desired result: data frame B with outcomes merged in from A. Notice that observations 2 and 4 were dropped: although their IDs matched records in A, their times did not fall within any of the given time intervals.
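To make the requirement concrete, here is a minimal base-R sketch that uses no extra packages and is not taken from the question or the answer. It performs an ordinary join on ID with merge() and then filters the pairs on the time window. The helper hms_to_sec is hypothetical, and the data are the example tables above.

```r
# Base-R sketch of an "equi-join, then interval filter" merge.
# Data copied from the example tables in the question.
A <- data.frame(
  OBS = 1:4,
  ID = c("01", "02", "01", "02"),
  StartTime = c("10:12:06", "10:12:30", "10:15:12", "10:45:00"),
  Duration = c("00:00:10", "00:00:30", "00:01:15", "00:00:02"),
  Outcome = c("Normal", "Weird", "Normal", "Normal"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  OBS = 1:4,
  ID = c("01", "01", "02", "01"),
  Time = c("10:12:10", "10:12:17", "10:12:45", "10:13:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "HH:MM:SS" strings -> seconds since midnight.
hms_to_sec <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE),
         function(p) sum(as.numeric(p) * c(3600, 60, 1)),
         numeric(1))
}

# Step 1: ordinary join on ID; each B row pairs with every A row of the same ID.
m <- merge(B, A[, c("ID", "StartTime", "Duration", "Outcome")], by = "ID")

# Step 2: keep pairs whose Time lies in [StartTime, StartTime + Duration].
start <- hms_to_sec(m$StartTime)
tm <- hms_to_sec(m$Time)
m <- m[tm >= start & tm <= start + hms_to_sec(m$Duration), ]

# Tidy up: B's columns plus Outcome, ordered by B's OBS.
out <- m[order(m$OBS), c("OBS", "ID", "Time", "Outcome")]
rownames(out) <- NULL
print(out)
```

merge() first materialises every same-ID pair, which is fine at this size. For large tables, an interval-aware join such as data.table's foverlaps() is the usual alternative because it avoids building all pairs up front.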

Question

Is it possible to perform this sort of operation in R and how would you get started? If not, can you suggest an alternative tool?

Recommended answer

Setting up the data

First set up the input data frames. We create two versions of the data frames: A and B just use character columns for the times and At and Bt use the chron package "times" class for the times (which has the advantage over "character" class that one can add and subtract them):

LinesA <- "OBS ID StartTime Duration Outcome
1 01 10:12:06 00:00:10 Normal
2 02 10:12:30 00:00:30 Weird
3 01 10:15:12 00:01:15 Normal
4 02 10:45:00 00:00:02 Normal"

LinesB <- "OBS ID Time
1 01 10:12:10
2 01 10:12:17
3 02 10:12:45
4 01 10:13:00"

A <- At <- read.table(textConnection(LinesA), header = TRUE,
  colClasses = c("numeric", rep("character", 4)))
B <- Bt <- read.table(textConnection(LinesB), header = TRUE,
  colClasses = c("numeric", rep("character", 2)))

# in At and Bt convert times columns to "times" class
library(chron)
At$StartTime <- times(At$StartTime)
At$Duration <- times(At$Duration)
Bt$Time <- times(Bt$Time)

sqldf with the "times" class

Now we can perform the calculation using the sqldf package. We use method="raw" (which does not assign classes to the output), so we must assign the "times" class to the output "Time" column ourselves:

library(sqldf)
out <- sqldf("select Bt.OBS, ID, Time, Outcome from At join Bt using(ID)
  where Time between StartTime and StartTime + Duration",
  method = "raw")
out$Time <- times(as.numeric(out$Time))

The result is:

> out
  OBS ID     Time Outcome
1   1 01 10:12:10  Normal
2   3 02 10:12:45   Weird
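Why does the SQL comparison work, and why is times(as.numeric(out$Time)) needed afterwards? chron stores "times" values as plain numbers giving the fraction of a 24-hour day, so the between test compares ordinary numbers, and method="raw" returns only that bare number. The sketch below reproduces the encoding in base R with two hypothetical helpers, hms_to_frac and frac_to_hms, that are not part of chron or sqldf.

```r
# chron "times" values are fractions of a 24-hour day.
# These hypothetical helpers reproduce that encoding in base R.
hms_to_frac <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE),
         function(p) sum(as.numeric(p) * c(3600, 60, 1)) / 86400,
         numeric(1))
}
frac_to_hms <- function(f) {
  s <- as.integer(round(f * 86400))
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# Values from A's observation 2 and B's observation 3.
start <- hms_to_frac("10:12:30")
dur   <- hms_to_frac("00:00:30")
t_obs <- hms_to_frac("10:12:45")

# "Time between StartTime and StartTime + Duration" is plain numeric
# comparison once every time is a day fraction.
in_window <- t_obs >= start && t_obs <= start + dur
print(in_window)

# A raw result holds only the bare fraction; converting it back
# recovers the clock time.
print(t_obs)
print(frac_to_hms(t_obs))
```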

With the development version of sqldf this can be done without using method="raw" and the "Time" column will automatically be set to "times" class by the sqldf class assignment heuristic:

library(sqldf)
source("sqldf.googlecode/svn/trunk/R/sqldf.R") # grab devel ver
sqldf("select Bt.OBS, ID, Time, Outcome from At join Bt using(ID)
  where Time between StartTime and StartTime + Duration")

sqldf with character columns

It's actually possible to avoid the "times" class altogether by performing all time calculations in SQLite on the character strings, using SQLite's strftime function. Unfortunately, the SQL statement is a bit more involved:

sqldf("select B.OBS, ID, Time, Outcome from A join B using(ID)
  where strftime('%s', Time) - strftime('%s', StartTime)
    between 0 and strftime('%s', Duration) - strftime('%s', '00:00:00')")
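The arithmetic in that query relies on two SQLite behaviours: a time-only string such as '10:12:45' is treated as falling on 2000-01-01, and strftime('%s', ...) returns seconds since the Unix epoch. Subtracting strftime('%s', '00:00:00') therefore cancels the date and leaves the duration in seconds. The base-R sketch below mirrors those steps with a hypothetical helper, sqlite_secs, to show the numbers the query compares; it does not call SQLite itself.

```r
# Hypothetical base-R mirror of SQLite's strftime('%s', x) for "HH:MM:SS"
# strings: SQLite treats a time-only value as falling on 2000-01-01, and
# '%s' gives seconds since the Unix epoch (UTC).
sqlite_secs <- function(x) {
  as.numeric(as.POSIXct(paste("2000-01-01", x),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
}

A_dur <- c("00:00:10", "00:00:30", "00:01:15", "00:00:02")

# strftime('%s', Duration) - strftime('%s', '00:00:00'): duration in seconds
dur_secs <- sqlite_secs(A_dur) - sqlite_secs("00:00:00")
print(dur_secs)  # durations in seconds: 10, 30, 75, 2

# strftime('%s', Time) - strftime('%s', StartTime): offset of B's obs 3
# (10:12:45) from the start of A's obs 2 (10:12:30)
offset <- sqlite_secs("10:12:45") - sqlite_secs("10:12:30")
print(offset >= 0 && offset <= dur_secs[2])  # TRUE: 15 s into a 30 s window
```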

Published: 2023-10-13 09:47:34
Article link: https://www.elefans.com/category/jswz/34/1487609.html