日期时间戳在R中未正确匹配/合并(date timestamps not matching / merging properly in R)

编程入门 行业动态 更新时间:2024-10-17 21:20:38
日期时间戳在R中未正确匹配/合并(date timestamps not matching / merging properly in R)

我有两个数据框:

LOCA:

Date Time Temperature Voltage Longitude Latitude Deployment DateTime 2014-08-08 14:18:25 20.9 1.872 -126.6886 34.95633 A 2014-08-08 14:20:00 2014-08-08 14:38:25 19.8 1.862 -126.6899 34.95675 A 2014-08-08 14:40:00 2014-08-08 14:58:26 19.3 1.856 -126.6911 34.95732 A 2014-08-08 15:00:00 2014-08-08 15:18:26 18.8 1.872 -126.6921 34.95803 A 2014-08-08 15:20:00 2014-08-08 15:38:25 18.6 1.872 -126.6929 34.95872 A 2014-08-08 15:40:00 2014-08-08 15:58:26 18.4 1.859 -126.6937 34.95921 A 2014-08-08 16:00:00 2014-08-08 16:18:25 18.4 1.875 -126.6944 34.95983 A 2014-08-08 16:20:00 2014-08-08 16:38:25 18.4 1.875 -126.6949 34.96021 A 2014-08-08 16:40:00 2014-08-08 16:58:26 18.4 1.872 -126.6956 34.96033 A 2014-08-08 17:00:00 2014-08-08 17:18:26 18.4 1.872 -126.6962 34.96025 A 2014-08-08 17:20:00

和datA:

FileName Deployment Start_UTC Stop_UTC Duration sum PrAb dolPrAb DASBR1_20140808$140000.wav A 2014-08-08 14:00:00 2014-08-08 14:02:00 119.6000 0 0 0 DASBR1_20140808$142000.wav A 2014-08-08 14:20:00 2014-08-08 14:22:00 119.6000 0 0 0 DASBR1_20140808$144000.wav A 2014-08-08 14:40:00 2014-08-08 14:42:00 119.8000 0 0 0 DASBR1_20140808$150000.wav A 2014-08-08 15:00:00 2014-08-08 15:02:00 119.6000 0 0 0 DASBR1_20140808$152000.wav A 2014-08-08 15:20:00 2014-08-08 15:22:00 119.6000 0 0 0 DASBR1_20140808$154000.wav A 2014-08-08 15:40:00 2014-08-08 15:42:00 119.6000 0 0 0 DASBR1_20140808$160000.wav A 2014-08-08 16:00:00 2014-08-08 16:02:00 119.6000 1 1 1 DASBR1_20140808$162000.wav A 2014-08-08 16:20:00 2014-08-08 16:22:00 119.8000 0 0 0 DASBR1_20140808$164000.wav A 2014-08-08 16:40:00 2014-08-08 16:42:00 119.8000 0 0 0 DASBR1_20140808$170000.wav A 2014-08-08 17:00:00 2014-08-08 17:02:00 119.6000 0 0 0

每个实际上都有几千行。 我想根据locA中的“DateTime”标记和datA中的“Start_UTC”合并它们。 这两个列的格式都与UTC相同。

locA$DateTime=as.POSIXct(paste(locA$Date, time), format="%Y-%m-%d %H:%M:%S", tz="UTC") datA$Start_UTC = as.POSIXct(files$Start_UTC, format="%Y-%m-%d %H:%M:%S", tz="UTC")

在locA中,“time”变量来自一个序列,其中我将locA $ Time中的字符串舍入到最接近的20min。 “time”是具有调整时间的字符串。

str(time) chr [1:8845] "14:00:00" "14:20:00" "14:40:00" "15:00:00" "15:20:00" "15:40:00" "16:00:00" "16:20:00" "16:40:00" "17:00:00" ...

当我使用locA $ DateTime和datA $ Start_UTC合并两个数据帧时,它永远不会正常工作。 我每次都会得到这样的东西。

resA = merge(datA,locA, by.x=c("Start_UTC"), by.y=c("DateTime"))

萨:

Start_UTC FileName Deployment.x Stop_UTC Duration sum PrAb dolPrAb Date Time Temperature Voltage Longitude Latitude Deployment.y 2014-08-08 14:00:00 DASBR1_20140808$140000.wav A 2014-08-08 14:02:00 119.6000 0 0 0 2014-08-08 20:58:26 19.1 1.872 -126.7018 34.94994 A 2014-08-08 14:20:00 DASBR1_20140808$142000.wav A 2014-08-08 14:22:00 119.6000 0 0 0 2014-08-08 21:18:26 19.1 1.872 -126.7027 34.94801 A 2014-08-08 14:40:00 DASBR1_20140808$144000.wav A 2014-08-08 14:42:00 119.8000 0 0 0 2014-08-08 21:38:26 19.1 1.872 -126.7038 34.94608 A 2014-08-08 15:00:00 DASBR1_20140808$150000.wav A 2014-08-08 15:02:00 119.6000 0 0 0 2014-08-08 21:58:27 18.9 1.865 -126.7050 34.94394 A 2014-08-08 15:20:00 DASBR1_20140808$152000.wav A 2014-08-08 15:22:00 119.6000 0 0 0 2014-08-08 22:18:27 18.9 1.856 -126.7062 34.94188 A 2014-08-08 15:40:00 DASBR1_20140808$154000.wav A 2014-08-08 15:42:00 119.6000 0 0 0 2014-08-08 22:38:26 18.8 1.862 -126.7077 34.93994 A 2014-08-08 16:00:00 DASBR1_20140808$160000.wav A 2014-08-08 16:02:00 119.6000 1 1 1 2014-08-08 22:58:26 18.8 1.859 -126.7090 34.93796 A 2014-08-08 16:20:00 DASBR1_20140808$162000.wav A 2014-08-08 16:22:00 119.8000 0 0 0 2014-08-08 23:18:26 18.8 1.865 -126.7105 34.93622 A 2014-08-08 16:40:00 DASBR1_20140808$164000.wav A 2014-08-08 16:42:00 119.8000 0 0 0 2014-08-08 23:38:27 18.6 1.856 -126.7120 34.93480 A 2014-08-08 17:00:00 DASBR1_20140808$170000.wav A 2014-08-08 17:02:00 119.6000 0 0 0 2014-08-09 23:58:28 18.8 1.849 -126.8051 34.88381 A

我尝试过这么多不同的方法。 我尝试将locA $ DateTime保持原样,其中时间等于datA $ Time中的时间戳,然后尝试使用此函数将其与正确的datA $ Start_UTC相匹配。 将最小locA $ timestamp与datA $ timestamp匹配

findrow=function(dt,df) {min(which(df>dt))} rowA=sapply(locA$DateTime, findrow, df=datA$Start_UTC) resA=cbind(locA,datA[rowA,])

这给了我相同的结果。 我不知道发生了什么事。 我确定它是如何生成日期时间戳的固有的,但我不知道如何解决这个问题。 如果我将它们格式化为相同,并告诉R它们是UTC,那么它们是否应该以相同的方式处理?

请帮助,所有帮助,评论或幽默“挂在那里”的声明将非常感谢。 无法将我的纬度/长度与我的数据时间戳正确配对确实令人头疼。

干杯。

I have two data frames:

locA:

Date Time Temperature Voltage Longitude Latitude Deployment DateTime 2014-08-08 14:18:25 20.9 1.872 -126.6886 34.95633 A 2014-08-08 14:20:00 2014-08-08 14:38:25 19.8 1.862 -126.6899 34.95675 A 2014-08-08 14:40:00 2014-08-08 14:58:26 19.3 1.856 -126.6911 34.95732 A 2014-08-08 15:00:00 2014-08-08 15:18:26 18.8 1.872 -126.6921 34.95803 A 2014-08-08 15:20:00 2014-08-08 15:38:25 18.6 1.872 -126.6929 34.95872 A 2014-08-08 15:40:00 2014-08-08 15:58:26 18.4 1.859 -126.6937 34.95921 A 2014-08-08 16:00:00 2014-08-08 16:18:25 18.4 1.875 -126.6944 34.95983 A 2014-08-08 16:20:00 2014-08-08 16:38:25 18.4 1.875 -126.6949 34.96021 A 2014-08-08 16:40:00 2014-08-08 16:58:26 18.4 1.872 -126.6956 34.96033 A 2014-08-08 17:00:00 2014-08-08 17:18:26 18.4 1.872 -126.6962 34.96025 A 2014-08-08 17:20:00

and datA:

FileName Deployment Start_UTC Stop_UTC Duration sum PrAb dolPrAb DASBR1_20140808$140000.wav A 2014-08-08 14:00:00 2014-08-08 14:02:00 119.6000 0 0 0 DASBR1_20140808$142000.wav A 2014-08-08 14:20:00 2014-08-08 14:22:00 119.6000 0 0 0 DASBR1_20140808$144000.wav A 2014-08-08 14:40:00 2014-08-08 14:42:00 119.8000 0 0 0 DASBR1_20140808$150000.wav A 2014-08-08 15:00:00 2014-08-08 15:02:00 119.6000 0 0 0 DASBR1_20140808$152000.wav A 2014-08-08 15:20:00 2014-08-08 15:22:00 119.6000 0 0 0 DASBR1_20140808$154000.wav A 2014-08-08 15:40:00 2014-08-08 15:42:00 119.6000 0 0 0 DASBR1_20140808$160000.wav A 2014-08-08 16:00:00 2014-08-08 16:02:00 119.6000 1 1 1 DASBR1_20140808$162000.wav A 2014-08-08 16:20:00 2014-08-08 16:22:00 119.8000 0 0 0 DASBR1_20140808$164000.wav A 2014-08-08 16:40:00 2014-08-08 16:42:00 119.8000 0 0 0 DASBR1_20140808$170000.wav A 2014-08-08 17:00:00 2014-08-08 17:02:00 119.6000 0 0 0

Each are actually several thousand rows long. I want to merge them based on the "DateTime" stamp in locA and the "Start_UTC" in datA. Both of those columns have been formatted identically in UTC.

locA$DateTime=as.POSIXct(paste(locA$Date, time), format="%Y-%m-%d %H:%M:%S", tz="UTC") datA$Start_UTC = as.POSIXct(files$Start_UTC, format="%Y-%m-%d %H:%M:%S", tz="UTC")

In locA, the "time" variable comes from a sequence where I round the character string in locA$Time to the nearest 20min. "time" is a character string with the adjusted time.

str(time) chr [1:8845] "14:00:00" "14:20:00" "14:40:00" "15:00:00" "15:20:00" "15:40:00" "16:00:00" "16:20:00" "16:40:00" "17:00:00" ...

When I merge the two data frames using locA$DateTime and datA$Start_UTC, it never works properly. I get something that looks like this, every time.

resA = merge(datA,locA, by.x=c("Start_UTC"), by.y=c("DateTime"))

resA:

Start_UTC FileName Deployment.x Stop_UTC Duration sum PrAb dolPrAb Date Time Temperature Voltage Longitude Latitude Deployment.y 2014-08-08 14:00:00 DASBR1_20140808$140000.wav A 2014-08-08 14:02:00 119.6000 0 0 0 2014-08-08 20:58:26 19.1 1.872 -126.7018 34.94994 A 2014-08-08 14:20:00 DASBR1_20140808$142000.wav A 2014-08-08 14:22:00 119.6000 0 0 0 2014-08-08 21:18:26 19.1 1.872 -126.7027 34.94801 A 2014-08-08 14:40:00 DASBR1_20140808$144000.wav A 2014-08-08 14:42:00 119.8000 0 0 0 2014-08-08 21:38:26 19.1 1.872 -126.7038 34.94608 A 2014-08-08 15:00:00 DASBR1_20140808$150000.wav A 2014-08-08 15:02:00 119.6000 0 0 0 2014-08-08 21:58:27 18.9 1.865 -126.7050 34.94394 A 2014-08-08 15:20:00 DASBR1_20140808$152000.wav A 2014-08-08 15:22:00 119.6000 0 0 0 2014-08-08 22:18:27 18.9 1.856 -126.7062 34.94188 A 2014-08-08 15:40:00 DASBR1_20140808$154000.wav A 2014-08-08 15:42:00 119.6000 0 0 0 2014-08-08 22:38:26 18.8 1.862 -126.7077 34.93994 A 2014-08-08 16:00:00 DASBR1_20140808$160000.wav A 2014-08-08 16:02:00 119.6000 1 1 1 2014-08-08 22:58:26 18.8 1.859 -126.7090 34.93796 A 2014-08-08 16:20:00 DASBR1_20140808$162000.wav A 2014-08-08 16:22:00 119.8000 0 0 0 2014-08-08 23:18:26 18.8 1.865 -126.7105 34.93622 A 2014-08-08 16:40:00 DASBR1_20140808$164000.wav A 2014-08-08 16:42:00 119.8000 0 0 0 2014-08-08 23:38:27 18.6 1.856 -126.7120 34.93480 A 2014-08-08 17:00:00 DASBR1_20140808$170000.wav A 2014-08-08 17:02:00 119.6000 0 0 0 2014-08-09 23:58:28 18.8 1.849 -126.8051 34.88381 A

I have tried this so many different ways. I tried keeping the locA$DateTime as is, where the time equals the time stamp in datA$Time, and then tried using this function to match it to the correct datA$Start_UTC. Match the minimum locA$timestamp to the datA$timestamp

findrow=function(dt,df) {min(which(df>dt))} rowA=sapply(locA$DateTime, findrow, df=datA$Start_UTC) resA=cbind(locA,datA[rowA,])

This gave me the same results. I don't know what's going on. I'm sure it's intrinsic to how the datetime stamps are generated, but I don't know how to fix this. If I format them the same, and tell R that they are UTC, shouldn't they be treated the same way?

Help, please, all help, comments, or humorous "hang in there" statements would greatly appreciated. Not being able to properly pair my lat/longs with my data timestamps is really causing a headache.

Cheers.

最满意答案

我想知道在chr对象上进行合并而不是POSIXct对象会更好吗?

locA$DateTime_str = format(locA$DateTime) datA$Start_UTC_str = format(datA$Start_UTC) resA = merge(datA,locA, by.x=c("Start_UTC_str"), by.y=c("DateTime_str"))

I wonder if doing a merge on chr objects instead of POSIXct objects would work better?

locA$DateTime_str = format(locA$DateTime) datA$Start_UTC_str = format(datA$Start_UTC) resA = merge(datA,locA, by.x=c("Start_UTC_str"), by.y=c("DateTime_str"))

更多推荐

本文发布于:2023-04-28 03:46:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1329726.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:中未   正确   日期   时间   date

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!