I have two data frames:
locA:
    Date       Time     Temperature Voltage Longitude Latitude Deployment DateTime
    2014-08-08 14:18:25 20.9        1.872   -126.6886 34.95633 A          2014-08-08 14:20:00
    2014-08-08 14:38:25 19.8        1.862   -126.6899 34.95675 A          2014-08-08 14:40:00
    2014-08-08 14:58:26 19.3        1.856   -126.6911 34.95732 A          2014-08-08 15:00:00
    2014-08-08 15:18:26 18.8        1.872   -126.6921 34.95803 A          2014-08-08 15:20:00
    2014-08-08 15:38:25 18.6        1.872   -126.6929 34.95872 A          2014-08-08 15:40:00
    2014-08-08 15:58:26 18.4        1.859   -126.6937 34.95921 A          2014-08-08 16:00:00
    2014-08-08 16:18:25 18.4        1.875   -126.6944 34.95983 A          2014-08-08 16:20:00
    2014-08-08 16:38:25 18.4        1.875   -126.6949 34.96021 A          2014-08-08 16:40:00
    2014-08-08 16:58:26 18.4        1.872   -126.6956 34.96033 A          2014-08-08 17:00:00
    2014-08-08 17:18:26 18.4        1.872   -126.6962 34.96025 A          2014-08-08 17:20:00

and datA:
    FileName                   Deployment Start_UTC           Stop_UTC            Duration sum PrAb dolPrAb
    DASBR1_20140808$140000.wav A          2014-08-08 14:00:00 2014-08-08 14:02:00 119.6000 0   0    0
    DASBR1_20140808$142000.wav A          2014-08-08 14:20:00 2014-08-08 14:22:00 119.6000 0   0    0
    DASBR1_20140808$144000.wav A          2014-08-08 14:40:00 2014-08-08 14:42:00 119.8000 0   0    0
    DASBR1_20140808$150000.wav A          2014-08-08 15:00:00 2014-08-08 15:02:00 119.6000 0   0    0
    DASBR1_20140808$152000.wav A          2014-08-08 15:20:00 2014-08-08 15:22:00 119.6000 0   0    0
    DASBR1_20140808$154000.wav A          2014-08-08 15:40:00 2014-08-08 15:42:00 119.6000 0   0    0
    DASBR1_20140808$160000.wav A          2014-08-08 16:00:00 2014-08-08 16:02:00 119.6000 1   1    1
    DASBR1_20140808$162000.wav A          2014-08-08 16:20:00 2014-08-08 16:22:00 119.8000 0   0    0
    DASBR1_20140808$164000.wav A          2014-08-08 16:40:00 2014-08-08 16:42:00 119.8000 0   0    0
    DASBR1_20140808$170000.wav A          2014-08-08 17:00:00 2014-08-08 17:02:00 119.6000 0   0    0

Each is actually several thousand rows long. I want to merge them based on the "DateTime" stamp in locA and the "Start_UTC" in datA. Both of those columns have been formatted identically in UTC.

    locA$DateTime = as.POSIXct(paste(locA$Date, time), format="%Y-%m-%d %H:%M:%S", tz="UTC")
    datA$Start_UTC = as.POSIXct(files$Start_UTC, format="%Y-%m-%d %H:%M:%S", tz="UTC")

In locA, the "time" variable comes from a sequence where I round the character string in locA$Time to the nearest 20 min. "time" is a character string with the adjusted time.

    str(time)
     chr [1:8845] "14:00:00" "14:20:00" "14:40:00" "15:00:00" "15:20:00" "15:40:00" "16:00:00" "16:20:00" "16:40:00" "17:00:00" ...

When I merge the two data frames using locA$DateTime and datA$Start_UTC, it never works properly. I get something that looks like this, every time.

    resA = merge(datA, locA, by.x=c("Start_UTC"), by.y=c("DateTime"))

resA:
    Start_UTC           FileName                   Deployment.x Stop_UTC            Duration sum PrAb dolPrAb Date       Time     Temperature Voltage Longitude Latitude Deployment.y
    2014-08-08 14:00:00 DASBR1_20140808$140000.wav A            2014-08-08 14:02:00 119.6000 0   0    0       2014-08-08 20:58:26 19.1        1.872   -126.7018 34.94994 A
    2014-08-08 14:20:00 DASBR1_20140808$142000.wav A            2014-08-08 14:22:00 119.6000 0   0    0       2014-08-08 21:18:26 19.1        1.872   -126.7027 34.94801 A
    2014-08-08 14:40:00 DASBR1_20140808$144000.wav A            2014-08-08 14:42:00 119.8000 0   0    0       2014-08-08 21:38:26 19.1        1.872   -126.7038 34.94608 A
    2014-08-08 15:00:00 DASBR1_20140808$150000.wav A            2014-08-08 15:02:00 119.6000 0   0    0       2014-08-08 21:58:27 18.9        1.865   -126.7050 34.94394 A
    2014-08-08 15:20:00 DASBR1_20140808$152000.wav A            2014-08-08 15:22:00 119.6000 0   0    0       2014-08-08 22:18:27 18.9        1.856   -126.7062 34.94188 A
    2014-08-08 15:40:00 DASBR1_20140808$154000.wav A            2014-08-08 15:42:00 119.6000 0   0    0       2014-08-08 22:38:26 18.8        1.862   -126.7077 34.93994 A
    2014-08-08 16:00:00 DASBR1_20140808$160000.wav A            2014-08-08 16:02:00 119.6000 1   1    1       2014-08-08 22:58:26 18.8        1.859   -126.7090 34.93796 A
    2014-08-08 16:20:00 DASBR1_20140808$162000.wav A            2014-08-08 16:22:00 119.8000 0   0    0       2014-08-08 23:18:26 18.8        1.865   -126.7105 34.93622 A
    2014-08-08 16:40:00 DASBR1_20140808$164000.wav A            2014-08-08 16:42:00 119.8000 0   0    0       2014-08-08 23:38:27 18.6        1.856   -126.7120 34.93480 A
    2014-08-08 17:00:00 DASBR1_20140808$170000.wav A            2014-08-08 17:02:00 119.6000 0   0    0       2014-08-09 23:58:28 18.8        1.849   -126.8051 34.88381 A

I have tried this so many different ways. I tried keeping the locA$DateTime as is, where the time equals the time stamp in datA$Time, and then tried using this function to match it to the correct datA$Start_UTC. Match the minimum locA$timestamp to the datA$timestamp:

    findrow = function(dt, df) { min(which(df > dt)) }
    rowA = sapply(locA$DateTime, findrow, df=datA$Start_UTC)
    resA = cbind(locA, datA[rowA,])

This gave me the same results. I don't know what's going on. I'm sure it's intrinsic to how the datetime stamps are generated, but I don't know how to fix this. If I format them the same, and tell R that they are UTC, shouldn't they be treated the same way?
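
One way to test that assumption is to compare the underlying numeric values and the tzone attributes of the two key columns. The resA rows above pair a Start_UTC of 14:00:00 with a locA Time of 20:58:26, about seven hours apart, which happens to be the August offset between UTC and US Pacific time. The sketch below uses made-up values rather than the real locA and datA; the names a and b and the America/Los_Angeles zone are illustrative assumptions, not something the data confirms.

```r
# Same wall-clock string parsed in two different time zones (illustrative values).
fmt <- "%Y-%m-%d %H:%M:%S"
a <- as.POSIXct("2014-08-08 14:00:00", format = fmt, tz = "UTC")
b <- as.POSIXct("2014-08-08 14:00:00", format = fmt, tz = "America/Los_Angeles")

# The time zone each object carries as an attribute.
tz_a <- attr(a, "tzone")
tz_b <- attr(b, "tzone")

# POSIXct stores seconds since 1970-01-01 UTC, so the two values differ
# by the zone offset even though they were built from the same string.
offset_hours <- (as.numeric(b) - as.numeric(a)) / 3600

# Rendering both in UTC makes the difference visible.
a_utc <- format(a, format = fmt, tz = "UTC")
b_utc <- format(b, format = fmt, tz = "UTC")

print(c(tz_a = tz_a, tz_b = tz_b))
print(offset_hours)
print(c(a_utc = a_utc, b_utc = b_utc))
```

Running the same checks on locA$DateTime and datA$Start_UTC (attr(x, "tzone") and as.numeric(x) for the first few rows) would show whether both columns really hold UTC instants or whether one of them was parsed in local time at some earlier step.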
Help, please. All help, comments, or humorous "hang in there" statements would be greatly appreciated. Not being able to properly pair my lat/longs with my data timestamps is really causing a headache.
Cheers.
Best answer
I wonder if doing a merge on chr objects instead of POSIXct objects would work better?

    locA$DateTime_str = format(locA$DateTime)
    datA$Start_UTC_str = format(datA$Start_UTC)
    resA = merge(datA, locA, by.x=c("Start_UTC_str"), by.y=c("DateTime_str"))
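
A self-contained version of the same idea, as a minimal sketch on toy data frames (three rows borrowed from the question, not the full data). It passes an explicit format and tz to format(), which is an addition to the suggestion above: with no format given, format() drops the time part when every value in a vector is midnight, so two columns could end up rendered differently. The names key_fmt and unmatched are hypothetical helpers.

```r
# Toy stand-ins for locA and datA (a few rows copied from the question).
locA <- data.frame(
  Latitude = c(34.95633, 34.95675, 34.95732),
  DateTime = as.POSIXct(c("2014-08-08 14:20:00", "2014-08-08 14:40:00",
                          "2014-08-08 15:00:00"), tz = "UTC")
)
datA <- data.frame(
  FileName  = c("DASBR1_20140808$140000.wav", "DASBR1_20140808$142000.wav",
                "DASBR1_20140808$144000.wav"),
  Start_UTC = as.POSIXct(c("2014-08-08 14:00:00", "2014-08-08 14:20:00",
                           "2014-08-08 14:40:00"), tz = "UTC")
)

# Build character keys with an explicit format and time zone, so both columns
# are rendered the same way whatever tzone attribute they carry.
key_fmt <- "%Y-%m-%d %H:%M:%S"
locA$DateTime_str  <- format(locA$DateTime,  format = key_fmt, tz = "UTC")
datA$Start_UTC_str <- format(datA$Start_UTC, format = key_fmt, tz = "UTC")

# Inner join on the character keys.
resA <- merge(datA, locA, by.x = "Start_UTC_str", by.y = "DateTime_str")

# Keys in datA with no partner in locA, useful for spotting systematic offsets.
unmatched <- setdiff(datA$Start_UTC_str, locA$DateTime_str)

print(resA)
print(unmatched)
```

If the unmatched keys in the real data all sit a fixed number of hours away from keys on the other side, that would point to a time zone problem upstream rather than to merge() itself.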