Question
I am working with .csv data that was exported from Teradata. Several columns were originally timestamps with time zones, so after loading the .csv in R I'd like to convert these columns (which are loaded as strings) to POSIXlt or POSIXct. I am using strptime, but the time zone format in the .csv file does not match what strptime expects. For example, it expects -0400, but the .csv has the format -04:00, where a colon separates the hours and minutes.
I can remove the colon, but this is an extra step and a complication I'd like to avoid if possible. Is there a way to tell strptime to use a different format for the time zone (%z)?
Here is an example:
## Example data:
x <- c("2011-10-12 22:17:13.860746-04:00", "2011-10-12 22:17:13.860746+00:00")
format <- "%Y-%m-%d %H:%M:%OS%z"
## Doesn't work:
strptime(x,format)
## [1] NA NA
## Ignores the timezone:
as.POSIXct(x)
## [1] "2011-10-12 22:17:13 EDT" "2011-10-12 22:17:13 EDT"
## Remove the last colon:
x2 <- gsub("(.*):", "\\1", x)
x2
## [1] "2011-10-12 22:17:13.860746-0400" "2011-10-12 22:17:13.860746+0000"
## This works, but requires extra processing (removing the colon)
strptime(x2,format)
## [1] "2011-10-12 22:17:13" "2011-10-12 18:17:13"
So I'm looking to achieve this last result using something like strptime(x,"%Y-%m-%d %H:%M:%OS%zz"), where %zz is a custom expression for the time zone that recognizes the -04:00 format. Or %zH:%zM might be even better.
If this isn't possible, does anyone have a slick, flexible function for converting strings (in various formats) to dates across multiple columns of a data.frame/data.table?
Recommended answer
It turns out lubridate can handle this format:
library(lubridate)
ymd_hms(x)
## [1] "2011-10-13 02:17:13 UTC" "2011-10-12 22:17:13 UTC"
Or, to display in the local time zone:
with_tz(ymd_hms(x))
## [1] "2011-10-12 22:17:13 EDT" "2011-10-12 18:17:13 EDT"
For more flexibility (still using lubridate):
parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z")
For more speed (among the lubridate options):
lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO")
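Note that lubridate:::.strptime is an internal, unexported function (hence the triple colon), so it may not be present in every lubridate release. As a hedged alternative, the sketch below assumes an installed lubridate version that exports fast_strptime and that its %OO token matches an offset written as +HH:MM or -HH:MM; check both against the documentation of your installed version.

```r
library(lubridate)

x <- c("2011-10-12 22:17:13.860746-04:00", "2011-10-12 22:17:13.860746+00:00")

# Assumption: fast_strptime() is exported by the installed lubridate version.
# %OS reads fractional seconds; %OO reads an offset with a colon, e.g. -04:00.
parsed <- fast_strptime(x, "%Y-%m-%d %H:%M:%OS%OO", tz = "UTC")

# fast_strptime() returns POSIXlt by default; convert to POSIXct,
# which is the more convenient class for data.frame columns.
parsed_ct <- as.POSIXct(parsed)
parsed_ct
```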
Timings:
library(microbenchmark)
## x and format are the objects defined in the question's example
microbenchmark(
  ymd_hms(x),
  parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z"),
  lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO"),
  strptime(gsub("(.*):", "\\1", x), format)
)
## Unit: microseconds
##                                              expr      min       lq       mean    median        uq      max neval
##                                        ymd_hms(x) 1523.819 1578.495 1715.14577 1629.5385 1744.3695 2850.393   100
##        parse_date_time(x, "%Y-%m-%d %H:%M:%OS%z") 1108.676 1150.633 1273.77301 1190.3315 1264.8050 5947.204   100
## lubridate:::.strptime(x, "%Y-%m-%d %H:%M:%OS%OO")   89.838  103.390  112.45338  107.8425  115.2265  216.512   100
##         strptime(gsub("(.*):", "\\1", x), format)   46.716   58.294   71.90934   69.9415   86.5860  105.044   100
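Since the base R approach (remove the colon, then call strptime) is the fastest in the timings above, it can be wrapped in a small helper and applied to several columns at once, which addresses the multi-column part of the question. This is a minimal sketch, not a definitive implementation: the helper name parse_offset_ts, the data.frame df, and its column names are all hypothetical, and the regular expression assumes the offset sits at the very end of each string.

```r
x <- c("2011-10-12 22:17:13.860746-04:00", "2011-10-12 22:17:13.860746+00:00")

# Hypothetical helper: turn a trailing "+HH:MM" / "-HH:MM" offset into the
# "+HHMM" / "-HHMM" form that strptime's %z expects, then parse to POSIXct.
parse_offset_ts <- function(x, format = "%Y-%m-%d %H:%M:%OS%z", tz = "UTC") {
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
  as.POSIXct(strptime(x, format, tz = tz))
}

# Illustrative data.frame with two timestamp columns loaded as strings.
df <- data.frame(id = 1:2, created = x, updated = x, stringsAsFactors = FALSE)

# Convert every listed column in one step.
ts_cols <- c("created", "updated")
df[ts_cols] <- lapply(df[ts_cols], parse_offset_ts)

str(df)
```

Anchoring the pattern to the end of the string means only the offset's colon is removed, whereas gsub("(.*):", "\\1", x) removes whichever colon happens to be last.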