Example of my dataframe:
                date
1   25 February 1987
2     20 August 1974
3     9 October 1984
4     18 August 1992
5  19 September 1995
6          16-Oct-63
7          30-Sep-65
8        22 Jan 2008
9         13-11-1961
10    18 August 1987
11         15-Sep-70
12    5 October 1994
13   5 December 1984
14          03/23/87
15    30 August 1988
16        26-10-1993
17    22 August 1989
18         13-Sep-97
I have a large dataframe with a date variable that has multiple formats for dates. Most of the formats in the variable are shown above; there are a couple of very rare others too. The reason why there are multiple formats is that the data were pulled together from various websites that each used different formats.
I have tried using straightforward conversions e.g.
strftime(mydf$date,"%d/%m/%Y")
but these sorts of conversions will not work if there are multiple formats. I don't want to resort to multiple gsub-type edits. I was wondering if I am missing a simpler solution?
Code for the example data:
structure(list(date = structure(c(12L, 8L, 18L, 6L, 7L, 4L, 14L, 10L, 1L,
    5L, 3L, 17L, 16L, 11L, 15L, 13L, 9L, 2L),
    .Label = c("13-11-1961", "13-Sep-97", "15-Sep-70", "16-Oct-63",
    "18 August 1987", "18 August 1992", "19 September 1995", "20 August 1974",
    "22 August 1989", "22 Jan 2008", "03/23/87", "25 February 1987",
    "26-10-1993", "30-Sep-65", "30 August 1988", "5 December 1984",
    "5 October 1994", "9 October 1984"), class = "factor")),
    .Names = "date", row.names = c(NA, -18L), class = "data.frame")

Recommended answer
You may try parse_date_time in package lubridate which "allows the user to specify several format-orders to handle heterogeneous date-time character representations" using the orders argument. Something like...
library(lubridate)
parse_date_time(x = df$date, orders = c("d m y", "d B Y", "m/d/y"), locale = "eng")
...should be able to handle most of your formats. Please note that b/B formats are locale sensitive.
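Since the question mentions a few rare formats beyond those listed, it may help to see which strings the chosen orders do not cover. The following is a minimal sketch rather than a definitive recipe. It assumes the session's time locale is English (so the locale argument is left at its default), and the entry "not a date" is a made-up placeholder for an unhandled format.

```r
library(lubridate)

# A few of the question's strings, kept as character rather than factor.
# "not a date" is a hypothetical entry standing in for a rare, unhandled format.
dates <- c("25 February 1987", "16-Oct-63", "13-11-1961",
           "03/23/87", "22 Jan 2008", "not a date")

# Same orders as in the answer; entries that match none of them
# come back as NA (lubridate also emits a warning about them).
parsed <- parse_date_time(dates, orders = c("d m y", "d B Y", "m/d/y"))

# List the original strings that did not match any of the orders.
unparsed <- dates[is.na(parsed)]
print(unparsed)

# Write all parsed values out in a single format.
print(format(parsed, "%d/%m/%Y"))
```

Any strings reported as unparsed can then be covered by adding further entries to orders. Two-digit years such as 63 are ambiguous, so it is worth checking which century those values landed in before relying on the result.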