Delete rows that fall outside a specific time span

Updated: 2024-10-27 12:25:04

I have a dataset with 40 columns and 100,000 rows that I need to filter (thin out). I want to remove all orders made before 1 October 2014 and after 20 August 2016, i.e. keep only the time span 1.10.2014 to 20.8.2016 in the table. How can I do this, simply deleting the unneeded data from the table? Here's an example:

DB <- data.frame(orderID = c(1,2,3,4,5,6,7,8,9,10), orderDate = c("01.07.2014 05:11","12.08.2014 12:39","09.09.2015 09:14","04.10.2014 16:15","02.11.2015 07:04", "10.11.2015 16:52","20.02.2016 08:08","12.04.2016 14:07","24.07.2016 17:04","09.09.2016 06:04"), itemID = c(2,3,2,5,12,4,2,3,1,5), size = c("m", "l", 42, "xxl", "m", 42, 39, "m", "m", 44), color = c("green", "red", "blue", "yellow", "red", "yellow", "blue", "red", "green", "black"), manufacturer = c("11", "12", "13", "12", "13", "13", "12", "11", "11", "13") customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1)

Expected Outcome:

DB <- data.frame(orderID = c(3,4,5,6,7,8,9), orderDate = c("09.09.2015 09:14","04.10.2014 16:15","02.11.2015 07:04", "10.11.2015 16:52","20.02.2016 08:08","12.04.2016 14:07","24.07.2016 17:04"), itemID = c(2,5,12,4,2,3,1), size = c(42, "xxl", "m", 42, 39, "m", "m"), color = c("blue", "yellow", "red", "yellow", "blue", "red", "green"), manufacturer = c("13", "12", "13", "13", "12", "11", "11") customerID = c(3, 1, 1, 3, 2, 2, 1)

Best answer

Your example code defining the data is missing a comma (before customerID) and the closing parenthesis of data.frame().

After fixing that, the data definition looks like this (generated by dput):

structure(list(orderID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), orderDate = structure(c(1L, 8L, 4L, 3L, 2L, 6L, 9L, 7L, 10L, 5L), .Label = c("01.07.2014 05:11", "02.11.2015 07:04", "04.10.2014 16:15", "09.09.2015 09:14", "09.09.2016 06:04", "10.11.2015 16:52", "12.04.2016 14:07", "12.08.2014 12:39", "20.02.2016 08:08", "24.07.2016 17:04"), class = "factor"), itemID = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5), size = structure(c(5L, 4L, 2L, 6L, 5L, 2L, 1L, 5L, 5L, 3L), .Label = c("39", "42", "44", "l", "m", "xxl" ), class = "factor"), color = structure(c(3L, 4L, 2L, 5L, 4L, 5L, 2L, 4L, 3L, 1L), .Label = c("black", "blue", "green", "red", "yellow"), class = "factor"), manufacturer = structure(c(1L, 2L, 3L, 2L, 3L, 3L, 2L, 1L, 1L, 3L), .Label = c("11", "12", "13" ), class = "factor"), customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1)), .Names = c("orderID", "orderDate", "itemID", "size", "color", "manufacturer", "customerID"), row.names = c(NA, -10L ), class = "data.frame")
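For comparison, here is a sketch of the question's data.frame() call with those two fixes applied (the comma before customerID and the final closing parenthesis). The factor columns in the dput() output above come from data.frame()'s old stringsAsFactors = TRUE default; in R 4.0.0 and later the default is FALSE, so with a current R the text columns will be character vectors instead. Either way, the date filtering below works, because substr() converts its input to character.

```r
# The question's example data with the missing comma (before customerID)
# and the missing closing parenthesis of data.frame() added.
DB <- data.frame(
  orderID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  orderDate = c("01.07.2014 05:11", "12.08.2014 12:39", "09.09.2015 09:14",
                "04.10.2014 16:15", "02.11.2015 07:04", "10.11.2015 16:52",
                "20.02.2016 08:08", "12.04.2016 14:07", "24.07.2016 17:04",
                "09.09.2016 06:04"),
  itemID = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
  # mixing numbers and strings in c() coerces everything to character
  size = c("m", "l", 42, "xxl", "m", 42, 39, "m", "m", 44),
  color = c("green", "red", "blue", "yellow", "red", "yellow", "blue", "red",
            "green", "black"),
  manufacturer = c("11", "12", "13", "12", "13", "13", "12", "11", "11", "13"),
  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1)
)
str(DB)  # 10 obs. of 7 variables
```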

Then a possible solution is

custom_format <- "%d.%m.%Y"
order_date <- as.Date(substr(DB$orderDate, 1, 10), format = custom_format)  # parse only the "dd.mm.yyyy" part
DB <- subset(DB, order_date >= "2014-10-01" & order_date <= "2016-08-20")  # keep both boundary days
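The same idea can be wrapped in a small reusable function. The sketch below uses a hypothetical helper name, keep_date_range(), which is not part of the original answer or of base R. It uses plain [ indexing instead of subset(), treats both ends of the range as inclusive, and explicitly drops rows whose date cannot be parsed (with [ indexing, an NA in the logical index would otherwise produce an all-NA row). It assumes the date column starts with a "dd.mm.yyyy" date, as in the example.

```r
# Hypothetical helper: keep rows whose date column lies in [from, to].
# Assumes the first 10 characters of each value are a date in `format`.
keep_date_range <- function(df, col, from, to, format = "%d.%m.%Y") {
  # as.character() also handles factor columns (older R defaults)
  d <- as.Date(substr(as.character(df[[col]]), 1, 10), format = format)
  # drop unparseable dates (NA), then apply inclusive lower/upper bounds
  df[!is.na(d) & d >= as.Date(from) & d <= as.Date(to), , drop = FALSE]
}

DB <- data.frame(
  orderID = 1:10,
  orderDate = c("01.07.2014 05:11", "12.08.2014 12:39", "09.09.2015 09:14",
                "04.10.2014 16:15", "02.11.2015 07:04", "10.11.2015 16:52",
                "20.02.2016 08:08", "12.04.2016 14:07", "24.07.2016 17:04",
                "09.09.2016 06:04")
)
DB <- keep_date_range(DB, "orderDate", "2014-10-01", "2016-08-20")
print(DB$orderID)
```

With the example dates, orders 1, 2 (before 1 October 2014) and 10 (after 20 August 2016) are removed, leaving orders 3 to 9 as in the expected outcome. Note that the row names of the result keep their original values; wrap the call in a reset such as rownames(DB) <- NULL if consecutive row names are wanted.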

Published: 2023-07-28 18:40:00
Source: https://www.elefans.com/category/jswz/34/1307954.html