Replace the second entry of a filtered data frame

Updated: 2024-10-23 19:28:06

I have a large data frame with over 18 million rows in the following format:

house_id  date_time            value
1000      2010-10-31 00:30:00  0.6
1000      2010-10-31 00:30:00  0.4
1000      2010-10-31 01:00:00  0.5
1001      2010-10-31 00:30:00  0.5
1001      2010-10-31 00:30:00  0.7
1001      2010-10-31 01:00:00  0.9

I would like to replace the second row containing date_time = 2010-10-31 00:30:00 for each house_id with 2010-10-31 01:00:00, but keep the first instance of 2010-10-31 00:30:00 unchanged.

Thank you!
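To try the answer below, the sample rows can be rebuilt as a data.frame. This is a minimal sketch that assumes date_time is a POSIXct column; tz = "UTC" is an assumption, since the question names no time zone. The answer's as.POSIXct() calls use the session's time zone, so whichever zone is used, the comparison values must be created in the same zone as the column or the == filters will not match the intended rows.

```r
# Sample data from the question.
# Assumption: timestamps are parsed in UTC (the question gives no time zone).
DF <- data.frame(
  house_id  = rep(c(1000L, 1001L), each = 3L),
  date_time = as.POSIXct(
    rep(c("2010-10-31 00:30:00", "2010-10-31 00:30:00", "2010-10-31 01:00:00"),
        times = 2L),
    tz = "UTC"
  ),
  value     = c(0.6, 0.4, 0.5, 0.5, 0.7, 0.9)
)
str(DF)
```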

Best answer

The OP wants to replace the second (i.e., last) occurrence of a particular date-time value for each house_id.

According to the OP, the data set has over 18 M rows, which makes it worthwhile to consider an update in place, i.e., without copying the complete data object.
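In data.table terms, "update in place" means assigning with :=, which modifies the existing object by reference, whereas copy() creates a separate object. The sketch below uses a small made-up table (the names DT, id and x are illustrative only) and address() to show the difference.

```r
library(data.table)

DT <- data.table(id = 1:3, x = c(10, 20, 30))  # illustrative data only
addr_start <- address(DT)

DT[id == 2L, x := 99]        # := writes into the existing column by reference
addr_after_update <- address(DT)

DT_copy <- copy(DT)          # deep copy: a new, separate object
addr_copy <- address(DT_copy)

identical(addr_start, addr_after_update)  # TRUE: same object, modified in place
identical(addr_start, addr_copy)          # FALSE: the copy lives elsewhere
```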

Update only selected rows

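library(data.table)
setDT(DF)  # coerce to data.table in place
address(DF)  # memory address before the update
# inner query: row number of the last 00:30:00 row per house_id
DF[DF[date_time == as.POSIXct("2010-10-31 00:30:00"), last(.I), by = house_id]$V1,
   date_time := as.POSIXct("2010-10-31 01:00:00")][]
address(DF)  # unchanged address: DF was modified without copying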

The rows to be updated are identified by

DF[date_time == as.POSIXct("2010-10-31 00:30:00"), last(.I), by = house_id]
   house_id V1
1:     1000  2
2:     1001  5
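Without data.table, the same row numbers can be found in base R. This swaps in a different technique, duplicated() with fromLast = TRUE, and is only a sketch on the sample data (UTC timestamps assumed). Unlike :=, base R's replacement functions may copy the column or the whole data frame, so it does not come with the in-place guarantee the answer is after.

```r
# Sample data (assumption: timestamps in UTC)
DF <- data.frame(
  house_id  = rep(c(1000L, 1001L), each = 3L),
  date_time = as.POSIXct(
    rep(c("2010-10-31 00:30:00", "2010-10-31 00:30:00", "2010-10-31 01:00:00"),
        times = 2L),
    tz = "UTC"
  ),
  value     = c(0.6, 0.4, 0.5, 0.5, 0.7, 0.9)
)
t_old <- as.POSIXct("2010-10-31 00:30:00", tz = "UTC")
t_new <- as.POSIXct("2010-10-31 01:00:00", tz = "UTC")

is_old <- DF$date_time == t_old
# Among the t_old rows, TRUE marks the last occurrence of each house_id
last_per_house <- !duplicated(DF$house_id[is_old], fromLast = TRUE)
idx <- which(is_old)[last_per_house]
idx                                # rows 2 and 5, as with last(.I)

DF$date_time[idx] <- t_new         # base R assignment (may copy)
DF
```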

The calls to address(DF) before and after the update operation verify that DF has been modified without copying.
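Put together, the selected-rows update can be run end to end on the sample data. This is a sketch that assumes UTC timestamps (so the comparison values carry tz = "UTC" too) and stores the intermediate row numbers in a variable named rows for readability; it compares address() before and after the update.

```r
library(data.table)

# Sample data (assumption: timestamps in UTC)
DF <- data.frame(
  house_id  = rep(c(1000L, 1001L), each = 3L),
  date_time = as.POSIXct(
    rep(c("2010-10-31 00:30:00", "2010-10-31 00:30:00", "2010-10-31 01:00:00"),
        times = 2L),
    tz = "UTC"
  ),
  value     = c(0.6, 0.4, 0.5, 0.5, 0.7, 0.9)
)
t_old <- as.POSIXct("2010-10-31 00:30:00", tz = "UTC")
t_new <- as.POSIXct("2010-10-31 01:00:00", tz = "UTC")

setDT(DF)                      # convert to data.table by reference
addr_before <- address(DF)

# Row numbers in DF (.I) of the last t_old row within each house_id
rows <- DF[date_time == t_old, last(.I), by = house_id]$V1

DF[rows, date_time := t_new]   # update only those rows, by reference
addr_after <- address(DF)

print(DF)
identical(addr_before, addr_after)   # TRUE: no copy of DF was made
```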

Update during join

As an alternative to updating selected rows, an update during join can be used:

library(data.table)
setDT(DF)
address(DF)
DF[CJ(unique(house_id), as.POSIXct("2010-10-31 00:30:00")),
   on = .(house_id = V1, date_time = V2), mult = "last",
   date_time := as.POSIXct("2010-10-31 01:00:00")][]
address(DF)

which returns the same result as updating only the selected rows:

   house_id           date_time value
1:     1000 2010-10-31 00:30:00   0.6
2:     1000 2010-10-31 01:00:00   0.4
3:     1000 2010-10-31 01:00:00   0.5
4:     1001 2010-10-31 00:30:00   0.5
5:     1001 2010-10-31 01:00:00   0.7
6:     1001 2010-10-31 01:00:00   0.9

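Here, CJ() creates a lookup table consisting of all unique house_ids paired with the date time to replace, and mult = "last" restricts the update to the last matching row of DF for each row of the lookup table.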
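To see what the join operates on, the lookup table can be built separately. The sketch below assumes UTC timestamps and prints the CJ() result; its unnamed arguments become columns V1 and V2, which is why the join uses on = .(house_id = V1, date_time = V2). With mult = "last", each lookup row updates only the last matching row of DF, i.e. the second 00:30:00 row of each house.

```r
library(data.table)

# Sample data (assumption: timestamps in UTC)
DF <- data.table(
  house_id  = rep(c(1000L, 1001L), each = 3L),
  date_time = as.POSIXct(
    rep(c("2010-10-31 00:30:00", "2010-10-31 00:30:00", "2010-10-31 01:00:00"),
        times = 2L),
    tz = "UTC"
  ),
  value     = c(0.6, 0.4, 0.5, 0.5, 0.7, 0.9)
)
t_old <- as.POSIXct("2010-10-31 00:30:00", tz = "UTC")
t_new <- as.POSIXct("2010-10-31 01:00:00", tz = "UTC")

# Lookup table: every house_id paired with the time to replace (columns V1, V2)
lookup <- CJ(unique(DF$house_id), t_old)
print(lookup)

# Update on join: mult = "last" picks only the last matching row of DF
# for each lookup row, and := changes it by reference
DF[lookup, on = .(house_id = V1, date_time = V2), mult = "last",
   date_time := t_new]
print(DF)
```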

Caveat

The wording of the question suggests that there are always 2 rows for each house_id with date time as.POSIXct("2010-10-31 00:30:00").

This can be verified by:

DF[date_time == as.POSIXct("2010-10-31 00:30:00"), .N, by = house_id][N != 2]

which should return an empty data.table.
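The check matters because both last(.I) and mult = "last" simply pick the final matching row: a house with a single 00:30:00 row would have its only row replaced, and a house with three such rows would have only the third one replaced. The sketch below uses made-up data (house 1002 is hypothetical and has a single 00:30:00 row; UTC timestamps assumed) to show the check flagging it, and runs the update only when the check passes.

```r
library(data.table)

t_old <- as.POSIXct("2010-10-31 00:30:00", tz = "UTC")
t_new <- as.POSIXct("2010-10-31 01:00:00", tz = "UTC")

# Hypothetical data: house 1002 has only one row at t_old
DF <- data.table(
  house_id  = c(1000L, 1000L, 1001L, 1001L, 1002L),
  date_time = as.POSIXct(rep("2010-10-31 00:30:00", 5L), tz = "UTC"),
  value     = c(0.6, 0.4, 0.5, 0.7, 0.3)
)

bad <- DF[date_time == t_old, .N, by = house_id][N != 2]
print(bad)   # house 1002 with N = 1

# Guard: update only when every house has exactly two t_old rows
ok <- nrow(bad) == 0L
if (ok) {
  rows <- DF[date_time == t_old, last(.I), by = house_id]$V1
  DF[rows, date_time := t_new]
} else {
  message("Skipping update; unexpected row counts for house_id: ",
          paste(bad$house_id, collapse = ", "))
}
```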

Published: 2023-07-31 00:42:00
Permalink: https://www.elefans.com/category/jswz/34/1340469.html