data.table: field in row with first occurrence

Updated: 2024-10-23 05:36:42

I am looking for an elegant (one-liner) solution to a data.table grouping problem.

I have a data.table as follows:

library(data.table)
library(lubridate)
dt.master <- data.table(
  user    = c(1000, 1002, 2008, 3005, 1000, 1002, 1002),
  target  = c(50000, 50004, 50501, 50001, 50000, 50000, 50004),
  channel = c("A", "B", "C", "A", "B", "A", "C"),
  date    = c(dmy("10/02/2018"), dmy("11/04/2018"), dmy("14/03/2018"),
              dmy("02/03/2018"), dmy("05/01/2018"), dmy("08/05/2018"),
              dmy("05/03/2018"))
)

That is:

   user target channel       date
1: 1000  50000       A 2018-02-10
2: 1002  50004       B 2018-04-11
3: 2008  50501       C 2018-03-14
4: 3005  50001       A 2018-03-02
5: 1000  50000       B 2018-01-05
6: 1002  50000       A 2018-05-08
7: 1002  50004       C 2018-03-05

For each (user, target) group, I would like to find the channel of the first occurrence (the row with the earliest date) and add it to dt.master as a new column. The desired result is:

   user target channel       date first_channel
1: 1000  50000       A 2018-02-10             B
2: 1000  50000       B 2018-01-05             B
3: 1002  50000       A 2018-05-08             A
4: 1002  50004       B 2018-04-11             C
5: 1002  50004       C 2018-03-05             C
6: 2008  50501       C 2018-03-14             C
7: 3005  50001       A 2018-03-02             A

Currently, I am doing it in two steps:

First, I extract the row of the first occurrence in each group:

dt.result <- dt.master[dt.master[, .(first_interest = .I[which.min(date)]), by = c("user", "target")]$first_interest,]

Then I merge it with dt.master:

setnames(dt.result, "channel", "first_channel")
dt.master <- merge(dt.master,
                   dt.result[, .(user, target, first_channel)],
                   by = c("user", "target"),
                   all.x = TRUE, all.y = FALSE)

Is there a way to do this without a merge? I believe there must be a solution that modifies the first line, but I cannot find it.

Thanks a lot!

Best answer

You can update by reference, by group, as follows:

dt.master[, first_channel := channel[which.min(date)], keyby=.(user, target)]
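
To check this against the sample data from the question, here is a minimal, self-contained sketch. It makes two substitutions that are my own assumptions, not part of the answer above: the dates are built with base `as.Date()` instead of `lubridate::dmy()` so that only data.table is needed, and the grouping uses `by` followed by an explicit `setkey()` so that the sort step is visible on its own.

```r
library(data.table)

# Same sample data as in the question; as.Date() replaces lubridate::dmy()
# here (an assumption made only to keep the sketch dependency-free).
dt.master <- data.table(
  user    = c(1000, 1002, 2008, 3005, 1000, 1002, 1002),
  target  = c(50000, 50004, 50501, 50001, 50000, 50000, 50004),
  channel = c("A", "B", "C", "A", "B", "A", "C"),
  date    = as.Date(c("2018-02-10", "2018-04-11", "2018-03-14", "2018-03-02",
                      "2018-01-05", "2018-05-08", "2018-03-05"))
)

# For each (user, target) group, take the channel on the earliest date and
# assign it to every row of that group. := modifies dt.master in place,
# so no merge and no reassignment are needed.
dt.master[, first_channel := channel[which.min(date)], by = .(user, target)]

# Optional: sort by the grouping columns to match the row order of the
# desired output in the question.
setkey(dt.master, user, target)

print(dt.master)
```

After sorting, the first_channel column should read B, B, A, C, C, C, A, which is the desired output from the question. Note that `which.min()` returns the first minimum, so if two rows in a group share the earliest date, the channel of whichever comes first in the table is used.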


Published: 2023-07-30 22:12:00
Link: https://www.elefans.com/category/jswz/34/1340341.html