Sliding groups in dplyr [duplicate]

Updated: 2024-10-26 02:37:31

This question already has an answer here:

Finding ALL duplicate rows, including “elements with smaller subscripts” (5 answers)

I have a data set which contains a number of unique identifiers for each date, e.g.

df <- data.frame(date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02")), ids = c(3, 4, 1, 3))

I'd then like to summarise this information to get the number of new unique ids that appear on each date, i.e. ids that have not appeared on any earlier date. For example, on January 1 there are two unique ids (3 and 4). But on January 2, there is only one new unique id (1), because 3 already appeared on January 1. So the resulting data frame should look like:

date        n_new_unique_ids
2016-01-01  2
2016-01-02  1

Is this possible with dplyr? I had a look at lag, but a fixed lag size doesn't make sense in this context. Or perhaps with another package?
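A fixed lag does not fit because each date has to be compared against the ids from all earlier dates, not just the previous one. The sketch below (plain base R, my own illustration rather than part of the question or the answer) makes that definition explicit by walking through the dates in order and keeping a running set of ids already seen; the names `seen` and `n_new` are illustrative.

```r
df <- data.frame(
  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02")),
  ids  = c(3, 4, 1, 3)
)

dates <- sort(unique(df$date))
seen  <- numeric(0)               # ids observed on any earlier date
n_new <- integer(length(dates))   # count of new ids per date

for (i in seq_along(dates)) {
  ids_today <- unique(df$ids[df$date == dates[i]])
  n_new[i]  <- sum(!ids_today %in% seen)  # ids not seen on any earlier date
  seen      <- union(seen, ids_today)     # grow the running set
}

res <- data.frame(date = dates, n_new_unique_ids = n_new)
res
```

The loop is easy to read but scales with the number of dates; the answer below does the same thing in a vectorised way with `duplicated()`.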

Best answer

For the original version of the data in the question, one option was to remove every row whose 'ids' value appears more than once in the dataset:

library(dplyr)

df %>% filter(!(duplicated(ids) | duplicated(ids, fromLast = TRUE)))
#         date ids
# 1 2016-01-01   2
# 2 2016-01-02   3
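Applied to the data frame now shown in the question, that filter drops id 3 from both dates, so it no longer counts "new" ids. The sketch below applies the same condition with base R subsetting (an equivalent I chose so no package is needed) to show what survives.

```r
df <- data.frame(
  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02")),
  ids  = c(3, 4, 1, 3)
)

# TRUE for every row whose id occurs more than once anywhere in df
repeated  <- duplicated(df$ids) | duplicated(df$ids, fromLast = TRUE)
only_once <- df[!repeated, ]   # keeps only the rows with ids 4 and 1
only_once

# One row on each date, i.e. 1 and 1 rather than the 2 and 1 the question wants
table(only_once$date)
```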

Update

Using the updated data

df %>%
  arrange(date, ids) %>%
  filter(!duplicated(ids)) %>%   # keep only each id's first (earliest) row
  group_by(date) %>%
  summarise(n_unique_ids = n())
#   date       n_unique_ids
#   <date>            <int>
# 1 2016-01-01            2
# 2 2016-01-02            1
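The same idea works without dplyr, which covers the "another package" part of the question. The sketch below is my own base R variant, not the answerer's code: sort by date, flag each id's first appearance with `!duplicated()`, then sum the flags per date with `aggregate()`. The helper column `is_new` is an illustrative name.

```r
df <- data.frame(
  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02")),
  ids  = c(3, 4, 1, 3)
)

df <- df[order(df$date), ]       # earliest dates first
df$is_new <- !duplicated(df$ids) # TRUE on an id's first appearance only

# Summing the logical flags counts the new ids on each date
res <- aggregate(is_new ~ date, data = df, FUN = sum)
names(res)[2] <- "n_new_unique_ids"
res
```

One design difference: because the flags are summed over every row of a date, a date on which no new ids appear still gets a row with 0, whereas filtering first and then counting drops that date entirely.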

Published: 2023-07-06 07:54:00
Source: https://www.elefans.com/category/jswz/34/1047555.html