Sliding groups in dplyr [duplicate]


This question already has an answer here:

Finding ALL duplicate rows, including “elements with smaller subscripts” 5 answers

I have a data set which contains a number of unique identifiers for each date, e.g.

df <- data.frame(
  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02")),
  ids = c(3, 4, 1, 3)
)

I'd then like to summarise this information to get the number of new unique ids that appear on each date, i.e. ids that were not seen on any earlier date. For example, on January 1 there are two unique ids (3 and 4). But on January 2 there is only one new unique id (1), because 3 already appeared on January 1. So the resulting data frame should look like:

date       n_new_unique_ids
2016-01-01                2
2016-01-02                1

Is this possible with dplyr? I had a look at lag, but a fixed lag size doesn't make sense in this context. Or is there another package that can do this?

Best answer

One option would be to remove every row whose 'ids' value occurs more than once in the dataset:

library(dplyr)

# Drop every id that occurs more than once, scanning from both ends
df %>%
  filter(!(duplicated(ids) | duplicated(ids, fromLast = TRUE)))
#        date ids
#1 2016-01-01   2
#2 2016-01-02   3
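To make the two `duplicated()` passes in that filter concrete, here is a minimal base R sketch run on the `ids` vector from the question. The names `first_pass`, `last_pass` and `kept` are only illustrative and do not come from the answer.

```r
ids <- c(3, 4, 1, 3)

# Scanning left to right, only the second 3 is flagged as a repeat
first_pass <- duplicated(ids)                   # FALSE FALSE FALSE  TRUE

# Scanning right to left, only the first 3 is flagged
last_pass <- duplicated(ids, fromLast = TRUE)   #  TRUE FALSE FALSE FALSE

# Combining both passes drops every copy of a repeated id
kept <- ids[!(first_pass | last_pass)]          # 4 1
```

Because both copies of a repeated id are dropped, this filter answers "which ids occur exactly once", which is a different question from "which ids are new on a given date".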

Update

Using the updated data:

df %>%
  arrange(date, ids) %>%
  filter(!duplicated(ids)) %>%
  group_by(date) %>%
  summarise(n_unique_ids = n())
#        date n_unique_ids
#      <date>        <int>
#1 2016-01-01            2
#2 2016-01-02            1
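One thing to be aware of with the filter-based pipeline is that a date on which every id has already been seen loses all its rows and so disappears from the result. The following is a sketch of a possible variant, not part of the original answer, that flags first appearances with a helper column and sums the flags, so such dates would be reported with a count of 0. The names `is_new` and `new_ids` are assumptions made for this illustration; the output column is named `n_new_unique_ids` to match the question.

```r
library(dplyr)

df <- data.frame(
  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02")),
  ids  = c(3, 4, 1, 3)
)

new_ids <- df %>%
  arrange(date, ids) %>%
  # After sorting by date, an id is new exactly at its first occurrence
  mutate(is_new = !duplicated(ids)) %>%
  group_by(date) %>%
  # Summing the logical flags counts new ids and keeps dates with none
  summarise(n_new_unique_ids = sum(is_new))

new_ids
```

On the question's data this gives 2 for 2016-01-01 and 1 for 2016-01-02, the same counts as the pipeline above.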


Published: 2023-07-06 07:54:00

Link: https://www.elefans.com/category/jswz/34/1047555.html