如果满足多个条件,则具有重置选项的累积和

编程入门 行业动态 更新时间:2024-10-10 13:13:52
本文介绍了如果满足多个条件,则具有重置选项的累积和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

如果满足多个条件,我将尝试使用reset选项进行累加。更具体地说,我想对按 id分组的变量金额和 count 进行累加和并在满足以下两个条件的情况下再次从0重新开始:金额> = 10和 count > =3。如果满足这些条件,我还想创建一个包含1的新列,否则满足0。

I am trying to make a cumulative sum with reset option if multiple conditions are met. More specifically, I want to cumulative sum the variables amount and count grouped by id and reset/start from 0 again if these two conditions are met: amount >= 10 and count >= 3. I also would like to create a new column that contains 1 if these conditions are met and 0 otherwise.

数据样本:

df <- data.frame( date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01")), id = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"), amount = c(1, 9, 5, 5, 6, 2, 10, 4, 8, 10, 6, 5, 5, 1, 6, 5, 5, 5), count = c(0, 2, 5, 4, 5, 1, 0, 0, 0, 0, 2, 1, 1, 1, 1, 2, 1, 0) )

所需的输出:

df <- data.frame( date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01")), id = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"), amount = c(1, 9, 5, 5, 6, 2, 10, 4, 8, 10, 6, 5, 5, 1, 6, 5, 5, 5), count = c(0, 2, 5, 4, 5, 1, 0, 0, 0, 0, 2, 1, 1, 1, 1, 2, 1, 0), amount_cumsum = c(1, 10, 15, 5, 11, 2, 10, 14, 22, 32, 38, 43, 5, 6, 12, 5, 10, 5), count_cumsum = c(0, 2, 7, 4, 9, 1, 0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 3, 0), condition_met = c(0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0) )

如果可能的话,我想使用 dplyr 解决方案,但也欢迎使用其他方法。谢谢!

If possible, I would like a dplyr solution, but alternatives are also welcome. Thanks!

更新:作者删除的答案几乎解决了问题:

Update: an answer that was deleted by the author almost got the problem solved:

df %>% group_by(id) %>% mutate( amount_cumsum = purrr::accumulate(.x = amount, .f = ~ if_else(condition = .x < 10, true = .x + .y, false = .y)), count_cumsum = purrr::accumulate(.x = count, .f = ~ if_else(condition = .x < 3, true = .x + .y, false = .y)), condition_met = as.integer(amount_cumsum >= 10 & count_cumsum >= 3) )

或者,或者:

df %>% group_by(id) %>% mutate( amount_cumsum = purrr::accumulate(.x = amount, .f = ~ case_when(.x < 10 ~ .x + .y, TRUE ~ .y)), count_cumsum = purrr::accumulate(.x = count, .f = ~ case_when(.x < 3 ~ .x + .y, TRUE ~ .y)), condition_met = as.integer(amount_cumsum >= 10 & count_cumsum >= 3) )

答案a如果一个变量满足条件,bove会重置累积总和,但是如果另一个条件满足,bove将重置总和。

The answer above resets the cumulative sum if the condition is met for one single variable, but not taking into account if the other condition was met.

推荐答案

使用base-R解决方案进行贡献:

Contributing with a base-R solution:

df$amount_cumsum <- 0 df$count_cumsum <- 0 df$condition_met <- 0 reset = F for (i in 1:nrow(df)) { if (i == 1 | reset) { df$amount_cumsum[i] = df$amount[i] df$count_cumsum[i] = df$count[i] reset = F } else if (df$id[i] != df$id[i-1]) { df$amount_cumsum[i] = df$amount[i] df$count_cumsum[i] = df$count[i] reset = F } else { df$amount_cumsum[i] = df$amount_cumsum[i-1] + df$amount[i] df$count_cumsum[i] = df$count_cumsum[i-1] + df$count[i] } if (df$amount_cumsum[i] >= 10 & df$count_cumsum[i] >= 3) { df$condition_met[i] = 1 reset = T } }

我已经扩展了您的数据集,并根据您的解决方案。基准测试显示,Base-R解决方案比tidyverse解决方案快21倍!

I've expanded your dataset and benchmarked this code against your solution. Benchmark shows the Base-R solution 21 times faster than the tidyverse one!

library(tidyverse) dates = seq(as.Date("2019-01-01"), as.Date("2020-03-04"), by="days") df <- data.frame( date = c(sample(dates, 300), sample(dates, 400), sample(dates, 350)), id = c(rep("A", 300), rep("B", 400), rep("C", 350)), amount = floor(runif(1050, 0, 15)), count = floor(runif(1050, 0, 5)), stringsAsFactors = F ) rbenchmark::benchmark( "Tidy Solution" = { df_tidy <- df %>% group_by(id) %>% nest(data = c(amount, count)) %>% mutate( data_accumulate = purrr::accumulate(.x = data, .f = function(.x, .y) if (max(.x[1]) < 10 | max(.x[2]) < 3) .x + .y else .y) ) %>% unnest(cols = c(data_accumulate)) %>% rename(amount_cumsum = amount, count_cumsum = count) %>% unnest(cols = c(data)) %>% mutate(condition_met = case_when( amount_cumsum >= 10 & count_cumsum >= 3 ~ 1, TRUE ~ 0) ) }, "Base-R Solution" = { df_base <- df df_base$amount_cumsum <- 0 df_base$count_cumsum <- 0 df_base$condition_met <- 0 reset = F # to reset the counters for (i in 1:nrow(df_base)) { if (i == 1 | reset) { df_base$amount_cumsum[i] = df_base$amount[i] df_base$count_cumsum[i] = df_base$count[i] reset = F } else if (df_base$id[i] != df_base$id[i-1]) { df_base$amount_cumsum[i] = df_base$amount[i] df_base$count_cumsum[i] = df_base$count[i] reset = F } else { df_base$amount_cumsum[i] = df_base$amount_cumsum[i-1] + df_base$amount[i] df_base$count_cumsum[i] = df_base$count_cumsum[i-1] + df_base$count[i] } if (df_base$amount_cumsum[i] >= 10 & df_base$count_cumsum[i] >= 3) { df_base$condition_met[i] = 1 reset = T } } }, replications = 100) gc()

test replications elapsed relative user.self sys.self user.child sys.child Base-R Solution 100 3.89 1.000 3.69 0.0 NA NA Tidy Solution 100 84.00 21.594 78.65 0.2 NA NA

更多推荐

如果满足多个条件,则具有重置选项的累积和

本文发布于:2023-10-26 06:58:45,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1529409.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:多个   选项   条件

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!