I have a dataset on which I want to use mutate and case_when. However, I want to stop (filter) the mutate once a certain condition has been met.
Data:
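    # Packages used throughout: tibble() and the pipe come with dplyr,
    # crossing() with tidyr, months()/weeks()/%m+% with lubridate
    library(dplyr)
    library(tidyr)
    library(lubridate)

    set.seed(2020)
    df <- tibble(
      customer = 1:5,
      start_date = sample(seq(as.Date('2020-01-01'), as.Date('2020-12-31'), by = "day"), 5),
      termination_date = c(sample(seq(as.Date('2020-06-01'), as.Date('2021-02-28'), by = "day"), 4), NA))

Parameters: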
    obs_start <- as.Date("2020-08-01")
    obs_interval <- months(1)
    lead_time <- weeks(1)
    first_obs <- obs_start - lead_time
    last_obs <- first_obs %m+% months(6)
    obs_seq <- seq(first_obs, last_obs, by = "month")

Mutate + case_when:
    df %>%
      filter(termination_date > first_obs | is.na(termination_date)) %>%
      crossing(obs_seq) %>%
      mutate(is_churn = case_when(
        termination_date >= obs_seq & termination_date < (obs_seq + obs_interval) ~ "Yes",
        TRUE ~ "No"
      ))

       customer start_date termination_date obs_seq    is_churn
          <int> <date>     <date>           <date>     <chr>
     1        1 2020-08-23 2021-02-28       2020-07-25 No
     2        1 2020-08-23 2021-02-28       2020-08-25 No
     3        1 2020-08-23 2021-02-28       2020-09-25 No
     4        1 2020-08-23 2021-02-28       2020-10-25 No
     5        1 2020-08-23 2021-02-28       2020-11-25 No
     6        1 2020-08-23 2021-02-28       2020-12-25 No
     7        1 2020-08-23 2021-02-28       2021-01-25 No
     8        2 2020-03-27 2020-11-11       2020-07-25 No
     9        2 2020-03-27 2020-11-11       2020-08-25 No
    10        2 2020-03-27 2020-11-11       2020-09-25 No
    11        2 2020-03-27 2020-11-11       2020-10-25 Yes
    12        2 2020-03-27 2020-11-11       2020-11-25 No
    13        2 2020-03-27 2020-11-11       2020-12-25 No
    14        2 2020-03-27 2020-11-11       2021-01-25 No
    15        3 2020-01-22 2020-11-17       2020-07-25 No
    16        3 2020-01-22 2020-11-17       2020-08-25 No
    17        3 2020-01-22 2020-11-17       2020-09-25 No
    18        3 2020-01-22 2020-11-17       2020-10-25 Yes
    19        3 2020-01-22 2020-11-17       2020-11-25 No
    20        3 2020-01-22 2020-11-17       2020-12-25 No
    21        3 2020-01-22 2020-11-17       2021-01-25 No
    22        4 2020-08-03 2020-12-15       2020-07-25 No
    23        4 2020-08-03 2020-12-15       2020-08-25 No
    24        4 2020-08-03 2020-12-15       2020-09-25 No
    25        4 2020-08-03 2020-12-15       2020-10-25 No
    26        4 2020-08-03 2020-12-15       2020-11-25 Yes
    27        4 2020-08-03 2020-12-15       2020-12-25 No
    28        4 2020-08-03 2020-12-15       2021-01-25 No
    29        5 2020-11-16 NA               2020-07-25 No
    30        5 2020-11-16 NA               2020-08-25 No
    31        5 2020-11-16 NA               2020-09-25 No
    32        5 2020-11-16 NA               2020-10-25 No
    33        5 2020-11-16 NA               2020-11-25 No
    34        5 2020-11-16 NA               2020-12-25 No
    35        5 2020-11-16 NA               2021-01-25 No
Within each customer group, I want to remove all rows after the row where is_churn == "Yes". Below is the desired output:
    # A tibble: 27 x 5
       customer start_date termination_date obs_seq    is_churn
          <int> <chr>      <chr>            <chr>      <chr>
     1        1 2020-08-23 2021-02-28       2020-07-25 No
     2        1 2020-08-23 2021-02-28       2020-08-25 No
     3        1 2020-08-23 2021-02-28       2020-09-25 No
     4        1 2020-08-23 2021-02-28       2020-10-25 No
     5        1 2020-08-23 2021-02-28       2020-11-25 No
     6        1 2020-08-23 2021-02-28       2020-12-25 No
     7        1 2020-08-23 2021-02-28       2021-01-25 No
     8        2 2020-03-27 2020-11-11       2020-07-25 No
     9        2 2020-03-27 2020-11-11       2020-08-25 No
    10        2 2020-03-27 2020-11-11       2020-09-25 No
    11        2 2020-03-27 2020-11-11       2020-10-25 Yes
    12        3 2020-01-22 2020-11-17       2020-07-25 No
    13        3 2020-01-22 2020-11-17       2020-08-25 No
    14        3 2020-01-22 2020-11-17       2020-09-25 No
    15        3 2020-01-22 2020-11-17       2020-10-25 Yes
    16        4 2020-08-03 2020-12-15       2020-07-25 No
    17        4 2020-08-03 2020-12-15       2020-08-25 No
    18        4 2020-08-03 2020-12-15       2020-09-25 No
    19        4 2020-08-03 2020-12-15       2020-10-25 No
    20        4 2020-08-03 2020-12-15       2020-11-25 Yes
    21        5 2020-11-16 NA               2020-07-25 No
    22        5 2020-11-16 NA               2020-08-25 No
    23        5 2020-11-16 NA               2020-09-25 No
    24        5 2020-11-16 NA               2020-10-25 No
    25        5 2020-11-16 NA               2020-11-25 No
    26        5 2020-11-16 NA               2020-12-25 No
    27        5 2020-11-16 NA               2021-01-25 No
Customer 1 is unchanged because the termination_date doesn't fall into the observation period.
Recommended answer

Filter the rows conditionally for each customer:
    library(dplyr)

    df %>%
      filter(termination_date > first_obs | is.na(termination_date)) %>%
      tidyr::crossing(obs_seq) %>%
      mutate(is_churn = case_when(
        termination_date >= obs_seq & termination_date < (obs_seq + obs_interval) ~ "Yes",
        TRUE ~ "No"
      )) %>%
      group_by(customer) %>%
      # Keep rows up to and including the first "Yes"; keep everything if there is none
      filter(if (any(is_churn == 'Yes')) row_number() <= match('Yes', is_churn) else TRUE) %>%
      ungroup()

    #   customer start_date termination_date obs_seq    is_churn
    #1         1 2020-08-23 2021-02-28       2020-07-25 No
    #2         1 2020-08-23 2021-02-28       2020-08-25 No
    #3         1 2020-08-23 2021-02-28       2020-09-25 No
    #4         1 2020-08-23 2021-02-28       2020-10-25 No
    #5         1 2020-08-23 2021-02-28       2020-11-25 No
    #6         1 2020-08-23 2021-02-28       2020-12-25 No
    #7         1 2020-08-23 2021-02-28       2021-01-25 No
    #8         2 2020-03-27 2020-11-11       2020-07-25 No
    #9         2 2020-03-27 2020-11-11       2020-08-25 No
    #10        2 2020-03-27 2020-11-11       2020-09-25 No
    #11        2 2020-03-27 2020-11-11       2020-10-25 Yes
    #12        3 2020-01-22 2020-11-17       2020-07-25 No
    #13        3 2020-01-22 2020-11-17       2020-08-25 No
    #14        3 2020-01-22 2020-11-17       2020-09-25 No
    #15        3 2020-01-22 2020-11-17       2020-10-25 Yes
    #16        4 2020-08-03 2020-12-15       2020-07-25 No
    #17        4 2020-08-03 2020-12-15       2020-08-25 No
    #18        4 2020-08-03 2020-12-15       2020-09-25 No
    #19        4 2020-08-03 2020-12-15       2020-10-25 No
    #20        4 2020-08-03 2020-12-15       2020-11-25 Yes
    #21        5 2020-11-16 <NA>             2020-07-25 No
    #22        5 2020-11-16 <NA>             2020-08-25 No
    #23        5 2020-11-16 <NA>             2020-09-25 No
    #24        5 2020-11-16 <NA>             2020-10-25 No
    #25        5 2020-11-16 <NA>             2020-11-25 No
    #26        5 2020-11-16 <NA>             2020-12-25 No
    #27        5 2020-11-16 <NA>             2021-01-25 No
For each customer, if any value of is_churn is 'Yes', keep every row up to and including the first 'Yes'. If no row has is_churn == 'Yes', keep all rows.
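The same "keep everything up to and including the first 'Yes'" rule can probably also be written without the if/else, using a lagged running count. This is only a sketch of an alternative, not part of the answer above; the `toy` tibble and the `result` name are made up for illustration, and it assumes the rows within each customer are already in observation order.

```r
library(dplyr)

# Hypothetical toy data: customer 1 churns on the second row,
# customer 2 never churns.
toy <- tibble(
  customer = c(1, 1, 1, 2, 2, 2),
  is_churn = c("No", "Yes", "No", "No", "No", "No")
)

result <- toy %>%
  group_by(customer) %>%
  # lag() shifts the churn flag down by one row, so the "Yes" row itself
  # still has a running count of 0 and is kept; every row after it has a
  # count of at least 1 and is dropped.
  filter(cumsum(lag(is_churn == "Yes", default = FALSE)) == 0) %>%
  ungroup()

print(result)
```

Because the running count stays at 0 for customers with no 'Yes', those groups are kept whole without a separate branch.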