Updated: 2024-10-17 16:18:53
dplyr self-join with filter

I want to subtract the value in the row labeled "baseline" from the values in all the other rows of a long-format data frame, separately for each user. This is easy to do in two steps using a left_join with the "baseline" subset. However, I could not figure out how to combine vas_1 and vas_diff into one chain.

library(dplyr)

# Create test data
n_users = 5
vas = tibble(
  user = rep(letters[1:n_users], each = 3),
  group = rep(c("baseline", "early", "late"), n_users),
  vas = round(rgamma(n_users * 3, 10, 1.4))
)

# The above data are given
# Assume some other operations are required
vas_1 = vas %>%
  mutate(vas = vas * 2)

# I want to put the following into one
# chain with the above

# Use self-join to subtract baseline
vas_diff = vas_1 %>%
  filter(group != "baseline") %>%
  # Problem is vas_1 here. Using . gives error here
  # Adding copy = TRUE does not help
  # left_join(. %>% filter(group == "baseline") , by = c("user")) %>%
  left_join(vas_1 %>% filter(group == "baseline"), by = c("user")) %>%
  mutate(vas = vas.x - vas.y) %>%  # compute offset
  select(user, group.x, vas)       # remove temporary variables
vas_diff

Best answer

I use an anonymous function when the dot placeholder (.) has to be used more than once:

... %>% (function(df) { ... }) %>% ...

Hence, in your case:

vas_diff = vas_1 %>%
  filter(group != "baseline") %>%
  (function(df) left_join(df, df %>% filter(group == "baseline"), by = c("user"))) %>%
  mutate(vas = vas.x - vas.y) %>%  # compute offset
  select(user, group.x, vas)

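(This does not produce the desired result: by the time df reaches the anonymous function its baseline rows have already been filtered out, so the join finds no matches and vas.y is NA. It only shows how to use an anonymous function.)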

What you probably want is this:

vas_diff = vas_1 %>%
  {
    # Braces stop %>% from also passing vas_1 as an extra first argument
    left_join(
      x = filter(., group != "baseline"),
      y = filter(., group == "baseline"),
      by = c("user")
    )
  } %>%
  mutate(vas = vas.x - vas.y) %>%  # compute offset
  select(user, group.x, vas)       # remove temporary variables
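
As an alternative that is not part of the answer above, the same subtraction can probably be done without a join by grouping on user. This is only a sketch: it assumes every user has exactly one "baseline" row, and it uses a small fixed table (hypothetical values) instead of the random test data so the result is easy to check by hand.

```r
library(dplyr)

# Hypothetical fixed data in the same shape as the question's vas table
vas <- tibble(
  user  = rep(c("a", "b"), each = 3),
  group = rep(c("baseline", "early", "late"), 2),
  vas   = c(5, 7, 9, 4, 8, 6)
)

vas_diff <- vas %>%
  mutate(vas = vas * 2) %>%   # stands in for the "other operations" step
  group_by(user) %>%
  # Assumption: exactly one baseline row per user, so this is a single value
  mutate(vas = vas - vas[group == "baseline"]) %>%
  ungroup() %>%
  filter(group != "baseline") # keep only the non-baseline rows

vas_diff
```

Because everything happens in one chain, no intermediate vas_1 is needed, and the output keeps the original column names (user, group, vas) instead of the .x/.y suffixes a join produces.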

Published: 2023-08-04 13:22:00
Link: https://www.elefans.com/category/jswz/34/1416032.html