R group by and aggregate: return relative rank within groups using plyr

Updated: 2024-10-25 06:26:44

Problem description

UPDATE: I have a data frame 'test' that looks like this:

   session_id seller_feedback_score
1           1                282470
2           1                275258
3           1                275258
4           1                275258
5           1                 37831
6           1                282470
7           1                    26
8           1                138351
9           1                321350
10          1                   841
11          1                138351
12          1                 17263
13          1                282470
14          1                396900
15          1                282470
16          1                282470
17          1                321350
18          1                321350
19          1                321350
20          1                     0
21          1                  1596
22          7                282505
23          7                275283
24          7                275283
25          7                275283
26          7                 37834
27          7                282505
28          7                    26
29          7                138359
30          7                321360
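To make the question easier to reproduce, the data frame printed above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the printed rows, and storing both columns as plain numerics is an assumption, since the original column types are not shown.

```r
# Rebuild the example data frame from the 30 rows printed above.
# Assumption: both columns are numeric; the question does not show their types.
test <- data.frame(
  session_id = rep(c(1, 7), times = c(21, 9)),
  seller_feedback_score = c(
    # session_id 1 (rows 1 to 21)
    282470, 275258, 275258, 275258, 37831, 282470, 26, 138351, 321350, 841,
    138351, 17263, 282470, 396900, 282470, 282470, 321350, 321350, 321350, 0,
    1596,
    # session_id 7 (rows 22 to 30)
    282505, 275283, 275283, 275283, 37834, 282505, 26, 138359, 321360
  )
)

str(test)
```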

and some code (using package plyr) that should rank 'seller_feedback_score' within each group of session_id:

test <- test %>%
  group_by(session_id) %>%
  mutate(seller_feedback_score_rank = dense_rank(-seller_feedback_score))

However, what actually happens is that R ranks the entire data frame together, ignoring the groups (the session_id values):

   session_id seller_feedback_score seller_feedback_score_rank_2
1           1                282470                            5
2           1                275258                            7
3           1                275258                            7
4           1                275258                            7
5           1                 37831                           11
6           1                282470                            5
7           1                    26                           15
8           1                138351                            9
9           1                321350                            3
10          1                   841                           14
11          1                138351                            9
12          1                 17263                           12
13          1                282470                            5
14          1                396900                            1
15          1                282470                            5
16          1                282470                            5
17          1                321350                            3
18          1                321350                            3
19          1                321350                            3
20          1                     0                           16
21          1                  1596                           13
22          7                282505                            4
23          7                275283                            6
24          7                275283                            6
25          7                275283                            6
26          7                 37834                           10
27          7                282505                            4
28          7                    26                           15
29          7                138359                            8
30          7                321360                            2

I checked this by counting the unique 'seller_feedback_score_rank' values and, not surprisingly, the count equals the highest rank value. I'd appreciate it if someone could reproduce this and help. Thanks.
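The output above can be reproduced by ranking the score column with no grouping at all, which supports the reading that the groups were ignored. The sketch below assumes dplyr is installed and works on the bare score vector copied from the question. Note that for any dense ranking the number of distinct ranks equals the highest rank, so the more telling figure is that the highest rank (16) matches the number of distinct scores in the whole data frame.

```r
library(dplyr)

# Scores copied from the question, in row order (rows 1 to 30).
scores <- c(
  282470, 275258, 275258, 275258, 37831, 282470, 26, 138351, 321350, 841,
  138351, 17263, 282470, 396900, 282470, 282470, 321350, 321350, 321350, 0,
  1596,
  282505, 275283, 275283, 275283, 37834, 282505, 26, 138359, 321360
)

# Dense rank over the whole vector, highest score first, with no grouping.
ungrouped <- dense_rank(-scores)

head(ungrouped)       # 5 7 7 7 11 5, matching the first rows of the output above
max(ungrouped)        # 16
n_distinct(scores)    # 16 distinct scores across both sessions
```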

Solution

One option:

library(dplyr)

df %>%
  group_by(session_id) %>%
  mutate(rank = dense_rank(-seller_feedback_score))

dense_rank is "like min_rank, but with no gaps between ranks" so I negated the seller_feedback_score column in order to turn it into something like max_rank (which doesn't exist in dplyr).
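A small vector makes the difference between the two ranking functions, and the effect of negating the input, easier to see. This is a minimal sketch assuming dplyr is installed; the vector `x` is made up for illustration and is not from the question.

```r
library(dplyr)

# Hypothetical toy vector with one tie.
x <- c(10, 20, 20, 30)

dense_rank(x)        # 1 2 2 3: ascending, no gaps after the tie
dense_rank(-x)       # 3 2 2 1: negating puts the largest value at rank 1
min_rank(-x)         # 4 2 2 1: same order, but the tie leaves a gap (no rank 3)
dense_rank(desc(x))  # 3 2 2 1: desc() is dplyr's helper for descending order
```

For numeric columns, `desc(x)` and `-x` give the same ranking; `desc()` also works for types that cannot be negated, such as character vectors.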

If you want the ranks with gaps so that you reach 21 for the lowest in your case, you can use min_rank instead of dense_rank:

library(dplyr)

df %>%
  group_by(session_id) %>%
  mutate(rank = min_rank(-seller_feedback_score))
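The answer does not say why the original code ignored the groups. One common cause, which is only an assumption here, is that plyr was attached after dplyr, so that plyr::mutate masks dplyr::mutate; plyr's version does not respect group_by(). The sketch below sidesteps that possibility by calling the dplyr verbs with an explicit namespace, and computes both rankings on the data from the question.

```r
library(dplyr)

# Data copied from the question.
test <- data.frame(
  session_id = rep(c(1, 7), times = c(21, 9)),
  seller_feedback_score = c(
    282470, 275258, 275258, 275258, 37831, 282470, 26, 138351, 321350, 841,
    138351, 17263, 282470, 396900, 282470, 282470, 321350, 321350, 321350, 0,
    1596,
    282505, 275283, 275283, 275283, 37834, 282505, 26, 138359, 321360
  )
)

# Explicit dplyr:: prefixes guard against another package masking these verbs.
ranked <- test %>%
  dplyr::group_by(session_id) %>%
  dplyr::mutate(
    rank_dense = dense_rank(-seller_feedback_score),  # no gaps after ties
    rank_gaps  = min_rank(-seller_feedback_score)     # gaps after ties
  ) %>%
  dplyr::ungroup()

# Highest rank per session: with grouping respected, session 1 reaches
# 11 (dense) and 21 (with gaps); session 7 reaches 6 and 9.
ranked %>%
  dplyr::group_by(session_id) %>%
  dplyr::summarise(max_dense = max(rank_dense), max_gaps = max(rank_gaps))
```

If both packages are needed in the same session, attaching plyr before dplyr is another way to avoid the masking.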

Published: 2023-11-12 12:46:22
Article link: https://www.elefans.com/category/jswz/34/1581531.html