I have some data:

    test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa"))

and so on. All the elements are the same length, and they are already sorted before I get them.
I need to make a new column of ranks: "First", "Second", "Third". Anything after that can be left blank, and it needs to account for ties. So in the above case, I'd like to get the following output:
    A      B
    aaabbb First
    aaaabb Second
    aaaabb Second
    aaaaab Third
    bbbaaa
    bbbbaa

I looked at rank() and some other posts that used it, but I wasn't able to get it to do what I wanted.
Recommended answer

How about this:
    test$B <- match(test$A, unique(test$A)[1:3])
    test
           A  B
    1 aaabbb  1
    2 aaaabb  2
    3 aaaabb  2
    4 aaaaab  3
    5 bbbaaa NA
    6 bbbbaa NA

This is one of many ways to do it. It is possibly not the best, but it readily springs to mind and is fairly intuitive. You can use unique because you receive the data pre-sorted, so the first three distinct values are the top three ranks.
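Since the question asks for the labels "First", "Second" and "Third" with blanks afterwards rather than numbers, the index returned by match() can be used to look those labels up. A minimal sketch, assuming the six-row data implied by the question's expected output; `rank_labels` and `idx` are names made up here for illustration:

```r
# Data as implied by the expected output in the question (already sorted).
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb",
                         "aaaaab", "bbbaaa", "bbbbaa"))

# Hypothetical label vector; position i holds the label for rank i.
rank_labels <- c("First", "Second", "Third")

# Position of each value among the first three distinct values (NA otherwise).
# head() avoids padding with NA when there are fewer than three distinct values.
idx <- match(test$A, head(unique(test$A), 3))

# Use the label where a rank exists, and an empty string everywhere else.
test$B <- ifelse(is.na(idx), "", rank_labels[idx])
test$B
```

Rows beyond the third distinct value end up as empty strings, which matches the blanks in the requested output.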
As the data is sorted, another suitable function worth considering is rle, although it's slightly more obscure in this example:
    # as.character() works whether A is a factor or a character column
    rnk <- rle(as.character(test$A))$lengths
    rnk
    # [1] 1 2 1 1 1
    test$B <- c(rep(1:3, times = rnk[1:3]), rep(NA, sum(rnk[-c(1:3)])))

rle computes the lengths (and values, which we don't really care about here) of runs of equal values in a vector - so again this works because your data are already sorted.
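To make the run-length idea concrete, here is a small sketch of what rle returns, assuming the same six values as in the question's expected output (the vector name `A` and the unsorted toy vector are only for illustration):

```r
# Sorted input: equal values sit next to each other, so each run is one rank.
A <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")
runs <- rle(A)
runs$lengths  # 1 2 1 1 1
runs$values   # the five distinct values, in order of appearance

# Unsorted input: a repeated value that is not adjacent forms separate runs,
# which is why this approach relies on the data being pre-sorted.
unsorted_runs <- rle(c("x", "y", "x"))
unsorted_runs$lengths  # 1 1 1
```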
And if you don't need blanks after the third-ranked item, it's even simpler (and more readable):
    test$B <- rep(seq_along(rnk), times = rnk)
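For sorted data, ranking every run this way should give the same result as calling match against all the distinct values. A quick sketch to check that, assuming the six values from the question's expected output (`by_rle` and `by_match` are illustrative names):

```r
A <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

# Run-length approach: one rank per run, repeated for the length of the run.
rnk <- rle(A)$lengths
by_rle <- rep(seq_along(rnk), times = rnk)

# match/unique approach without the [1:3] cut-off.
by_match <- match(A, unique(A))

by_rle                       # 1 2 2 3 4 5
identical(by_rle, by_match)  # TRUE
```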