I have a table full of binary variables that I would like to condense down to categorical variables.
Simplified, what I have is a data frame like this:
```r
data <- data.frame(id=c(1,2,3,4,5,6,7,8,9),
                   red=c("1","0","0","0","1","0","0","0","0"),
                   blue=c("0","1","1","1","0","1","1","1","0"),
                   yellow=c("0","0","0","0","0","0","0","0","1"))
data
#   id red blue yellow
# 1  1   1    0      0
# 2  2   0    1      0
# 3  3   0    1      0
# 4  4   0    1      0
# 5  5   1    0      0
# 6  6   0    1      0
# 7  7   0    1      0
# 8  8   0    1      0
# 9  9   0    0      1
```

What I would like to get is:
```
  id  color
1  1    red
2  2   blue
3  3   blue
4  4   blue
5  5    red
6  6   blue
7  7   blue
8  8   blue
9  9 yellow
```
I hope there's a really simple answer for this.
Recommended answer
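You can get the values by using the column names together with as.logical(). However, since your "binary" columns are factors (the data.frame() default before R 4.0.0; from R 4.0.0 on they are character vectors), you need to jump through a few more hoops: as.logical() returns NA for the strings "0" and "1" (and, for a factor, for its level labels), so each value has to be converted to a number with as.numeric(as.character(x)) first: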
```r
> apply(data[-1], 1, function(x) names(x)[as.logical(as.numeric(as.character(x)))])
[1] "red" "blue" "blue" "blue" "red" "blue" "blue" "blue" "yellow"
```
Bind this back with the first column (data[1]) to get the output you want.
```r
cbind(data[1], color = apply(data[-1], 1, function(x)
  names(x)[as.logical(as.numeric(as.character(x)))]))
#   id  color
# 1  1    red
# 2  2   blue
# 3  3   blue
# 4  4   blue
# 5  5    red
# 6  6   blue
# 7  7   blue
# 8  8   blue
# 9  9 yellow
```
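The conversion step matters because of how as.logical() and as.numeric() treat strings and factors. The sketch below is only an illustration (the vectors `chr`, `num` and `row` are made up for the demonstration); it shows why as.numeric(as.character(x)) is used instead of calling as.logical(x) or as.numeric(x) directly.

```r
chr <- c("1", "0")

# as.logical() only recognises strings such as "TRUE", "T", "false",
# so "1" and "0" both become NA.
as.logical(chr)

# For a factor, as.logical() works on the level labels: again NA NA.
as.logical(factor(chr))

# as.numeric() on a factor returns the internal level codes (2 1 here),
# not the labels, so every value would look TRUE.
as.numeric(factor(chr))

# Going through the labels first gives the intended numbers and logicals.
num <- as.numeric(as.character(factor(chr)))
as.logical(num)  # TRUE FALSE

# Applied to one row, the logical vector selects the matching column name.
row <- c(red = "0", blue = "1", yellow = "0")
names(row)[as.logical(as.numeric(row))]  # "blue"
```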
Alternatively, you can try the following:
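```r
# Convert the factor/character "0"/"1" columns to numbers.
data[-1] <- lapply(data[-1], function(x) as.numeric(as.character(x)))
# stack() gives one row per id/column pair (columns `values` and `ind`);
# cbind() recycles data[1] so every stacked row gets its id.
temp <- subset(cbind(data[1], stack(data[-1])), values == 1, c("id", "ind"))
# Sort by id; the colour name is in the `ind` column.
temp[order(temp$id), ]
```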
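To see what the stack() step produces, here is a minimal sketch on a hypothetical three-row frame (`small`, `long` and `hits` are illustrative names, not from the answer). It builds the id column explicitly with rep() to make visible the recycling that cbind(data[1], ...) relies on above.

```r
# A small frame in the same shape as the question's data, already numeric.
small <- data.frame(id = 1:3, red = c(1, 0, 0), blue = c(0, 1, 1))

# stack() puts the columns one after another: `values` holds the cells,
# `ind` is a factor naming the column each value came from.
long <- stack(small[-1])

# Since whole columns are stacked, the ids repeat once per colour column.
long$id <- rep(small$id, times = ncol(small) - 1)

# Keep only the cells that were 1; `ind` is then the colour for that id.
hits <- long[long$values == 1, c("id", "ind")]
hits <- hits[order(hits$id), ]
hits
```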
Or, you can use a combination of "dplyr" and "tidyr", like this:
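```r
library(dplyr)  # across() needs dplyr >= 1.0.0
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0

data %>%
  # convert the factor/character "0"/"1" columns (everything except id) to numbers
  mutate(across(-id, ~ as.numeric(as.character(.x)))) %>%
  # reshape to long format: one row per id/colour pair
  pivot_longer(-id, names_to = "color", values_to = "val") %>%
  filter(val == 1) %>%
  select(-val) %>%
  arrange(id)
# Resulting rows (id, color):
# 1 red
# 2 blue
# 3 blue
# 4 blue
# 5 red
# 6 blue
# 7 blue
# 8 blue
# 9 yellow
```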
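A fully vectorised base R variant is also possible. The sketch below swaps in max.col(), which the answer does not use; the helper name to_category() is hypothetical, and it assumes every row contains exactly one "1". It stops otherwise, whereas the apply() version returns a list instead of a character vector when rows have differing numbers of 1s.

```r
# Hypothetical helper: collapse one-hot "0"/"1" columns into one colour column.
to_category <- function(df, id_col = "id") {
  flags <- df[setdiff(names(df), id_col)]
  # Convert factor/character/numeric columns to numbers; always a matrix.
  m <- do.call(cbind, lapply(flags, function(x) as.numeric(as.character(x))))
  # Enforce the assumption of exactly one 1 per row.
  if (any(rowSums(m == 1) != 1)) stop("each row must contain exactly one 1")
  # max.col() returns, per row, the index of the column holding the maximum.
  data.frame(df[id_col], color = colnames(m)[max.col(m, ties.method = "first")],
             stringsAsFactors = FALSE)
}

# Usage with the question's data:
data <- data.frame(id=c(1,2,3,4,5,6,7,8,9),
                   red=c("1","0","0","0","1","0","0","0","0"),
                   blue=c("0","1","1","1","0","1","1","1","0"),
                   yellow=c("0","0","0","0","0","0","0","0","1"))
res <- to_category(data)
res
```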