I'm trying to split a single "character" variable in my dataframe into multiple "factor" variables.

    > sampledf=data.frame(vin=c('v1','v2','v3'),features=c('f1:f2:f3','f2:f4:f5','f1:f4:f5'))
    > sampledf
      vin features
    1  v1 f1:f2:f3
    2  v2 f2:f4:f5
    3  v3 f1:f4:f5
    > desireddf=data.frame(vin=c('v1','v2','v3'),f1=c(1,0,1),f2=c(1,1,0),f3=c(1,0,0),f4=c(0,1,1),f5=c(0,1,1))
    > desireddf
      vin f1 f2 f3 f4 f5
    1  v1  1  1  1  0  0
    2  v2  0  1  0  1  1
    3  v3  1  0  0  1  1

I've tried using strsplit() to separate the "features" column,

    strsplit(as.character(df$features), ";")

but had no luck breaking them apart.
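The attempt above splits on ";" while the values are separated by ":", so each string comes back whole. Even with the right separator, strsplit() returns a list holding one character vector per row rather than separate columns, which is why the answers post-process its result. A minimal sketch, rebuilding `sampledf` from the question (`stringsAsFactors = FALSE` is added here only so the behaviour does not depend on the R version):

```r
sampledf <- data.frame(vin = c('v1', 'v2', 'v3'),
                       features = c('f1:f2:f3', 'f2:f4:f5', 'f1:f4:f5'),
                       stringsAsFactors = FALSE)

# Splitting on ";" finds no separator, so every string is returned unsplit
wrong <- strsplit(as.character(sampledf$features), ";")
str(wrong)

# Splitting on ":" works, but the result is a list of vectors, not columns
parts <- strsplit(as.character(sampledf$features), ":", fixed = TRUE)
str(parts)
```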
Recommended answer
We can use mtabulate() from qdapTools after splitting the 'features' column with strsplit():

    library(qdapTools)
    cbind(sampledf[1], mtabulate(strsplit(as.character(sampledf$features), ':')))
    #  vin f1 f2 f3 f4 f5
    #1  v1  1  1  1  0  0
    #2  v2  0  1  0  1  1
    #3  v3  1  0  0  1  1

Or we can use cSplit_e() from library(splitstackshape):

    library(splitstackshape)
    df1 <- cSplit_e(sampledf, 'features', ':', type = 'character', fill = 0, drop = TRUE)
    # strip the "features_" prefix that cSplit_e adds to the new column names
    names(df1) <- sub('.*_', '', names(df1))

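cSplit_e() names the new columns after the source column (features_f1, features_f2, and so on), which is what the sub() call strips. Because `.*_` is greedy, it removes everything up to the last underscore, so a feature name that itself contains "_" would be truncated. A sketch comparing it with an anchored pattern, on a hypothetical vector of column names (`features_f_6` is invented purely to show the difference):

```r
# Hypothetical column names in the style produced by cSplit_e();
# "features_f_6" is made up to show a feature name containing "_"
cols <- c("vin", "features_f1", "features_f2", "features_f_6")

# Greedy pattern from the answer: strips up to the LAST underscore
greedy <- sub(".*_", "", cols)

# Anchored pattern: strips only the leading "features_" prefix
anchored <- sub("^features_", "", cols)

print(greedy)
print(anchored)
```

For feature names like f1 to f5 both patterns give the same result; the anchored one is only safer when names may contain underscores.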
Or, using only base R: split as before, name the list elements returned by strsplit() with the 'vin' column, convert the list to a key/value data.frame with stack(), tabulate it with table(), transpose, and cbind() the result with the first column of 'sampledf'.
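
    # as.data.frame.matrix() keeps the transposed table wide (one column per feature)
    cbind(sampledf[1], as.data.frame.matrix(t(table(stack(setNames(strsplit(as.character(sampledf$features), ':'), sampledf$vin))))))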
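The same base R pipeline, broken into named steps so each stage can be inspected (the intermediate names `parts`, `long`, `counts` and `wide` are only illustrative). It converts the table with as.data.frame.matrix(), which keeps one column per feature; calling as.data.frame() on a table returns long format with a Freq column instead.

```r
sampledf <- data.frame(vin = c('v1', 'v2', 'v3'),
                       features = c('f1:f2:f3', 'f2:f4:f5', 'f1:f4:f5'),
                       stringsAsFactors = FALSE)

# 1. Split each string into a character vector, named by 'vin'
parts <- setNames(strsplit(as.character(sampledf$features), ':'), sampledf$vin)

# 2. stack() turns the named list into a long data.frame with
#    columns 'values' (the feature) and 'ind' (the vin, as a factor)
long <- stack(parts)

# 3. Cross-tabulate features against vin, then transpose so each vin is a row
counts <- t(table(long))

# 4. Keep the wide layout: one column per feature, one row per vin
wide <- as.data.frame.matrix(counts)

result <- cbind(sampledf[1], wide)
rownames(result) <- NULL  # drop the v1..v3 row names carried over from 'wide'
print(result)
```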