New conditioned column in R data

Updated: 2024-10-23 23:26:17

I'm taking a data mining course and need to manipulate some data for a task that uses randomForest. V1, V2, and V3 are the column names. If V1 is "A" and V2 is 2, I want R to write "Eureka" to the corresponding row of a new column V4, and "NOPE" to every other row of V4. The actual data set has 300000 rows and 6 columns. This may seem strange, but if I can learn how to do this my problem will be solved. Thanks.

Current data:

    V1 V2 V3
    A  1  4
    A  1  8
    A  2  4
    A  2  8
    C  1  10
    C  1  9
    C  2  10
    C  2  9

Desired output:

    V1 V2 V3 V4
    A  1  4  NOPE
    A  1  8  NOPE
    A  2  5  Eureka
    A  2  3  Eureka
    C  1  10 NOPE
    C  1  8  NOPE
    C  2  10 NOPE
    C  2  4  NOPE

The following code does NOT work.

    for(g in 1:8){
      if(data$V1[g]=="A"&data$V2[g]==2){
        data$V4[g]=Eureka
      }else{
        data$V4[g]="NOPE"
      }
    }

Best answer

We could use either a numeric index or ifelse to create the "V4" column. V1=='A' & V2==2 gives a logical vector (TRUE/FALSE), one value per row. Adding 1 coerces the logical vector to numbers (TRUE becomes 1, FALSE becomes 0), so TRUE/FALSE end up as 2/1. These numbers can then be used as an index into c('NOPE', 'Eureka'), which picks 'NOPE' for 1 and 'Eureka' for 2.

    df$V4 <- with(df, c('NOPE', 'Eureka')[(V1=='A' & V2==2)+1])
    df
    #  V1 V2 V3     V4
    #1  A  1  4   NOPE
    #2  A  1  8   NOPE
    #3  A  2  4 Eureka
    #4  A  2  8 Eureka
    #5  C  1 10   NOPE
    #6  C  1  9   NOPE
    #7  C  2 10   NOPE
    #8  C  2  9   NOPE
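To see why the `+1` works, it can help to print the intermediate values one step at a time. This is only a minimal sketch: it rebuilds the eight-row example with `data.frame` so it runs on its own, and `cond`, `idx`, and `labels` are names introduced here purely for illustration.

```r
# Rebuild the example data so this block is self-contained
df <- data.frame(
  V1 = c("A", "A", "A", "A", "C", "C", "C", "C"),
  V2 = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  V3 = c(4L, 8L, 4L, 8L, 10L, 9L, 10L, 9L),
  stringsAsFactors = FALSE
)

# Step 1: the condition yields one TRUE/FALSE per row
cond <- df$V1 == "A" & df$V2 == 2
cond
# [1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE

# Step 2: adding 1 turns FALSE/TRUE (0/1) into positions 1/2
idx <- cond + 1
idx
# [1] 1 1 2 2 1 1 1 1

# Step 3: the positions select from a two-element label vector
labels <- c("NOPE", "Eureka")
df$V4 <- labels[idx]
df$V4
```

The order of the labels matters: the value for FALSE has to come first, because FALSE maps to position 1.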

Or using ifelse

    df$V4 <- with(df, ifelse(V1=='A' & V2==2, 'Eureka', 'NOPE'))

Another option would be data.table. Convert the "data.frame" to a "data.table" (setDT) and create the column (V4) with the value 'NOPE' in every row. The rows that meet the condition (V1=='A' & V2==2) are then assigned 'Eureka'. The trailing [] just prints the result.

    library(data.table)
    setDT(df)[,V4:='NOPE'][V1=='A' & V2==2, V4:='Eureka'][]
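The same fill-then-overwrite idea can also be written in base R, if installing data.table is not an option. This is a sketch on the example data (rebuilt here so the block is self-contained), not a replacement for the data.table version.

```r
# Rebuild the example data so this block is self-contained
df <- data.frame(
  V1 = c("A", "A", "A", "A", "C", "C", "C", "C"),
  V2 = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  V3 = c(4L, 8L, 4L, 8L, 10L, 9L, 10L, 9L),
  stringsAsFactors = FALSE
)

# Step 1: fill the new column with the default value
df$V4 <- "NOPE"

# Step 2: overwrite only the rows where the condition is TRUE
df$V4[df$V1 == "A" & df$V2 == 2] <- "Eureka"

df$V4
```

Unlike the data.table version, this assigns into a regular data.frame rather than updating a data.table by reference.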

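Regarding the error in your code: 'Eureka' should be quoted. Without quotes, R treats Eureka as the name of an object and stops because no such object exists. It is also better to use vectorized methods rather than loops.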
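The quoting problem can be reproduced in isolation. The sketch below assumes that no object called `Eureka` exists in the session; `msg` and `ok` are illustrative names. In an English locale the captured message should read "object 'Eureka' not found", though the wording may differ in other locales.

```r
# Assumption: no object named Eureka has been defined in this session.
msg <- tryCatch(
  {
    x <- Eureka   # bare name: R looks up an object called Eureka
    "no error"
  },
  error = function(e) conditionMessage(e)  # capture the error text instead of stopping
)
msg

# With quotes, "Eureka" is a character string and the assignment succeeds
ok <- "Eureka"
ok
```

With the quotes in place, the loop from the question runs as intended.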

    for(g in 1:8){
      if(df$V1[g]=='A' & df$V2[g]==2){
        df$V4[g] <- 'Eureka'
      } else{
        df$V4[g] <- 'NOPE'
      }
    }
    df$V4
    #[1] "NOPE"   "NOPE"   "Eureka" "Eureka" "NOPE"   "NOPE"   "NOPE"   "NOPE"
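To check that a loop and the vectorized form agree at the size mentioned in the question (300000 rows), something like the following could be used. It is only a sketch under stated assumptions: the data is randomly simulated rather than the real data set, `big`, `v4_vec`, `v4_loop`, `t_vec`, and `t_loop` are hypothetical names, and the loop writes into a preallocated vector instead of the data frame to keep the run time short.

```r
set.seed(1)  # arbitrary seed, only so the simulated data is reproducible
n <- 300000

# Simulated stand-in for the real data (assumption: same column types as the example)
big <- data.frame(
  V1 = sample(c("A", "C"), n, replace = TRUE),
  V2 = sample(1:2, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Vectorized: one call handles every row
t_vec <- system.time(
  v4_vec <- ifelse(big$V1 == "A" & big$V2 == 2, "Eureka", "NOPE")
)

# Loop: one row at a time, filling a preallocated character vector
t_loop <- system.time({
  v4_loop <- character(n)
  for (g in seq_len(n)) {
    if (big$V1[g] == "A" && big$V2[g] == 2) {
      v4_loop[g] <- "Eureka"
    } else {
      v4_loop[g] <- "NOPE"
    }
  }
})

identical(v4_vec, v4_loop)  # both approaches should give the same labels
t_vec["elapsed"]            # timings depend on the machine
t_loop["elapsed"]
```

No timings are quoted here because they vary by machine and R version; printing `t_vec` and `t_loop` shows the difference on your own setup.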

data

    df <- structure(list(V1 = c("A", "A", "A", "A", "C", "C", "C", "C"),
                         V2 = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
                         V3 = c(4L, 8L, 4L, 8L, 10L, 9L, 10L, 9L)),
                    .Names = c("V1", "V2", "V3"),
                    class = "data.frame",
                    row.names = c(NA, -8L))

Published: 2023-08-04 16:05:00
Link: https://www.elefans.com/category/jswz/34/1417701.html