I have a data frame with more than 400,000 observations, and I'm trying to add a column whose values depend on another column, and sometimes on several columns.
Here is a simplified example of what I'm trying to do:
    # Creating a data frame
    M <- data.frame(c("A","B","C"), c(5,100,60))
    names(M) <- c("Letter","Number")

    # Adding a column
    M$Size <- NA

    # if Number <= 50, Size is Small
    # if Number is between 50 and 70, Size is Medium
    # if Number is bigger than 70, Size is Big
    ifelse(M$Number <= 50, M$Size <- "Small",
           ifelse(M$Number <= 70, M$Size <- "Medium", M$Size <- "Big"))
When I run the code, the output I get is:
    [1] "Small" "Big" "Medium"
But the "Size" column in M always ends up holding the value from the last branch of the ifelse call:
    > print (M)
      Letter Number Size
    1      A      5  Big
    2      B    100  Big
    3      C     60  Big

The result I want:

    > print (M)
      Letter Number   Size
    1      A      5  Small
    2      B    100    Big
    3      C     60 Medium
I could solve the problem by taking a subset for each condition with subset and combining the pieces with rbind, but the code would be very long, and since the original data frame I'm working on is big, it would take more time to run. So how can I fix this issue?
Recommended answer
This will help you:
    # Creating a data frame
    M <- data.frame(c("A","B","C"), c(5,100,60))
    names(M) <- c("Letter","Number")

    # Adding a column
    # if Number <= 50, Size is Small
    # if Number is between 50 and 70, Size is Medium
    # if Number is bigger than 70, Size is Big
    # M$Size[M$Number <= 50] <- "Small"  # Edit: no need to subset "Small"
    M$Size <- "Small"
    M$Size[M$Number > 50 & M$Number <= 70] <- "Medium"  # <= 70 so that exactly 70 counts as Medium
    M$Size[M$Number > 70] <- "Big"

    #   Letter Number   Size
    # 1      A      5  Small
    # 2      B    100    Big
    # 3      C     60 Medium
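As a complementary sketch, not part of the original answer: the nested ifelse from the question can also be made to work. The likely cause of the problem is that the assignments sit inside the ifelse arguments. ifelse evaluates its yes argument when any test is TRUE and its no argument when any test is FALSE, so each `M$Size <- ...` overwrites the whole column as a side effect and the last one evaluated ("Big") wins. Assigning the value that ifelse returns avoids this. The `cut` variant and the column name `Size2` are additions for illustration only.

```r
# Same toy data as in the question
M <- data.frame(Letter = c("A", "B", "C"), Number = c(5, 100, 60))

# ifelse() is vectorised and returns a vector, so assign its result once
# instead of assigning inside its arguments.
M$Size <- ifelse(M$Number <= 50, "Small",
                 ifelse(M$Number <= 70, "Medium", "Big"))

# Alternative: cut() bins a numeric vector into labelled intervals.
# With the default right = TRUE the bins are (-Inf, 50], (50, 70], (70, Inf].
# Size2 is a hypothetical column name used only for comparison.
M$Size2 <- as.character(cut(M$Number,
                            breaks = c(-Inf, 50, 70, Inf),
                            labels = c("Small", "Medium", "Big")))

print(M)
```

Both approaches are vectorised, so they should scale to a data frame of 400,000 rows without needing subset and rbind. With more than three or four categories, the cut form tends to stay easier to read than deeply nested ifelse calls.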