Question
I have a data frame like this:
X <- data.frame(value = c(1,2,3,4),
variable = c("cost", "cost", "reed_cost", "reed_cost"))
I'd like to split the variable column into two; one column to indicate if the variable is a 'cost' and another column to indicate whether or not the variable is "reed". I cannot seem to figure out the right regex for the split (e.g. using tidyr)
If my data were something nicer, say:
Y <- data.frame(value = c(1,2,3,4),
variable = c("adjusted_cost", "adjusted_cost", "reed_cost", "reed_cost"))
Then this is trivial with tidyr:
separate(Y, variable, c("Type", "Model"), "_")
and bingo. Instead, it looks like I need some kind of conditional statement to split on "_" if it is present, and otherwise split on the start of the pattern ("^").
I tried:
separate(X, variable, c("Policy-cost", "Reed"), "(?(_)_|^)", perl=TRUE)
but no luck. I realize I cannot even split to an empty string successfully:
separate(X, variable, c("Policy-cost", "Reed"), "^", perl=TRUE)
How can I do this?
Edit: Note that this is a minimal example of a larger problem, in which there are many possible variables (not just cost and reed_cost), so I do not want to string match each one.
I am looking for a solution that splits arbitrary variables by the _ pattern if present and otherwise splits them into a blank string and the original label.
I also realize I could just grep for the presence of _ and then construct the columns manually. That's fine if rather less elegant; it seems there should be a way to split on a string using a conditional that can return an empty string...
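For reference, the manual fallback could look roughly like the sketch below. The helper name split_label and the column names prefix and suffix are made up for illustration, and it assumes the split should happen at the last _ in each label.

```r
# Hypothetical helper (not from tidyr): split each label at its last "_".
# Labels without "_" get a blank prefix and keep the whole label as suffix.
split_label <- function(labels) {
  labels <- as.character(labels)
  has_sep <- grepl("_", labels, fixed = TRUE)
  data.frame(
    # Everything before the last "_", or "" when there is no "_"
    prefix = ifelse(has_sep, sub("_[^_]*$", "", labels), ""),
    # Everything after the last "_"; unchanged when there is no "_"
    suffix = sub("^.*_", "", labels),
    stringsAsFactors = FALSE
  )
}

X <- data.frame(value = c(1, 2, 3, 4),
                variable = c("cost", "cost", "reed_cost", "reed_cost"))

result <- cbind(X["value"], split_label(X$variable))
print(result)
```

This never names cost or reed_cost explicitly, so it should carry over to other labels as long as the last _ is the right place to split.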
Recommended answer
Another approach with base R:
cbind(X["value"],
      setNames(as.data.frame(t(sapply(strsplit(as.character(X$variable), "_"),
                                      function(x)
                                        if (length(x) == 1) c("", x)
                                        else x))),
               c("Policy-cost", "Reed")))
#   value Policy-cost Reed
# 1     1             cost
# 2     2             cost
# 3     3        reed cost
# 4     4        reed cost
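If staying within tidyr is preferred, a possible variant (not part of the answer above) is to use the fill argument of separate() instead of a conditional regex. This is a sketch that assumes each label contains at most one _; with fill = "left", labels that have no _ are padded with NA on the left, and the NA is then turned into a blank string by hand.

```r
library(tidyr)

X <- data.frame(value = c(1, 2, 3, 4),
                variable = c("cost", "cost", "reed_cost", "reed_cost"))

# fill = "left" pads labels that have no "_" with NA on the left-hand side
out <- separate(X, variable, c("Policy-cost", "Reed"), sep = "_", fill = "left")

# Replace the NA padding with the blank string asked for in the question
out[["Policy-cost"]][is.na(out[["Policy-cost"]])] <- ""
print(out)
```

Labels with more than one _ would produce extra pieces; the extra argument of separate() controls what happens to them, so that case would need its own decision.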