Adding multiple columns in a dplyr mutate call

Updated: 2024-10-12 22:31:50
Problem description

I have a data frame with a dot-separated character column:

    > set.seed(310366)
    > tst = data.frame(x=1:10,y=paste(sample(c("FOO","BAR","BAZ"),10,TRUE),".",sample(c("foo","bar","baz"),10,TRUE),sep=""))
    > tst
      x       y
    1 1 BAR.baz
    2 2 FOO.foo
    3 3 BAZ.baz
    4 4 BAZ.foo
    5 5 BAZ.bar
    6 6 FOO.baz
    7 7 BAR.bar
    8 8 BAZ.baz

and I want to split that column into two new columns containing the parts on either side of the dot. str_split_fixed from the stringr package does the job nicely. All my values are definitely two parts separated by a dot, so I can do:

    > require(stringr)
    > str_split_fixed(tst$y,"\\.",2)
         [,1]  [,2] 
    [1,] "BAR" "baz"
    [2,] "FOO" "foo"
    [3,] "BAZ" "baz"
    [4,] "BAZ" "foo"
    [5,] "BAZ" "bar"
    [6,] "FOO" "baz"
    [7,] "BAR" "bar"

Now I could just cbind that to my data frame, but I thought I'd figure out how to do it in a dplyr pipeline. First I thought mutate could do it in one go:

    > tst %.% mutate(parts=str_split_fixed(y,"\\.",2))
    Error: wrong result size (20), expected 10 or 1
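The size in that error follows from the shape of the result: str_split_fixed() returns a character matrix with one row per input string and one column per piece, so on the 10-row tst it yields 10 × 2 = 20 values, while mutate() expects a vector of length 10 (one per row) or 1. A minimal sketch with three stand-in strings (not the question's data):

```r
library(stringr)

y <- c("BAR.baz", "FOO.foo", "BAZ.baz")

# "\\." escapes the dot (a regex metacharacter); n = 2 returns exactly two pieces
parts <- str_split_fixed(y, "\\.", 2)

dim(parts)     # one row per input string, one column per piece: 3 x 2
length(parts)  # 3 * 2 = 6 values in total, not 3
```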

I can get mutate to do it in two:

    > tst %.% mutate(part1=str_split_fixed(y,"\\.",2)[,1], part2=str_split_fixed(y,"\\.",2)[,2])
      x       y part1 part2
    1 1 BAR.baz   BAR   baz
    2 2 FOO.foo   FOO   foo
    3 3 BAZ.baz   BAZ   baz
    4 4 BAZ.foo   BAZ   foo
    5 5 BAZ.bar   BAZ   bar
    6 6 FOO.baz   FOO   baz

but that's running the string split twice.
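For comparison, the plain cbind route mentioned earlier needs only one split. A minimal sketch outside any pipeline, assuming a three-row stand-in for tst (the column names part1 and part2 are chosen for illustration):

```r
library(stringr)

# Small stand-in for the question's tst: an id column and a dot-separated y
tst <- data.frame(
  x = 1:3,
  y = c("BAR.baz", "FOO.foo", "BAZ.baz"),
  stringsAsFactors = FALSE
)

# Split once; the result is a 3 x 2 character matrix
parts <- str_split_fixed(tst$y, "\\.", 2)
colnames(parts) <- c("part1", "part2")

# With a data frame as the first argument, cbind() uses the data-frame method,
# which turns the named matrix columns into columns part1 and part2
tst2 <- cbind(tst, parts)
tst2
```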

"Best" I can do so far in a dplyr way is this (which I only discovered while writing this question...):

    > tst %.% do(cbind(.,data.frame(parts=str_split_fixed(.$y,"\\.",2))))
      x       y parts.1 parts.2
    1 1 BAR.baz     BAR     baz
    2 2 FOO.foo     FOO     foo
    3 3 BAZ.baz     BAZ     baz
    4 4 BAZ.foo     BAZ     foo
    5 5 BAZ.bar     BAZ     bar

which isn't bad, but loses a lot of the readability of piped things in R. Is there a simple approach using mutate that I've missed?
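In dplyr 1.0.0 and later (which use %>% rather than the old %.% operator shown above), mutate() also accepts an unnamed expression that returns a data frame and adds each of its columns to the result, so a single split can feed both new columns. A sketch, assuming dplyr >= 1.0.0, a three-row stand-in for tst, and a hypothetical helper named split_dot():

```r
library(dplyr)
library(stringr)

tst <- data.frame(
  x = 1:3,
  y = c("BAR.baz", "FOO.foo", "BAZ.baz"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split once and return both parts as a data frame
split_dot <- function(y) {
  parts <- str_split_fixed(y, "\\.", 2)
  data.frame(part1 = parts[, 1], part2 = parts[, 2], stringsAsFactors = FALSE)
}

# The data-frame result is unnamed, so mutate() adds part1 and part2
# as ordinary columns instead of nesting them under a single name
out <- tst %>% mutate(split_dot(y))
out
```

Naming the expression instead, as in mutate(parts = split_dot(y)), should give one nested data-frame column called parts rather than two ordinary columns.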

Solution

You can use separate() from tidyr in combination with dplyr:

    library(dplyr)
    library(tidyr)
    # sep is treated as a regular expression, so the dot is escaped
    tst %>% separate(y, c("y1", "y2"), sep = "\\.", remove = FALSE)
        x       y  y1  y2
    1   1 BAR.baz BAR baz
    2   2 FOO.foo FOO foo
    3   3 BAZ.baz BAZ baz
    4   4 BAZ.foo BAZ foo
    5   5 BAZ.bar BAZ bar
    6   6 FOO.baz FOO baz
    7   7 BAR.bar BAR bar
    8   8 BAZ.baz BAZ baz
    9   9 FOO.bar FOO bar
    10 10 BAR.foo BAR foo

Setting remove=TRUE (the default) drops column y from the result.
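Newer tidyr versions offer an alternative: since tidyr 1.3.0, separate() is superseded by separate_wider_delim(), which treats the delimiter as a fixed string by default, so the dot needs no escaping. A sketch, assuming tidyr >= 1.3.0 and a three-row stand-in for tst; cols_remove = FALSE plays the role of remove = FALSE:

```r
library(dplyr)
library(tidyr)

tst <- data.frame(
  x = 1:3,
  y = c("BAR.baz", "FOO.foo", "BAZ.baz"),
  stringsAsFactors = FALSE
)

# delim is matched literally, so "." splits on the dot itself;
# cols_remove = FALSE keeps the original y column
out <- tst %>%
  separate_wider_delim(y, delim = ".", names = c("y1", "y2"), cols_remove = FALSE)
out
```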

Published: 2023-10-28 04:55:02
Article link: https://www.elefans.com/category/jswz/34/1535580.html