How to melt and cast data frames using dplyr?

Updated: 2024-10-09 19:13:29

Question

Recently I have been doing all my data manipulation with dplyr, and it is an excellent tool for that. However, I am unable to melt or cast a data frame using dplyr. Is there a way to do that? Right now I am using reshape2 for this purpose.

I want a dplyr solution for:

require(reshape2)
data(iris)
dat <- melt(iris, id.vars = "Species")

Recommended answer

The successor to reshape2 is tidyr. The equivalents of melt() and dcast() are gather() and spread(), respectively. The equivalent of your code would then be

library(tidyr)
data(iris)
dat <- gather(iris, variable, value, -Species)
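The answer only shows the melt() side, so here is a minimal sketch of the cast side with spread(). It assumes tidyr is installed; the `row_id` column and the names `iris_id`, `long` and `wide` are made up for this illustration. The row identifier is added because, without it, several measurements would share the same Species/variable combination and spread() could not tell which original row each value belongs to.

```r
library(tidyr)

# Hypothetical helper column: tag each original row so it can be rebuilt later.
iris_id <- cbind(row_id = seq_len(nrow(iris)), iris)

# Wide -> long: gather every measurement column, keeping row_id and Species.
long <- gather(iris_id, variable, value, -row_id, -Species)

# Long -> wide: one column per distinct entry in `variable`.
wide <- spread(long, variable, value)

dim(long)
dim(wide)
```

The wide result should have one row per original observation again, although the measurement columns may come back in a different order than in iris.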

If you have magrittr loaded, you can use the pipe operator as in dplyr, i.e. write

dat <- iris %>% gather(variable, value, -Species)
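Since the question is about staying within a dplyr workflow, the following sketch shows how the gather() step might sit inside a longer pipeline. It assumes both dplyr and tidyr are installed; the names `means` and `mean_value` are chosen here for illustration and do not come from the original answer.

```r
library(dplyr)
library(tidyr)

# Reshape to long format, then summarise each Species/variable pair with dplyr verbs.
means <- iris %>%
  gather(variable, value, -Species) %>%
  group_by(Species, variable) %>%
  summarise(mean_value = mean(value)) %>%
  ungroup()

means
```

Loading dplyr also makes `%>%` available, so a separate library(magrittr) call is not needed in this case.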

Note that, unlike with melt(), you need to specify the variable and value names explicitly. I find the syntax of gather() quite convenient, because you can either specify the columns you want converted to long format, or specify the ones you want to keep as they are in the new data frame by prefixing them with '-' (as for Species above), which is a bit faster to type than in melt(). However, I've noticed that, on my machine at least, tidyr can be noticeably slower than reshape2.
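The two ways of selecting columns described above can be sketched side by side. This assumes tidyr is installed; the object names are illustrative. The last call uses pivot_longer(), which is not part of the original answer: it was added in tidyr 1.0.0 as the replacement for gather() (which is now superseded but still works), so it is shown only as an alternative for newer installations.

```r
library(tidyr)

# Name the columns to gather explicitly ...
long_incl <- gather(iris, variable, value, Sepal.Length:Petal.Width)

# ... or exclude the columns to keep by prefixing them with '-'.
long_excl <- gather(iris, variable, value, -Species)

# Both selections cover the same four measurement columns.
all.equal(long_incl, long_excl)

# Alternative for tidyr >= 1.0.0 (assumption: a recent tidyr is installed).
long_pivot <- pivot_longer(iris, cols = -Species,
                           names_to = "variable", values_to = "value")
```

pivot_longer() returns a tibble and orders the rows differently from gather(), so the two results hold the same values but are not identical objects.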

Edit: In reply to @hadley's comment below, I'm posting some timing information comparing the two functions on my PC.

library(reshape2)
library(tidyr)
library(microbenchmark)

# Compare melt() and gather() on the original 150-row iris data
microbenchmark(
  melt = melt(iris, id.vars = "Species"),
  gather = gather(iris, variable, value, -Species)
)
# Unit: microseconds
#    expr     min       lq  median       uq      max neval
#    melt 278.829 290.7420 295.797 320.5730  389.626   100
#  gather 536.974 552.2515 567.395 683.2515 1488.229   100

# Repeat on a larger data set: one million rows sampled with replacement
set.seed(1)
iris1 <- iris[sample(1:nrow(iris), 1e6, replace = TRUE), ]

system.time(melt(iris1, id.vars = "Species"))
#    user  system elapsed
#   0.012   0.024   0.036

system.time(gather(iris1, variable, value, -Species))
#    user  system elapsed
#   0.364   0.024   0.387

sessionInfo()
# R version 3.1.1 (2014-07-10)
# Platform: x86_64-pc-linux-gnu (64-bit)
#
# locale:
#  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
#  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
#  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
#  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
# [1] reshape2_1.4         microbenchmark_1.3-0 magrittr_1.0.1
# [4] tidyr_0.1
#
# loaded via a namespace (and not attached):
# [1] assertthat_0.1 dplyr_0.2      parallel_3.1.1 plyr_1.8.1     Rcpp_0.11.2
# [6] stringr_0.6.2  tools_3.1.1

Published: 2023-11-23 06:39:23
Article link: https://www.elefans.com/category/jswz/34/1620464.html