This article shows how to calculate the mean of grouped data with dplyr and tidyr.
Problem description
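I am just learning R and am trying to find a way to modify my grouped data.frame so that, for observations that belong together (rows sharing an id), I get the mean of the variable value, (x + y)/2, and the standard deviation sd, sqrt((x^2 + y^2)/2). The other (equal) variables (sequence, value1) should not change.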
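Written out for a pair of observations x and y, the two quantities are as follows, worked through for the anon3 rows of the test data further down (values 3 and 5, sds 0.3 and 0.6):

```latex
\bar{v} = \frac{x + y}{2}, \qquad s = \sqrt{\frac{x^2 + y^2}{2}}

\text{anon3: } \bar{v} = \frac{3 + 5}{2} = 4, \qquad
s = \sqrt{\frac{0.3^2 + 0.6^2}{2}} = \sqrt{\frac{0.09 + 0.36}{2}} = \sqrt{0.225} \approx 0.474
```

The sd result rounds to the 0.47 shown for anon3 in the expected output.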
I have been using subset() and rowMeans(), but I wonder whether there is a better way with dplyr and tidyr (perhaps using nested data frames?).
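For the nested-data-frame route mentioned here, a minimal sketch (not part of the original answer) is to normalise location first, nest the per-observation columns with tidyr::nest(), and reduce each nested tibble with purrr::map_dbl(). It assumes tidyr 1.0 or later (for the `nest(obs = c(...))` syntax) and purrr are installed; the list-column name `obs` is hypothetical, and the data are re-created from the question's dput() output with the id column named plainly `id`.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Re-created from the question's dput() output (id column named "id")
myData <- data.frame(
  id       = factor(c("anon1", "anon2", "anon3", "anon3", "anon4", "anon5", "anon5")),
  location = factor(c("nose", "body", "left_arm", "right_arm", "head", "left_leg", "right_leg")),
  value    = c(5L, 4L, 3L, 5L, 4L, 2L, 1L),
  sd       = c(0.2, 0.4, 0.3, 0.6, 0.3, 0.2, 0.1),
  sequence = factor(rep("a", 7)),
  value1   = c(1L, 2L, 3L, 3L, 4L, 5L, 5L)
)

result <- myData %>%
  # "left_arm"/"right_arm" -> "arm", so paired rows share the same key
  mutate(location = sub(".*_", "", location)) %>%
  # Nest the per-observation columns; all remaining columns form the key
  nest(obs = c(value, sd)) %>%
  mutate(
    value = map_dbl(obs, ~ mean(.x$value)),          # (x + y) / 2
    sd    = map_dbl(obs, ~ sqrt(mean(.x$sd^2)))      # sqrt((x^2 + y^2) / 2)
  ) %>%
  select(-obs) %>%
  arrange(id)

print(result)
```

Because every column other than value and sd becomes part of the nesting key, sequence and value1 are carried through unchanged, as the question requires.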
My test data.frame looks like this:

id       location     value  sd   sequence  value1
"anon1"  "nose"       5      0.2  "a"       1
"anon2"  "body"       4      0.4  "a"       2
"anon3"  "left_arm"   3      0.3  "a"       3
"anon3"  "right_arm"  5      0.6  "a"       3
"anon4"  "head"       4      0.3  "a"       4
"anon5"  "left_leg"   2      0.2  "a"       5
"anon5"  "right_leg"  1      0.1  "a"       5

dput() output of my test data frame:

myData <- structure(list(
  id = structure(c(1L, 2L, 3L, 3L, 4L, 5L, 5L),
    .Label = c("anon1", "anon2", "anon3", "anon4", "anon5"), class = "factor"),
  location = structure(c(5L, 1L, 3L, 6L, 2L, 4L, 7L),
    .Label = c("body", "head", "left_arm", "left_leg", "nose", "right_arm", "right_leg"),
    class = "factor"),
  value = c(5L, 4L, 3L, 5L, 4L, 2L, 1L),
  sd = c(0.2, 0.4, 0.3, 0.6, 0.3, 0.2, 0.1),
  sequence = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "a", class = "factor"),
  value1 = c(1L, 2L, 3L, 3L, 4L, 5L, 5L)),
  .Names = c("id", "location", "value", "sd", "sequence", "value1"),
  class = "data.frame", row.names = c(NA, -7L))

What it should look like:

id       location  value  sd    sequence  value1
"anon1"  "nose"    5      0.2   "a"       1
"anon2"  "body"    4      0.4   "a"       2
"anon3"  "arm"     4      0.47  "a"       3
"anon4"  "head"    4      0.3   "a"       4
"anon5"  "leg"     1.5    0.15  "a"       5

Recommended answer

dplyr's group_by() and summarise() will help here, and gsub() provides some support for the string variable:
library(dplyr)

myData %>%
  group_by(id) %>%
  summarise(
    location = gsub(".*_", "", location[1]),  # drop everything up to the last "_": "left_arm" -> "arm"
    value = mean(value),
    sd = mean(sd),
    sequence = sequence[1],
    value1 = value1[1]
  )
#> # A tibble: 5 × 6
#>       id location value    sd sequence value1
#>   <fctr>    <chr> <dbl> <dbl>   <fctr>  <int>
#> 1  anon1     nose   5.0  0.20        a      1
#> 2  anon2     body   4.0  0.40        a      2
#> 3  anon3      arm   4.0  0.45        a      3
#> 4  anon4     head   4.0  0.30        a      4
#> 5  anon5      leg   1.5  0.15        a      5

Or, if id, sequence and value1 all match, group by all three:
myData %>%
  group_by(id, sequence, value1) %>%
  summarise(
    location = gsub(".*_", "", location[1]),
    value = mean(value),
    sd = mean(sd))
#> Source: local data frame [5 x 6]
#> Groups: id, sequence [?]
#>
#>       id sequence value1 location value    sd
#>   <fctr>   <fctr>  <int>    <chr> <dbl> <dbl>
#> 1  anon1        a      1     nose   5.0  0.20
#> 2  anon2        a      2     body   4.0  0.40
#> 3  anon3        a      3      arm   4.0  0.45
#> 4  anon4        a      4     head   4.0  0.30
#> 5  anon5        a      5      leg   1.5  0.15
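Both variants above average the sd column with mean(), which gives 0.45 for anon3 rather than the 0.47 the question asks for. A minimal sketch of the same summarise() call using the question's formula sqrt((x^2 + y^2)/2), i.e. sqrt(mean(sd^2)), is shown below. It assumes dplyr 1.0 or later (for the `.groups` argument), swaps `location[1]` for dplyr's equivalent first(), and re-creates the data from the dput() output with the id column named `id`.

```r
library(dplyr)

# Re-created from the question's dput() output (id column named "id")
myData <- data.frame(
  id       = factor(c("anon1", "anon2", "anon3", "anon3", "anon4", "anon5", "anon5")),
  location = factor(c("nose", "body", "left_arm", "right_arm", "head", "left_leg", "right_leg")),
  value    = c(5L, 4L, 3L, 5L, 4L, 2L, 1L),
  sd       = c(0.2, 0.4, 0.3, 0.6, 0.3, 0.2, 0.1),
  sequence = factor(rep("a", 7)),
  value1   = c(1L, 2L, 3L, 3L, 4L, 5L, 5L)
)

result <- myData %>%
  group_by(id, sequence, value1) %>%
  summarise(
    # "left_arm"/"right_arm" -> "arm"; names without "_" pass through unchanged
    location = sub(".*_", "", first(location)),
    value    = mean(value),
    # root mean square of the group's sds: sqrt((x^2 + y^2) / 2) for a pair
    sd       = sqrt(mean(sd^2)),
    .groups  = "drop"
  )

print(result)
```

With this formula anon3 comes out at about 0.474 (the 0.47 in the expected output), while anon5 comes out at about 0.158, so the 0.15 listed for anon5 corresponds to a plain mean rather than the stated formula.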