How to apply a set of functions to each group of a grouping variable in R data.frame

I need to reshape a data.frame in R in one step. In short, the change in the values of the objects (x1 to x6) is visible row by row (from 1990 to 1995):

> tab1[1:10, ] # raw data, see plot for tab1
   id value year
1  x1     7 1990
2  x1    10 1991
3  x1    11 1992
4  x1     7 1993
5  x1     3 1994
6  x1     1 1995
7  x2     6 1990
8  x2     7 1991
9  x2     9 1992
10 x2     5 1993

I am able to do the reshaping step by step; does anybody know how to do it in one step?

Original data: Table 1 - note that the minimum value across all time series is "0".

Step 1: Table 2 - rescale each time series so that its minimum value equals "0". All series drop down onto the x-axis.

Step 2: Table 3 - apply the diff() function to each time series.

Step 3: Table 4 - apply the sort() function to each time series.
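To make the three steps concrete, here is a minimal sketch that walks a single series (x1 from tab1) through them by hand. The variable names step1, step2 and step3 are only illustrative and do not come from the original post; the printed values match the x1 rows of tab2, tab3 and tab4.

```r
# Values of series x1 from tab1 (1990 to 1995)
x1 <- c(7, 10, 11, 7, 3, 1)

step1 <- x1 - min(x1)   # Step 1: rescale so the series minimum is 0
step2 <- diff(step1)    # Step 2: year-to-year differences (one element shorter)
step3 <- sort(step2)    # Step 3: differences in ascending order

print(step1)  # 6 9 10 6 2 0
print(step2)  # 3 1 -4 -4 -2
print(step3)  # -4 -4 -2 1 3
```

Note that diff() shortens each series from six values to five, which is why tab3 and tab4 have a time column running from 1 to 5 instead of the original year column.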

I hope the pictures are clear enough to understand each step.

So the final table looks like this:

> tab4[1:10, ]
   id value time
1  x1    -4    1
2  x1    -4    2
3  x1    -2    3
4  x1     1    4
5  x1     3    5
6  x2    -4    1
7  x2    -3    2
8  x2     1    3
9  x2     1    4
10 x2     2    5

# Source data:
tab1 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(7,10,11,7,3,1,6,7,9,5,2,3,11,9,7,9,1,0,
                             1,2,2,4,7,4,2,3,1,6,4,2,3,5,4,3,5,6),
                   year = rep(1990:1995, times = 6))

tab2 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(6,9,10,6,2,0,4,5,7,3,0,1,11,9,7,9,1,0,
                             0,1,1,3,6,3,1,2,0,5,3,1,0,2,1,0,2,3),
                   year = rep(1990:1995, times = 6))

tab3 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(3,1,-4,-4,-2,1,2,-4,-3,1,-2,-2,2,-8,-1,
                             1,0,2,3,-3,1,-2,5,-2,-2,2,-1,-1,2,1),
                   time = rep(1:5, times = 6))

tab4 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(-4,-4,-2,1,3,-4,-3,1,1,2,-8,-2,-2,-1,2,
                             -3,0,1,2,3,-2,-2,-2,1,5,-1,-1,1,2,2),
                   time = rep(1:5, times = 6))

Best answer

It sounds like you want to apply a set of functions to each group of a grouping variable. There are many ways to do this in R (from base R's by() and tapply() to add-on packages like plyr, data.table, and dplyr). I've been learning how to use the dplyr package, and came up with the following solution.

library(dplyr)

tab4 = tab1 %>%
  group_by(id) %>%                       # group by id
  mutate(value = value - min(value),     # shift each group so its minimum is 0
         value = value - lag(value)) %>% # lag-1 difference within each group
  na.omit() %>%                          # remove NA caused by lag-1 differencing
  arrange(id, value) %>%                 # order by value within each id
  mutate(time = seq_along(value)) %>%    # make a time variable from 1 to 5 based on current order
  select(-year)                          # remove year column to match final OP output
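For comparison, the same per-group chain can be sketched in base R alone, using split() and lapply() instead of dplyr. This is an alternative under stated assumptions, not the answerer's method: the helper name reshape_series and the result name tab4_base are hypothetical, and the rows of tab1 are assumed to be ordered by year within each id already.

```r
# Source data from the question
tab1 <- data.frame(
  id    = rep(c("x1", "x2", "x3", "x4", "x5", "x6"), each = 6),
  value = c(7, 10, 11, 7, 3, 1,  6, 7, 9, 5, 2, 3,  11, 9, 7, 9, 1, 0,
            1, 2, 2, 4, 7, 4,    2, 3, 1, 6, 4, 2,  3, 5, 4, 3, 5, 6),
  year  = rep(1990:1995, times = 6)
)

# Hypothetical helper: rescale one series to a minimum of 0,
# take lag-1 differences, then sort them in ascending order
reshape_series <- function(v) sort(diff(v - min(v)))

# Split the values by id and apply the helper to each group
# (assumes rows are already ordered by year within each id)
pieces <- lapply(split(tab1$value, tab1$id), reshape_series)

# Reassemble the per-group results into one long data.frame
tab4_base <- data.frame(
  id    = rep(names(pieces), times = lengths(pieces)),
  value = unlist(pieces, use.names = FALSE),
  time  = sequence(lengths(pieces))
)

print(head(tab4_base, 10))
```

Because diff() is unaffected by adding a constant to a series, the min() shift does not change the final values; it is kept here only to mirror Step 1 of the question.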
