I have a panel data set for which I would like to create a counter that increases with each step in the panel but restarts whenever some condition occurs. In my case, I'm using country-year data and want to count the passage of years between events. Here's a toy data set with the key features of my real one:
    df <- data.frame(country = rep(c("A", "B"), each = 5),
                     year = rep(2000:2004, times = 2),
                     event = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
                     stringsAsFactors = FALSE)

What I'm looking to do is to create a counter that is keyed to df$event within each country's series of observations. The clock starts at 1 when we start observing each country; it increases by 1 with the passage of each year; and it restarts at 1 whenever df$event == 1. The desired output is this:
       country year event clock
    1        A 2000     0     1
    2        A 2001     0     2
    3        A 2002     1     1
    4        A 2003     0     2
    5        A 2004     0     3
    6        B 2000     1     1
    7        B 2001     0     2
    8        B 2002     0     3
    9        B 2003     1     1
    10       B 2004     0     2

I have tried using getanID from splitstackshape and a few variations of if and ifelse, but have so far failed to get the desired result.
I'm already using dplyr in the scripts where I need to do this, so I would prefer a solution that uses it or base R, but I would be grateful for anything that works. My data sets are not massive, so speed is not critical, but efficiency is always a plus.
Solution

With dplyr that would be:
    library(dplyr)

    df %>%
      group_by(country, idx = cumsum(event == 1L)) %>%  # new spell at each event
      mutate(counter = row_number()) %>%                # count rows within a spell
      ungroup() %>%
      select(-idx)                                      # drop the helper column

    # Source: local data frame [10 x 4]
    #
    #    country year event counter
    # 1        A 2000     0       1
    # 2        A 2001     0       2
    # 3        A 2002     1       1
    # 4        A 2003     0       2
    # 5        A 2004     0       3
    # 6        B 2000     1       1
    # 7        B 2001     0       2
    # 8        B 2002     0       3
    # 9        B 2003     1       1
    # 10       B 2004     0       2

Or using data.table:
    library(data.table)

    setDT(df)[, counter := seq_len(.N), by = list(country, cumsum(event == 1L))]
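Since the question also mentions base R, the same grouping idea can be sketched without any packages. This is an illustrative alternative rather than part of the original answer; it assumes the rows are already sorted by country and year, and the helper name idx is chosen here only to mirror the dplyr code.

```r
# Toy data from the question
df <- data.frame(country = rep(c("A", "B"), each = 5),
                 year = rep(2000:2004, times = 2),
                 event = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
                 stringsAsFactors = FALSE)

# Spell index: increases by one at every row where event equals 1
# (assumes df is sorted by country, then year)
idx <- cumsum(df$event == 1L)

# Number the rows within each country-by-spell group; ave() returns
# a vector aligned with the original row order
df$counter <- ave(seq_along(idx), df$country, idx, FUN = seq_along)

print(df)
```

Grouping on country as well as idx is what makes the clock restart at 1 when a new country begins, even if that country's first row is not an event.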
Edit: group_by(country, idx = cumsum(event == 1L)) groups by country and by a new grouping index, idx. The event == 1L part creates a logical vector telling us whether each value of event equals 1 (TRUE/FALSE). cumsum(...) then takes the running total of that vector over the whole data frame, giving 0 for the first 2 rows, 1 for the next 3, 2 for the next 3, and 3 for the last 2. We use this new column (plus country) to group the data as needed. You can inspect idx by removing the last two pipe steps from the dplyr code.
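To see the grouping index on its own, the two steps described above can be run on the toy event column in base R. This is a small sketch for inspection only; the variable names is_event and idx are illustrative.

```r
# event column from the toy data, in row order (country A, then B)
event <- c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0)

# Step 1: logical vector marking the rows where an event occurs
is_event <- event == 1L

# Step 2: running total of TRUE values; each event starts a new block
idx <- cumsum(is_event)

print(idx)
# 0 0 1 1 1 2 2 2 3 3
```

Because the running total is taken over the whole data frame rather than per country, the result depends on row order, so the data should be sorted by country and year before grouping.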