Creating a sequential counter that restarts on a condition within panel data groups

Updated: 2024-10-09 18:16:29

Problem description

I have a panel data set for which I would like to create a counter that increases with each step in the panel but restarts whenever some condition occurs. In my case, I'm using country-year data and want to count the passage of years between events. Here's a toy data set with the key features of my real one:

    df <- data.frame(country = rep(c("A", "B"), each = 5),
                     year = rep(2000:2004, times = 2),
                     event = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
                     stringsAsFactors = FALSE)

What I'm looking to do is to create a counter that is keyed to df$event within each country's series of observations. The clock starts at 1 when we start observing each country; it increases by 1 with the passage of each year; and it restarts at 1 whenever df$event==1. The desired output is this:

       country year event clock
    1        A 2000     0     1
    2        A 2001     0     2
    3        A 2002     1     1
    4        A 2003     0     2
    5        A 2004     0     3
    6        B 2000     1     1
    7        B 2001     0     2
    8        B 2002     0     3
    9        B 2003     1     1
    10       B 2004     0     2
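The rule stated above can be read almost literally as a loop. The following is a minimal base R sketch, assuming the rows are already ordered by year within each country (as in the toy data); the helper name `clock_loop` is hypothetical and not part of the question or any package.

```r
# Toy data from the question
df <- data.frame(country = rep(c("A", "B"), each = 5),
                 year = rep(2000:2004, times = 2),
                 event = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
                 stringsAsFactors = FALSE)

# Hypothetical helper: walk the rows once, restarting the clock at 1 when a
# new country begins or when event == 1; otherwise add 1 to the previous value.
# Assumes rows are sorted by year within each country.
clock_loop <- function(country, event) {
  n <- length(event)
  clock <- integer(n)
  for (i in seq_len(n)) {
    # `||` short-circuits, so country[i - 1] is never read when i == 1
    new_country <- i == 1 || country[i] != country[i - 1]
    if (new_country || event[i] == 1) {
      clock[i] <- 1L
    } else {
      clock[i] <- clock[i - 1] + 1L
    }
  }
  clock
}

df$clock <- clock_loop(df$country, df$event)
print(df$clock)
# [1] 1 2 1 2 3 1 2 3 1 2
```

A loop like this is perfectly adequate for small data sets; the grouped approaches avoid the explicit loop by turning the restart rule into a grouping variable.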

I have tried using getanID from splitstackshape and a few variations of if and ifelse but have failed so far to get the desired result.

I'm already using dplyr in the scripts where I need to do this, so I would prefer a solution that uses it or base R, but I would be grateful for anything that works. My data sets are not massive, so speed is not critical, but efficiency is always a plus.

Solution

With dplyr that would be:

    library(dplyr)

    df %>%
      group_by(country, idx = cumsum(event == 1L)) %>%
      mutate(counter = row_number()) %>%
      ungroup %>%
      select(-idx)
    #Source: local data frame [10 x 4]
    #
    #   country year event counter
    #1        A 2000     0       1
    #2        A 2001     0       2
    #3        A 2002     1       1
    #4        A 2003     0       2
    #5        A 2004     0       3
    #6        B 2000     1       1
    #7        B 2001     0       2
    #8        B 2002     0       3
    #9        B 2003     1       1
    #10       B 2004     0       2

Or using data.table:

    library(data.table)
    setDT(df)[, counter := seq_len(.N), by = list(country, cumsum(event == 1L))]
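The question also asked for base R. Here is a minimal sketch of the same grouping idea using `ave()` from the stats package, assuming the only requirement is that rows be in year order within each country. It builds the same `cumsum(event == 1)` index as the answers above and numbers the rows within each country/index group with `seq_along`.

```r
df <- data.frame(country = rep(c("A", "B"), each = 5),
                 year = rep(2000:2004, times = 2),
                 event = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
                 stringsAsFactors = FALSE)

# Sort defensively so that cumsum() runs in time order within each country
df <- df[order(df$country, df$year), ]

# ave() splits its first argument by the interaction of the grouping vectors
# and applies FUN within each group; seq_along gives 1, 2, ..., n per group.
df$counter <- ave(df$event, df$country, cumsum(df$event == 1), FUN = seq_along)

print(df$counter)
# [1] 1 2 1 2 3 1 2 3 1 2
```

Because `ave()` returns a vector of the same type as its first argument, `counter` is numeric here; wrap it in `as.integer()` if an integer column is preferred.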

Edit: group_by(country, idx = cumsum(event == 1L)) groups by country and by a new grouping index, "idx". The event == 1L part creates a logical vector that is TRUE wherever the column "event" equals 1 and FALSE otherwise. cumsum(...) then counts the TRUE values seen so far, which gives 0 for the first 2 rows, 1 for the next 3, 2 for the next 3, and 3 for the last 2. Grouping by this column together with country splits the data into spells that each begin at the start of a country or at an event, and row_number() numbers the rows within each spell. You can inspect idx by removing the last two pipe steps (ungroup and select(-idx)) from the dplyr code.
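To look at the intermediate grouping index without loading dplyr, here is a small base R sketch that adds `cumsum(event == 1L)` to the toy data. The column name `idx` mirrors the answer's; `spell_sizes` is a hypothetical name used only for illustration.

```r
df <- data.frame(country = rep(c("A", "B"), each = 5),
                 year = rep(2000:2004, times = 2),
                 event = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
                 stringsAsFactors = FALSE)

# event == 1L is TRUE only in event years; cumsum() counts the TRUEs so far,
# so idx goes up by one at every event and stays flat in between.
df$idx <- cumsum(df$event == 1L)
print(df$idx)
# [1] 0 0 1 1 1 2 2 2 3 3

# Each (country, idx) pair is one spell. Grouping by country as well keeps a
# spell from spanning two countries when a country's first year has no event.
spell_sizes <- table(paste(df$country, df$idx))
print(spell_sizes)
```

The spell sizes (2, 3, 3 and 2 rows) are exactly the lengths of the runs 1, 2, ... in the desired clock column.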

Published: 2023-10-17 09:25:26

Article link: https://www.elefans.com/category/jswz/34/1500467.html