Shaping Interval Dates and Range Dates in Same DF

I'm trying to calculate how long each person stays in a homeless shelter using R. The shelter has two different types of check-in, one for overnight stays and another for long-term stays. I would like to reshape the data to get a single EntryDate and ExitDate per stay, treating check-ins that are not separated by a break of at least one day as one continuous stay.

Here is what the data currently look like:

    PersonalID  EntryDate   ExitDate
    1           2016-12-01  2016-12-02
    1           2016-12-03  2016-12-04
    1           2016-12-16  2016-12-17
    1           2016-12-17  2016-12-18
    1           2016-12-18  2016-12-19
    2           2016-10-01  2016-10-20
    2           2016-10-21  2016-10-22
    3           2016-09-01  2016-09-02
    3           2016-09-20  2016-09-21

Ultimately, I'm trying to get the above data to represent continuous ranges so that I can calculate the total length of stay by participant.

For example, the above data would become:

    PersonalID  EntryDate   ExitDate
    1           2016-12-01  2016-12-04
    1           2016-12-16  2016-12-19
    2           2016-10-01  2016-10-22
    3           2016-09-01  2016-09-02
    3           2016-09-20  2016-09-21
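Once the ranges are merged as in the example above, the total length of stay per participant is a subtraction followed by a grouped sum. This is a minimal sketch using base R only. The `merged` data frame is typed in by hand from the expected output, and counting a stay as `ExitDate - EntryDate` nights is an assumption about how length of stay should be measured; the `Nights` column name is made up for illustration.

```r
# Expected (merged) ranges, typed in by hand from the example output
merged <- data.frame(
  PersonalID = c(1, 1, 2, 3, 3),
  EntryDate  = as.Date(c("2016-12-01", "2016-12-16", "2016-10-01",
                         "2016-09-01", "2016-09-20")),
  ExitDate   = as.Date(c("2016-12-04", "2016-12-19", "2016-10-22",
                         "2016-09-02", "2016-09-21"))
)

# Assumption: length of one stay = number of nights between entry and exit
merged$Nights <- as.numeric(merged$ExitDate - merged$EntryDate)

# Sum the nights of all stays for each participant
total <- aggregate(Nights ~ PersonalID, data = merged, FUN = sum)
print(total)
```

If the exit day itself should count as a day of stay, add 1 to each `Nights` value before summing.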

Best answer

Here is an ugly solution. It can probably be done more cleanly, but it works. This solution should also be debugged with real data (I have added one line to your example to cover more situations).

    d <- read.table(text = '
    PersonalID EntryDate ExitDate
    1 2016-12-01 2016-12-02
    1 2016-12-03 2016-12-04
    1 2016-12-16 2016-12-17
    1 2016-12-17 2016-12-18
    1 2016-12-18 2016-12-19
    2 2016-10-01 2016-10-20
    2 2016-10-21 2016-10-22
    3 2016-09-01 2016-09-02
    3 2016-09-20 2016-09-21
    4 2016-09-20 2016-09-21
    ', header = TRUE)

    # Transform to Date format
    d$EntryDate <- as.Date(as.character(d$EntryDate))
    d$ExitDate <- as.Date(as.character(d$ExitDate))
    summary(d)

    # Reorder so that the entry/exit dates are in chronological order within each ID
    d <- d[order(d$PersonalID, d$EntryDate), ]

    # Add a column that will store the number of days between one exit and the next entry
    d$nbdays <- 9999

    # Split to get a list with one data frame per ID
    d <- split(d, d$PersonalID)
    d

    for (i in seq_along(d)) {
      # Compute the number of days between one exit and the next entry
      # (only if there is more than one entry)
      if (nrow(d[[i]]) > 1) {
        d[[i]][-1, "nbdays"] <- as.numeric(
          d[[i]][2:nrow(d[[i]]), "EntryDate"] - d[[i]][1:(nrow(d[[i]]) - 1), "ExitDate"]
        )
      }
      x <- d[[i]]  # store a copy of the data to lighten the syntax

      # Entry dates that come more than 1 day after the previous exit (including the first one)
      entr <- x[x$nbdays > 1, "EntryDate"]

      # Exit dates just before the rows where nbdays > 1, plus the last exit date.
      # unique() avoids picking the last exit twice
      whichexist <- unique(c(c(which(x$nbdays > 1) - 1)[-1], nrow(x)))
      exit <- x[whichexist, "ExitDate"]

      d[[i]] <- data.frame(
        PersonalID = x[1, 1],
        EntryDate = entr,
        ExitDate = exit
      )
    }

    # Bind the elements of this list into one data.frame
    do.call(rbind, d)
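The same merge can probably be written without the explicit loop. The sketch below is not the method above but a swapped-in variant: it flags every row that opens a new stay and numbers the stays with `cumsum()`. The function name `merge_stays` and the `max_gap` parameter are made up for illustration. It also compares each entry with the latest exit seen so far for that person (via `cummax()`), on the assumption that an overnight check-in nested inside a long-term stay should not split that stay.

```r
# Toy data copied from the question
d <- read.table(text = "
PersonalID EntryDate ExitDate
1 2016-12-01 2016-12-02
1 2016-12-03 2016-12-04
1 2016-12-16 2016-12-17
1 2016-12-17 2016-12-18
1 2016-12-18 2016-12-19
2 2016-10-01 2016-10-20
2 2016-10-21 2016-10-22
3 2016-09-01 2016-09-02
3 2016-09-20 2016-09-21
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: merge check-ins separated by at most max_gap days
merge_stays <- function(d, max_gap = 1) {
  d$EntryDate <- as.Date(d$EntryDate)
  d$ExitDate  <- as.Date(d$ExitDate)
  d <- d[order(d$PersonalID, d$EntryDate), ]
  n <- nrow(d)

  entry <- as.numeric(d$EntryDate)
  # Latest exit seen so far within each person (as days since 1970-01-01)
  run_exit <- ave(as.numeric(d$ExitDate), d$PersonalID, FUN = cummax)

  # TRUE when the row belongs to the same person as the row before it
  same_person <- c(FALSE, d$PersonalID[-1] == d$PersonalID[-n])
  # Days between this entry and the latest exit on the preceding rows
  gap <- entry - c(NA, run_exit[-n])

  # A new stay starts with a new person or after a gap longer than max_gap
  new_stay <- !same_person | gap > max_gap
  stay_id  <- cumsum(new_stay)

  data.frame(
    PersonalID = d$PersonalID[new_stay],
    EntryDate  = d$EntryDate[new_stay],
    ExitDate   = as.Date(as.vector(tapply(run_exit, stay_id, max)),
                         origin = "1970-01-01")
  )
}

res <- merge_stays(d)
print(res)
```

On the question's data this should give the five ranges listed as the expected result. As with the loop version, it is worth checking against real data, in particular rows with missing dates, which this sketch does not handle.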
