This article covers how to split a single data frame row into multiple rows while performing calculations.
Problem description
I have a data frame like df1 and want to split its rows so that the HOURS column is in intervals of 4, as shown in df2. How should I approach this problem, and which packages are recommended?
An ID can have more than one sequence on a given day. For example, an ID can be listed 2-3 times on a given day and be assigned more than one unit and more than one CODE.
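Requirements:
- All categorical data must stay the same across sub-rows (e.g. CODE remains unchanged on every sub-row)
- If the remainder is less than 4, it should be listed in the last row (e.g. df2, row B)
- If a sub-row starts or ends on the next date, the date columns should be updated accordingly (e.g. df2, row E)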
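The remainder rule in the requirements can be sketched in base R. This is a minimal illustration rather than part of the original question; `split_hours` and its `size` argument are hypothetical names introduced here.

```r
# Hypothetical helper: split a total number of hours into chunks of `size`,
# placing any remainder smaller than `size` in the last position.
split_hours <- function(total, size = 4) {
  chunks <- c(rep(size, total %/% size), total %% size)
  chunks[chunks != 0]  # drop the zero remainder when `total` divides evenly
}

split_hours(10)  # 4 4 2
split_hours(8)   # 4 4
split_hours(3)   # 3
```

Each element of the result corresponds to one sub-row; the categorical columns would simply be repeated once per element.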
df1 (current)
  EMPLID TIME_RPTG_CD START_DATE_TIME     END_DATE_TIME       Hrs_Time_Worked
  <chr>  <chr>        <dttm>              <dttm>                        <dbl>
1 X00007 REG          2014-07-03 16:00:00 2014-07-03 02:00:00            10.0
df2 (desired)
  EMPLID TIME_RPTG_CD START_DATE_TIME     END_DATE_TIME       Hrs_Time_Worked
  <chr>  <chr>        <dttm>              <dttm>                        <dbl>
1 X00007 REG          2014-07-03 16:00:00 2014-07-03 20:00:00             4.0
1 X00007 REG          2014-07-03 20:00:00 2014-07-04 24:00:00             4.0
1 X00007 REG          2014-07-04 24:00:00 2014-07-04 02:00:00             2.0
Recommended answer
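library(tidyverse)
library(lubridate)

# Assumes df1 has the columns shown in the output below:
# Row, ID, UNIT, CODE, START_DATE, START_TIME and HOURS.
df1 %>%
  group_by(Row) %>%
  mutate(
    # combine date and time into one string for parsing later
    S = paste(START_DATE, START_TIME),
    # split HOURS into chunks of 4 plus a non-zero remainder
    HOURS = list((n <- c(rep(4, HOURS %/% 4), HOURS %% 4))[n != 0])
  ) %>%
  # one row per chunk; categorical columns are repeated
  unnest(cols = HOURS) %>%
  mutate(
    # end of each chunk = start + cumulative hours; start = end - chunk length
    E = dmy_hm(S) + hours(cumsum(HOURS)),
    S = E - hours(unlist(HOURS)),
    START_DATE = format(S, "%d-%b-%y"),
    END_DATE = format(E, "%d-%b-%y"),
    START_TIME = format(S, "%H:%M"),
    END_TIME = format(E, "%H:%M"),
    S = NULL, E = NULL
  )

# A tibble: 6 x 9
# Groups:   Row [3]
  Row      ID UNIT  CODE  START_DATE END_DATE  START_TIME END_TIME HOURS
  <chr> <int> <chr> <chr> <chr>      <chr>     <chr>      <chr>    <dbl>
1 A         1 3ESD  REG   06-Aug-14  06-Aug-14 01:00      05:00       4.
2 A         1 3ESD  REG   06-Aug-14  06-Aug-14 05:00      07:00       2.
3 B         2 3E14E OE2   12-Aug-14  13-Aug-14 21:00      01:00       4.
4 C         3 3E5E  REG   19-Aug-14  20-Aug-14 21:00      01:00       4.
5 C         3 3E5E  REG   20-Aug-14  20-Aug-14 01:00      05:00       4.
6 C         3 3E5E  REG   20-Aug-14  20-Aug-14 05:00      07:00       2.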
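The df1 shown in the question stores the start and end as single date-time columns rather than separate date and time columns. A minimal base R sketch for that layout might look like the following. It is not the answerer's method: `split_shift` is a hypothetical helper, the UTC time zone is an assumption, and each end time is derived from the start plus the hours worked instead of being read from END_DATE_TIME.

```r
# Hypothetical helper: expand one shift into sub-rows of at most `size` hours.
# `start` is a POSIXct value; `hrs` is the total hours worked.
split_shift <- function(emplid, code, start, hrs, size = 4) {
  chunks <- c(rep(size, hrs %/% size), hrs %% size)
  chunks <- chunks[chunks != 0]            # drop a zero remainder
  ends   <- start + cumsum(chunks) * 3600  # POSIXct arithmetic is in seconds
  starts <- ends - chunks * 3600
  data.frame(
    EMPLID = emplid,                       # categorical values repeat per sub-row
    TIME_RPTG_CD = code,
    START_DATE_TIME = starts,
    END_DATE_TIME = ends,
    Hrs_Time_Worked = chunks,
    stringsAsFactors = FALSE
  )
}

# Assumed time zone: UTC (avoids daylight-saving shifts in this illustration)
start <- as.POSIXct("2014-07-03 16:00:00", tz = "UTC")
res <- split_shift("X00007", "REG", start, 10)
print(res)
```

Because the sub-row boundaries are computed as date-times, a sub-row that crosses midnight picks up the next calendar date without any special handling.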