将一天中的日期指定为一个月(Assign day of the day year to a month)

编程入门 行业动态 更新时间:2024-10-28 20:29:35
将一天中的日期指定为一个月(Assign day of the day year to a month)

样本数据

df <- data.frame(ID1 = rep(1:1000, each= 5*365), year = rep(rep(2000:2004, each = 365), times = 1000), day = rep(1:365, times = 1000*5), x= runif(365*1000*5))

此数据包含列day ,即一年中的某一天。 我需要生成两列:

月份列:月份列(当天所属的月份)

Biweek专栏:每周一次的biweek属于哪一个。 一年有24个双周。 一个月<= 15的所有日子是第一个双周,而> 15是第二个双周。 例如

1月15日是Biweek 1, 1月16日至31日是双周2, 2月1日至15日是双周3和 2月16日至28日是双周4,依此类推。

为简单起见,我假设所有年份都是非闰年。

这是我的代码(在RS的帮助下),它创建了两列。

# create a vector of days for each month months <- list(1:31, 32:59, 60:90, 91:120, 121:151, 152:181, 182:212, 213:243, 244:273, 274:304, 305:334, 335:365) library(dplyr) ptm <- proc.time() df <- df %>% mutate(month = sapply(day, function(x) which(sapply(months, function(y) x %in% y))), # this assigns each day to a month date = as.Date(paste0(year,'-',format(strptime(paste0('1981-',day), '%Y-%j'), '%m-%d'))), # this creates a vector of dates for a non-leap year twowk = month*2 - (as.numeric(format(date, "%d")) <= 15)) %>% # this describes which biweek each day falls into dplyr::select(-date) proc.time() - ptm user system elapsed 121.71 0.31 122.43

我的问题是运行这个脚本所需的时间,我正在寻找一个相对更快的解决方案

编辑:要明确,我假设所有年份必须有365天。 在下面的一个答案中,对于2000年(闰年),2月有29天(2月的最后一天是60,但我希望最后一天是59),因此12月只有30天(12月开始时为336天)虽然它应该以335)开头。 我希望这很清楚。 我的解决方案解决了这个问题,但需要花费大量时间来运行

Sample data

df <- data.frame(ID1 = rep(1:1000, each= 5*365), year = rep(rep(2000:2004, each = 365), times = 1000), day = rep(1:365, times = 1000*5), x= runif(365*1000*5))

This data contains a column day which is the day of the year. I need to produce two columns:

Month column: a column of month (which month does the day belong)

Biweek column: which biweek does a day belong to. There are 24 biweek in a year. All days <= 15 in a month is the first biweek and > 15 is second biweek. For e.g.

15th Jan is Biweek 1, 16-31 Jan is biweek 2, 1-15 Feb is biweek 3 and 16-28 Feb is biweek 4 and so on.

For sake of simplicity, I am assuming all the years are non-leap years.

Here's the code I have (with help from RS as well) that creates the two columns.

# create a vector of days for each month months <- list(1:31, 32:59, 60:90, 91:120, 121:151, 152:181, 182:212, 213:243, 244:273, 274:304, 305:334, 335:365) library(dplyr) ptm <- proc.time() df <- df %>% mutate(month = sapply(day, function(x) which(sapply(months, function(y) x %in% y))), # this assigns each day to a month date = as.Date(paste0(year,'-',format(strptime(paste0('1981-',day), '%Y-%j'), '%m-%d'))), # this creates a vector of dates for a non-leap year twowk = month*2 - (as.numeric(format(date, "%d")) <= 15)) %>% # this describes which biweek each day falls into dplyr::select(-date) proc.time() - ptm user system elapsed 121.71 0.31 122.43

My issue is that of the time it takes to run this script and I am looking for a solution that is relatively faster

EDIT: To be clear, I have assumed all years must have 365 days. In one of the answers below, for the year 2000 (a leap year), Feb has 29 days (last day of Feb is 60 but I want the last day to be 59) and therefore Dec has only 30 days (Dec start with 336 though it should start with 335). I hope this is clear. My solution addresses this issue but takes lot of time to run.

最满意答案

以下是Frank在评论中提到的使用lubridate提取器和替换功能的解决方案。 关键的是mday() yday<- , mday()和month() ,它们分别设置日期的年份,获取日期的月份,并获取日期的月份。 8秒的运行时间对我来说似乎是可以接受的,尽管我确信一些优化可以减少这种情况,尽管可能会失去一般性。

另请注意使用case_when确保闰年2月29日之后正确的天数。

编辑:这是一个明显更快的解决方案。 您可以将DOY映射到一年的月和双周,然后将left_join到主表。 运行时间为0.36秒,因为您不再需要重复创建日期。 我们还绕过必须使用case_when ,因为case_when将处理丢失的日子。 根据要求,见2000年第59天是2月,第60天是3月。

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
tbl <- tibble(
  ID1 = rep(1:1000, each= 5*365),
  year = rep(rep(2000:2004, each = 365), times = 1000),
  day = rep(1:365, times = 1000*5),
  x= runif(365*1000*5)
)

tictoc::tic("")
doys <- tibble(
  day = rep(1:365),
  date = seq.Date(ymd("2001-1-1"), ymd("2001-12-31"), by = 1),
  month = month(date),
  biweek = case_when(
    mday(date) <= 15 ~ (month * 2) - 1,
    mday(date) > 15  ~ month * 2
  )
)
tbl_out2 <- left_join(tbl, select(doys, -date), by = "day")
tictoc::toc()
#> : 0.36 sec elapsed
tbl_out2
#> # A tibble: 1,825,000 x 6
#>      ID1  year   day     x month biweek
#>    <int> <int> <int> <dbl> <dbl>  <dbl>
#>  1     1  2000     1 0.331    1.     1.
#>  2     1  2000     2 0.284    1.     1.
#>  3     1  2000     3 0.627    1.     1.
#>  4     1  2000     4 0.762    1.     1.
#>  5     1  2000     5 0.460    1.     1.
#>  6     1  2000     6 0.500    1.     1.
#>  7     1  2000     7 0.340    1.     1.
#>  8     1  2000     8 0.952    1.     1.
#>  9     1  2000     9 0.663    1.     1.
#> 10     1  2000    10 0.385    1.     1.
#> # ... with 1,824,990 more rows
tbl_out2[55:65, ]
#> # A tibble: 11 x 6
#>      ID1  year   day     x month biweek
#>    <int> <int> <int> <dbl> <dbl>  <dbl>
#>  1     1  2000    55 0.127    2.     4.
#>  2     1  2000    56 0.779    2.     4.
#>  3     1  2000    57 0.625    2.     4.
#>  4     1  2000    58 0.245    2.     4.
#>  5     1  2000    59 0.640    2.     4.
#>  6     1  2000    60 0.423    3.     5.
#>  7     1  2000    61 0.439    3.     5.
#>  8     1  2000    62 0.105    3.     5.
#>  9     1  2000    63 0.218    3.     5.
#> 10     1  2000    64 0.668    3.     5.
#> 11     1  2000    65 0.589    3.     5.
 

由reprex包创建于2018-04-06 (v0.2.0)。

Here is a solution using lubridate extractors and replacement functions as mentioned by Frank in a comment. The key ones are yday<-, mday() and month(), which respectively set the day of year of a date, get the day of month of a date, and get the month of a date. 8 sec running time seems pretty acceptable to me, though I'm sure some optimising could shave that down though there might be a loss of generality.

Note also the use of case_when to ensure the correct numbering of days after Feb 29 on a leap year.

EDIT: Here is a significantly faster solution. You can just get the mapping of DOYs to months and biweeks for a single year, and then left_join to the main table. 0.36s running time, since you no longer have to repetitively create the date. We also bypass having to use case_when, since the join will take care of the missing days. See that Day 59 of year 2000 is February and Day 60 is March, as requested.

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
tbl <- tibble(
  ID1 = rep(1:1000, each= 5*365),
  year = rep(rep(2000:2004, each = 365), times = 1000),
  day = rep(1:365, times = 1000*5),
  x= runif(365*1000*5)
)

tictoc::tic("")
doys <- tibble(
  day = rep(1:365),
  date = seq.Date(ymd("2001-1-1"), ymd("2001-12-31"), by = 1),
  month = month(date),
  biweek = case_when(
    mday(date) <= 15 ~ (month * 2) - 1,
    mday(date) > 15  ~ month * 2
  )
)
tbl_out2 <- left_join(tbl, select(doys, -date), by = "day")
tictoc::toc()
#> : 0.36 sec elapsed
tbl_out2
#> # A tibble: 1,825,000 x 6
#>      ID1  year   day     x month biweek
#>    <int> <int> <int> <dbl> <dbl>  <dbl>
#>  1     1  2000     1 0.331    1.     1.
#>  2     1  2000     2 0.284    1.     1.
#>  3     1  2000     3 0.627    1.     1.
#>  4     1  2000     4 0.762    1.     1.
#>  5     1  2000     5 0.460    1.     1.
#>  6     1  2000     6 0.500    1.     1.
#>  7     1  2000     7 0.340    1.     1.
#>  8     1  2000     8 0.952    1.     1.
#>  9     1  2000     9 0.663    1.     1.
#> 10     1  2000    10 0.385    1.     1.
#> # ... with 1,824,990 more rows
tbl_out2[55:65, ]
#> # A tibble: 11 x 6
#>      ID1  year   day     x month biweek
#>    <int> <int> <int> <dbl> <dbl>  <dbl>
#>  1     1  2000    55 0.127    2.     4.
#>  2     1  2000    56 0.779    2.     4.
#>  3     1  2000    57 0.625    2.     4.
#>  4     1  2000    58 0.245    2.     4.
#>  5     1  2000    59 0.640    2.     4.
#>  6     1  2000    60 0.423    3.     5.
#>  7     1  2000    61 0.439    3.     5.
#>  8     1  2000    62 0.105    3.     5.
#>  9     1  2000    63 0.218    3.     5.
#> 10     1  2000    64 0.668    3.     5.
#> 11     1  2000    65 0.589    3.     5.
 

Created on 2018-04-06 by the reprex package (v0.2.0).

更多推荐

本文发布于:2023-08-06 20:51:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1455455.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:一个月   日期   Assign   month   year

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!