Re-aggregating data: from coarse to finer temporal resolution

Updated: 2024-10-24 02:02:45

Question

I would like to follow up on a question answered by @r2evans: Interpolation in R: retrieving hourly values. I am trying to re-aggregate 3-hourly data into hourly data. If I use the following small reproducible dataset ("tair"):

tair <- structure(list(
  Year = c(1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L),
  Month = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
  DoY = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L),
  Hour = c(3L, 6L, 9L, 12L, 15L, 18L, 21L, 0L),
  Kobb = c(3.032776, 3.076996, 3.314209, 1.760345, 1.473724, 1.295837, 2.72229, 3.209503),
  DateTime = structure(c(662698800, 662709600, 662720400, 662731200,
                         662742000, 662752800, 662763600, 662774400),
                       class = c("POSIXct", "POSIXt"), tzone = "UTC")),
  row.names = c(NA, 8L), class = "data.frame")

with the following code:

library(zoo)  # not needed for this step: approx() comes from base R's stats package
newdt <- seq.POSIXt(tair$DateTime[1], tail(tair$DateTime, n = 1), by = "1 hour")
newdt
tair_hourly <- data.frame(datetime = newdt,
                          Kobb = approx(tair$DateTime, tair$Kobb, newdt)$y)

This does the expected job: the 3-hourly data are successfully interpolated to hourly values. That works for variables such as temperature or radiation. For a stochastic variable such as precipitation, however, I would like to keep each 3-hourly value constant across the corresponding hourly time steps (and perhaps divide it by 3). I simply need hourly data; that is the reason for all this.
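For comparison with the linear approx() call in the question's code, base R's approx() also supports step interpolation through method = "constant". With f = 0, every hourly time between two observations takes the value of the earlier observation, which can then be divided by 3. This is a minimal sketch, not something from the original thread: it assumes strictly regular 3-hour spacing (hence the fixed divisor of 3) and that each 3-hourly value belongs to the hours that follow it. The tair data frame here is a hypothetical reduced copy holding only the DateTime and Kobb columns.

```r
# Hypothetical reduced copy of the question's data (DateTime and Kobb only)
tair <- data.frame(
  DateTime = as.POSIXct("1991-01-01 03:00:00", tz = "UTC") + 3 * 3600 * (0:7),
  Kobb = c(3.032776, 3.076996, 3.314209, 1.760345,
           1.473724, 1.295837, 2.72229, 3.209503)
)

# Hourly grid from the first to the last observation (22 time steps)
newdt <- seq(tair$DateTime[1], tail(tair$DateTime, n = 1), by = "1 hour")

# Step interpolation: with f = 0 each hour keeps the value of the most recent
# observation at or before it, instead of a linear blend of its neighbours
held <- approx(as.numeric(tair$DateTime), tair$Kobb,
               xout = as.numeric(newdt), method = "constant", f = 0)$y

# Assumes regular 3-hour spacing, so each value is split evenly over 3 hours
tair_hourly <- data.frame(DateTime = newdt, Kobb = held / 3)
head(tair_hourly, 4)
```

Note that the last grid point (1991-01-02 00:00:00) also receives its own observation divided by 3, even though the hours it would cover lie outside the grid; trim or adjust it if that matters.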

Any ideas on how to implement what I described above?

Recommended Answer

Two suggestions: one in base R, and one using dplyr, purrr and tidyr.

tair2_list <- lapply(seq_len(nrow(tair) - 1), function(ind) {
  # hourly steps from this observation up to (but not including) the next one
  times <- seq(tair$DateTime[ind], tair$DateTime[ind+1] - 1, by = "1 hour")
  data.frame(
    DateTime = times,
    # split the 3-hourly value evenly over its hours
    NewKobb = rep(tair$Kobb[ind] / length(times), length(times)),
    # for reference only: the original value on the first hour of each block
    Kobb = c(tair$Kobb[ind], rep(NA, length(times) - 1))
  )
})
tair2 <- do.call(rbind, tair2_list)
tair2
#               DateTime   NewKobb     Kobb
# 1  1991-01-01 03:00:00 1.0109253 3.032776
# 2  1991-01-01 04:00:00 1.0109253       NA
# 3  1991-01-01 05:00:00 1.0109253       NA
# 4  1991-01-01 06:00:00 1.0256653 3.076996
# 5  1991-01-01 07:00:00 1.0256653       NA
# 6  1991-01-01 08:00:00 1.0256653       NA
# 7  1991-01-01 09:00:00 1.1047363 3.314209
# 8  1991-01-01 10:00:00 1.1047363       NA
# 9  1991-01-01 11:00:00 1.1047363       NA
# 10 1991-01-01 12:00:00 0.5867817 1.760345
# 11 1991-01-01 13:00:00 0.5867817       NA
# 12 1991-01-01 14:00:00 0.5867817       NA
# 13 1991-01-01 15:00:00 0.4912413 1.473724
# 14 1991-01-01 16:00:00 0.4912413       NA
# 15 1991-01-01 17:00:00 0.4912413       NA
# 16 1991-01-01 18:00:00 0.4319457 1.295837
# 17 1991-01-01 19:00:00 0.4319457       NA
# 18 1991-01-01 20:00:00 0.4319457       NA
# 19 1991-01-01 21:00:00 0.9074300 2.722290
# 20 1991-01-01 22:00:00 0.9074300       NA
# 21 1991-01-01 23:00:00 0.9074300       NA

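Subtracting one second in tair$DateTime[ind+1] - 1 makes each sequence stop just before the next observation, so that timestamp is not retained twice (once at the end of one interval and again at the start of the next). Because the loop runs over seq_len(nrow(tair) - 1), the final observation (1991-01-02 00:00:00) does not appear in tair2.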
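A quick way to see the effect of the one-second offset, using two hypothetical timestamps three hours apart (POSIXct arithmetic is in seconds, so b - 1 is one second before b):

```r
a <- as.POSIXct("1991-01-01 03:00:00", tz = "UTC")
b <- as.POSIXct("1991-01-01 06:00:00", tz = "UTC")

# Without the offset the sequence ends exactly at b, so b would appear twice
# once consecutive intervals are stacked with rbind()
with_end <- seq(a, b, by = "1 hour")

# With the offset the sequence stops at 05:00, just before b
without_end <- seq(a, b - 1, by = "1 hour")

format(with_end, "%H:%M")     # "03:00" "04:00" "05:00" "06:00"
format(without_end, "%H:%M")  # "03:00" "04:00" "05:00"
```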

library(dplyr)
library(purrr)
library(tidyr)
tair %>%
  # for each observation, build the hourly times up to (but not including)
  # the next observation; the last observation keeps only its own time
  mutate(DateTime2 = purrr::map2(
    DateTime, lead(DateTime - 1, default = last(DateTime)),
    ~ tibble(DateTime2 = seq(.x, .y, by = "1 hour")))
  ) %>%
  unnest(DateTime2) %>%
  # split each original value evenly over the hours it was expanded into
  group_by(DateTime) %>%
  mutate(NewKobb = Kobb / n()) %>%
  ungroup()
# # A tibble: 22 x 8
#     Year Month   DoY  Hour  Kobb DateTime            DateTime2           NewKobb
#    <int> <int> <int> <int> <dbl> <dttm>              <dttm>                <dbl>
#  1  1991     1     1     3  3.03 1991-01-01 03:00:00 1991-01-01 03:00:00   1.01
#  2  1991     1     1     3  3.03 1991-01-01 03:00:00 1991-01-01 04:00:00   1.01
#  3  1991     1     1     3  3.03 1991-01-01 03:00:00 1991-01-01 05:00:00   1.01
#  4  1991     1     1     6  3.08 1991-01-01 06:00:00 1991-01-01 06:00:00   1.03
#  5  1991     1     1     6  3.08 1991-01-01 06:00:00 1991-01-01 07:00:00   1.03
#  6  1991     1     1     6  3.08 1991-01-01 06:00:00 1991-01-01 08:00:00   1.03
#  7  1991     1     1     9  3.31 1991-01-01 09:00:00 1991-01-01 09:00:00   1.10
#  8  1991     1     1     9  3.31 1991-01-01 09:00:00 1991-01-01 10:00:00   1.10
#  9  1991     1     1     9  3.31 1991-01-01 09:00:00 1991-01-01 11:00:00   1.10
# 10  1991     1     1    12  1.76 1991-01-01 12:00:00 1991-01-01 12:00:00   0.587
# # ... with 12 more rows

(I feel like there is a better way to do this...)
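One possibly simpler alternative, sketched under an extra assumption: if the observations are strictly regular 3-hour steps (the lapply() version does not need this), each row can be repeated three times with rep() and offset by 0, 1 and 2 hours. This sketch treats the final observation differently from both versions above: it also expands it into three hours, assuming it covers the hours that follow it, so drop those rows if that is not wanted. The tair data frame is a hypothetical reduced copy with only the DateTime and Kobb columns, and steps is an illustrative parameter name.

```r
# Hypothetical reduced copy of the question's data (DateTime and Kobb only)
tair <- data.frame(
  DateTime = as.POSIXct("1991-01-01 03:00:00", tz = "UTC") + 3 * 3600 * (0:7),
  Kobb = c(3.032776, 3.076996, 3.314209, 1.760345,
           1.473724, 1.295837, 2.72229, 3.209503)
)

steps <- 3  # hours per original observation (assumed constant)

tair_hourly <- data.frame(
  # each timestamp repeated 3 times, shifted by 0, 1 and 2 hours
  DateTime = rep(tair$DateTime, each = steps) +
    3600 * rep(seq_len(steps) - 1, times = nrow(tair)),
  # each value split evenly across its 3 hours
  NewKobb = rep(tair$Kobb / steps, each = steps)
)
head(tair_hourly, 4)
```

Because nothing is interpolated, the hourly values in each block still add up to the original 3-hourly value, which is usually what you want for accumulated quantities such as precipitation.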

Published: 2023-11-30 15:00:29

Source: https://www.elefans.com/category/jswz/34/1650423.html