Trend length - panel data

Updated: 2024-10-24 04:29:32

Problem description

I have a well-balanced panel data set that contains NA observations. I will be using LOCF, and before carrying observations forward I would like to know how many consecutive NAs occur in each panel. LOCF is a procedure whereby missing values are "filled in" using the "last observation carried forward". This can make sense in some time-series applications: if we have weather data in 5-minute increments, a good guess at the value of a missing observation might be the observation made 5 minutes earlier.
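As a rough illustration of LOCF itself, here is a minimal sketch using data.table::nafill, which is available in data.table 1.12.4 and later. The readings and the column name x_filled are made up for the example. Filling by id keeps values from leaking across panels, so a panel that starts with NA keeps that NA.

```r
library(data.table)

# Hypothetical weather-style readings with gaps
temp <- c(12.1, NA, NA, 12.8, NA, 13.0)

# type = "locf" carries the last non-missing value forward
filled <- nafill(temp, type = "locf")

# Hypothetical two-panel example: fill within each id only
dt <- data.table(id = c(1, 1, 1, 2, 2, 2),
                 x  = c(5, NA, NA, NA, 7, NA))
dt[, x_filled := nafill(x, type = "locf"), by = id]

print(filled)
print(dt)
```

The first observation of panel 2 stays NA because there is no earlier value in that panel to carry forward.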

Obviously, it makes more sense to carry an observation forward one hour within a panel than to carry that same observation forward to the next year in the same panel.

I am aware that you can set a "maxgap" argument in zoo::na.locf; however, I want to get a better feel for my data first. Here is a simple example:
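For context, the maxgap behaviour mentioned above can be sketched on a toy vector. This assumes the zoo package is installed, and the values are invented for illustration. Runs of NAs longer than maxgap are left untouched, while shorter runs are filled.

```r
library(zoo)

# Toy series: one gap of length 1 and one gap of length 3
x <- c(1, NA, 2, NA, NA, NA, 3)

# Gaps of at most 2 consecutive NAs are filled; the run of 3 NAs is left as-is.
# na.rm = FALSE keeps the output the same length as the input.
filled <- na.locf(x, maxgap = 2, na.rm = FALSE)

print(filled)
```

Choosing a sensible maxgap is easier once the distribution of gap lengths per panel is known, which is what the question is after.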

require(data.table)
set.seed(12345)

### Create a "panel" data set
data <- data.table(id = rep(1:10, each = 10),
                   date = seq(as.POSIXct('2012-01-01'),
                              as.POSIXct('2012-01-10'),
                              by = '1 day'),
                   x = runif(100))

### Randomly assign NA's to our "x" variable
na <- sample(1:100, size = 52)
data[na, x := NA]

### Calculate the max number of consecutive NA's by group...this is what I want:
### ID  Consecutive NA's
#   1   1
#   2   3
#   3   3
#   4   3
#   5   4
#   6   5
#   ...
#   10  2

### Count the total number of NA's by group...this is as far as I get:
data[is.na(x), .N, by = id]

All solutions are welcome, but data.table solutions are highly preferred; the data file is large.

Recommended answer

This should do it:

data[, max(with(rle(is.na(x)), lengths[values])), by = id]

I just ran rle to find all runs of consecutive NAs and picked the maximum length. rle(is.na(x)) returns the length of each run, and lengths[values] keeps only the runs made up of NAs.
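To make the rle step concrete, here is a small sketch on a made-up vector. The helper name max_na_run is hypothetical and not part of the original answer. It also returns 0 for a series with no NAs, a case where a bare max() over an empty selection would give -Inf with a warning.

```r
# Made-up series with NA runs of length 2 and 3
x <- c(0.4, NA, NA, 0.7, NA, NA, NA, 0.1)

r <- rle(is.na(x))
print(r$lengths)  # length of each run
print(r$values)   # TRUE where the run consists of NAs

# Hypothetical helper: longest NA run, or 0 if there are no NAs at all
max_na_run <- function(v) {
  runs <- rle(is.na(v))
  max(0L, runs$lengths[runs$values])
}

print(max_na_run(x))
print(max_na_run(c(1, 2, 3)))
```

Inside data.table, such a helper could be applied per panel as data[, max_na_run(x), by = id].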

Here's a rather convoluted answer to the comment question of recovering the date range of the longest NA run found above:

data[, {
  tmp = rle(is.na(x))
  tmp$lengths[!tmp$values] = 0            # modify rle result to ignore non-NA's
  n = which.max(tmp$lengths)              # find the index in rle of longest NA sequence
  tmp = rle(is.na(x))                     # let's get back to the unmodified rle
  start = sum(tmp$lengths[0:(n-1)]) + 1   # and find the start and end indices
  end = sum(tmp$lengths[1:n])
  list(date[start], date[end], max(tmp$lengths[tmp$values]))
}, by = id]
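An alternative way to recover the date range, offered only as a sketch and not as the original answer's method, is to label runs with data.table::rleid and summarise the NA runs. The small panel below and the names dt, run, na_runs and longest are hypothetical.

```r
library(data.table)

# Small hand-made panel (hypothetical values) so the result is easy to check
dt <- data.table(
  id   = rep(1:2, each = 6),
  date = rep(as.Date("2012-01-01") + 0:5, times = 2),
  x    = c(1, NA, NA, 4, NA, 6,
           NA, 2, NA, NA, NA, 6)
)

# Label each run of consecutive NA / non-NA values within a panel
dt[, run := rleid(is.na(x)), by = id]

# One row per NA run: first date, last date, and run length
na_runs <- dt[is.na(x),
              .(start = min(date), end = max(date), n = .N),
              by = .(id, run)]

# Keep the longest NA run per panel (the first one wins on ties)
longest <- na_runs[, .SD[which.max(n)], by = id]

print(longest)
```

The intermediate na_runs table is useful on its own, since it lists every gap per panel rather than just the longest one.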

Published: 2023-11-30 15:00:13
Link: https://www.elefans.com/category/jswz/34/1650422.html