Finding the max of a data.table column within a range of rows around the current observation, by group

Updated: 2024-10-10 21:22:51

Problem description

Ok so that title is quite a mouthful but here's the problem I solved and I was curious if anyone had a better solution or could generalize it further.

I have a time series as a data.table and I'm interested in finding out whether an observation "bucks the trend", so to speak, of the data before and after it. That is, is this observation larger than the observations in the year before and the year after?

To do this, my thought was to build in another column that grabs the max from the rows above or below and then just check if a row is equal to that max.

My data, luckily, is regularly spaced, meaning that every row is the same distance in time from its neighboring rows. I use this fact to specify the window size manually as a number of rows, rather than having to check whether each row is within the time distance of interest.
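The fixed-window idea described above can be sketched in base R before any data.table machinery is involved. This is only an illustration: `window_max` and its arguments `x` and `k` are hypothetical names, not part of the original code, and the window is simply clipped at both ends of the vector.

```r
# Hypothetical helper: for each position i, take the max of x over the rows
# i - k ... i + k, clipped to the valid index range 1 ... length(x).
window_max <- function(x, k) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    lo <- max(1L, i - k)   # do not run off the start
    hi <- min(n, i + k)    # do not run off the end
    max(x[lo:hi])
  }, numeric(1))
}

x <- c(1, 5, 2, 4, 3)
wm <- window_max(x, k = 1)
wm        # 5 5 5 4 4
x == wm   # FALSE TRUE FALSE TRUE FALSE
```

Because the helper works on a plain vector, it could presumably be applied per group with something like `mydt[, windowMax := window_max(Value, windowSize), by = Name]`, and a one-row group just returns its own value.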

    #######################
    # Package Loading
    usePackage <- function(p) {
      if (!is.element(p, installed.packages()[,1]))
        install.packages(p, dep = TRUE)
      require(p, character.only = TRUE)
    }
    packages <- c("data.table","lubridate")
    for(package in packages) usePackage(package)
    rm(packages,usePackage)
    #######################

    set.seed(1337)

    # creating a data.table
    mydt <- data.table(Name = c(rep("Roger",12),rep("Johnny",8),"Mark"),
                       Date = c(seq(ymd('2010-06-15'),ymd('2015-12-15'), by = '6 month'),
                                seq(ymd('2012-06-15'),ymd('2015-12-15'), by = '6 month'),
                                ymd('2015-12-15')))
    mydt[ , Value := c(rnorm(12,15,1),rnorm(8,30,2),rnorm(1,100,30))]
    setkey(mydt, Name, Date)

    # setting the number of rows up or down to check
    windowSize <- 2

    # applying the windowing max function
    mydt[, windowMax := unlist(lapply(1:.N, function(x)
             max(.SD[Filter(function(y) y>0 & y <= .N,
                            unique(abs(x+(-windowSize:windowSize)))), Value]))),
         by = Name]

    # checking if a value is the local max (by window)
    mydt[, isMaxValue := windowMax == Value]
    mydt

As you can see, the windowing function is a mess, but it does the trick. My question is: do you know a simpler, more succinct, or more readable way to do the same thing? Do you know how to generalize this to take irregular time series into account (i.e. not a fixed window of rows)? I couldn't get zoo::rollapply to do what I wanted, but I don't have much experience with it (I couldn't solve the problem of a group with 1 row causing the function to crash).
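For the irregular-series case raised here, one possible sketch is to compare dates directly instead of counting rows. The toy table `dt` and the 365-day threshold `span_days` below are assumptions made up for illustration and do not come from the original post. The approach compares every row with every other row in its group, so it is quadratic per group and probably only suitable for moderately sized groups.

```r
library(data.table)

# Toy data with irregular spacing (invented for this example)
dt <- data.table(
  Name  = c("A", "A", "A", "A", "B"),
  Date  = as.Date(c("2015-01-01", "2015-03-01", "2016-06-01",
                    "2016-07-01", "2015-01-01")),
  Value = c(1, 3, 2, 5, 9)
)

span_days <- 365  # assumed window: one year on either side of each row

# For each row, take the max Value over all rows of the same group whose
# Date lies within span_days of that row's Date (the row itself included).
dt[, windowMax := {
  d <- as.numeric(Date)  # days since the epoch
  vapply(seq_len(.N), function(i) {
    max(Value[abs(d - d[i]) <= span_days])
  }, numeric(1))
}, by = Name]

dt[, isMaxValue := Value == windowMax]
dt
# windowMax:  3 3 5 5 9
# isMaxValue: FALSE TRUE FALSE TRUE TRUE
```

A group with a single row, such as "B" here, is handled without special casing because each row always falls inside its own window. For larger data, a data.table non-equi join might be worth exploring instead of the inner loop.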

Let me know your thoughts and thank you!

Recommended answer

This doesn't really address the time-window part, but if you want a one-liner with zoo::rollapply, you can do:

    library(zoo)  # provides rollapply (not loaded in the question's code)

    width <- 2 * windowSize + 1 # One central obs. and two on each side
    mydt[, isMaxValue2 := rollapply(Value, width, max, partial = TRUE) == Value, by=Name]
    identical(mydt$isMaxValue, mydt$isMaxValue2) # TRUE

It's somewhat more legible than your proposed solution, I think.

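The partial = TRUE argument deals with the "border effects": near the start and end of each group (and in groups shorter than the window) there are fewer than 5 observations in the window, and with partial = TRUE the function is applied to whatever observations are available.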
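A small stand-alone illustration of what `partial = TRUE` changes, using a made-up vector and a width of 3 rather than the question's data:

```r
library(zoo)

x <- c(4, 1, 2, 3, 9, 1)

# Default (partial = FALSE): only complete windows are evaluated, so the
# result is shorter than the input.
full_only <- rollapply(x, 3, max)
full_only      # 4 3 9 9

# partial = TRUE: incomplete windows at both ends are evaluated on the
# observations that exist, so the result has one value per input element.
with_partial <- rollapply(x, 3, max, partial = TRUE)
with_partial   # 4 4 3 9 9 9

# A series shorter than the window (e.g. a one-row group) still returns
# one value per observation.
single <- rollapply(7, 3, max, partial = TRUE)
single         # 7
```

Keeping the output the same length as the input is what allows the result to be compared element-wise with `Value` inside `by = Name`.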

Published: 2023-11-29 08:55:24
Article link: https://www.elefans.com/category/jswz/34/1645930.html