Create a rolling sum column that resets once it reaches a threshold

Updated: 2024-10-09 09:18:39

Problem description

This question differs from the similar ones I could find because I am trying to combine a lookback window and a threshold into one rolling sum. I'm not sure whether what I'm trying to do is achievable in one step:

I have a pandas DataFrame with a datetime column and a value column. I have created a column that sums the value column (V) over a rolling time window. However, I would like this rolling sum to reset to 0 once it reaches a certain threshold.

I don't know if it's possible to do this in a single column operation, since two conditions are at play at each step of the sum: the lookback window and the threshold. If anyone has ideas about whether this is possible and how I might achieve it, please let me know. I know how to do this iteratively, but that is very slow (my DataFrame has more than 1 million entries).

Example:

Lookback time: 3 minutes

Threshold: 3

+----+---------------------+---+---------+----------------+
|    | myDate              | V | rolling | desired_column |
+----+---------------------+---+---------+----------------+
|  1 | 2020-04-01 10:00:00 | 0 |       0 |              0 |
|  2 | 2020-04-01 10:01:00 | 1 |       1 |              1 |
|  3 | 2020-04-01 10:02:00 | 2 |       3 |              3 |
|  4 | 2020-04-01 10:03:00 | 1 |       4 |              1 |
|  5 | 2020-04-01 10:04:00 | 0 |       4 |              1 |
|  6 | 2020-04-01 10:05:00 | 4 |       7 |              5 |
|  7 | 2020-04-01 10:06:00 | 1 |       6 |              1 |
|  8 | 2020-04-01 10:07:00 | 1 |       6 |              2 |
|  9 | 2020-04-01 10:08:00 | 0 |       6 |              0 |
| 10 | 2020-04-01 10:09:00 | 3 |       5 |              5 |
+----+---------------------+---+---------+----------------+

In this example, the rolling sum does not take into account any values on or before a row where the sum reaches or exceeds the threshold of 3.

Recommended answer

The approach below is not memory efficient by any means, but it should be faster than looping. It assumes the timestamps are evenly spaced with no gaps, so that the work can be delegated to numpy methods; otherwise, you can fill in the missing timestamps before calling it.
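Filling in the missing timestamps could look roughly like the sketch below. It assumes one-minute spacing and that a missing minute should count as V = 0; the sample frame with gaps is hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical sample with gaps: 10:02 and 10:03 are missing.
df = pd.DataFrame({
    'myDate': pd.to_datetime([
        '2020-04-01 10:00', '2020-04-01 10:01', '2020-04-01 10:04',
    ]),
    'V': [1, 2, 4],
})

# Reindex onto a regular one-minute grid between the first and last
# timestamp; rows that did not exist before get V = 0.
filled = (
    df.set_index('myDate')
      .asfreq('min', fill_value=0)
      .reset_index()
)

print(filled)
```

Note that `asfreq` needs unique timestamps, so duplicated rows would have to be aggregated or dropped first.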

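# helper functions for the windowed sum that respects the threshold
import numpy as np

def rolling_window(a, window):
    # left-pad with zeros so that every row gets a full window
    b = np.concatenate((np.zeros(window-1), a))  # only for 1d
    return np.array([b[..., i:i+window] for i in range(a.size)])

def dynamic_window(w: np.ndarray, reset):
    # cumulative sum of the values *before* each position, per window,
    # bucketed by how many times the threshold has been reached
    regions = np.hstack([
        np.zeros((w.shape[0], 1)),
        np.cumsum(w, axis=-1)[:, :-1]
    ]) // reset
    # keep only the values that share a bucket with the last position
    return w * (regions == regions[:, -1][:, np.newaxis])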
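As a side note, the window matrix that rolling_window builds can also be obtained as a read-only view, without the Python-level list comprehension. The sketch below assumes NumPy 1.20 or later, where `sliding_window_view` is available; the helper name `rolling_window_view` is hypothetical and not part of the answer. Later steps such as `cumsum` still allocate full arrays, so this only trims the cost of building the windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_window_view(a, window):
    # Hypothetical helper: same zero left-padding as rolling_window,
    # but the windows are a strided view instead of copied slices.
    padded = np.concatenate((np.zeros(window - 1), a))
    return sliding_window_view(padded, window)

v = np.array([0, 1, 2, 1, 0, 4])
windows = rolling_window_view(v, 4)

print(windows.shape)  # one row per input element, `window` columns
print(windows[-1])    # the window that ends at the last element
```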

Use the two functions as follows:

import pandas as pd

# sample df
# please always provide a callable line of code
# you could get it with `df.head(10).to_dict('split')`
# ('min' is used as the frequency alias; the older 'T' alias is deprecated)
df = pd.DataFrame({
    'myDate': pd.date_range('2020-04-01 10:00', periods=10, freq='min'),
    'V': [0, 1, 2, 1, 0, 4, 1, 1, 0, 3]
})

# include all time increments
df = pd.concat([
    df,
    pd.DataFrame(pd.date_range(df['myDate'].min(), df['myDate'].max(), freq='min'),
                 columns=['myDate'])
]).drop_duplicates(subset=['myDate']).fillna(0).sort_values('myDate')

df['4min_sum'] = df.rolling('4min', on='myDate')['V'].sum()

# use the functions
df['desired_column'] = dynamic_window(
    rolling_window(df['V'].to_numpy(), 4),
    3).sum(axis=-1)

Output

               myDate    V  4min_sum  desired_column
0 2020-04-01 10:00:00  0.0       0.0             0.0
1 2020-04-01 10:01:00  1.0       1.0             1.0
2 2020-04-01 10:02:00  2.0       3.0             3.0
3 2020-04-01 10:03:00  1.0       4.0             1.0
4 2020-04-01 10:04:00  0.0       4.0             1.0
5 2020-04-01 10:05:00  4.0       7.0             4.0
6 2020-04-01 10:06:00  1.0       6.0             1.0
7 2020-04-01 10:07:00  1.0       6.0             2.0
8 2020-04-01 10:08:00  0.0       6.0             0.0
9 2020-04-01 10:09:00  3.0       5.0             5.0

Notice that at 10:05 the desired_column value is 4 instead of the 5 you have in your expected output. According to your logic it should be 4: that window contains [2, 1, 0, 4] and, since the first two numbers sum to 3, the window should reset and return 0 + 4.
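The arithmetic for that single window can be traced step by step. The sketch below repeats, for one row, the same bucketing that dynamic_window applies to every row at once; the intermediate variable names are only illustrative.

```python
import numpy as np

window = np.array([2, 1, 0, 4])  # values from 10:02 to 10:05
reset = 3                        # threshold

# Sum of everything strictly before each position in the window.
shifted = np.concatenate(([0], np.cumsum(window)[:-1]))  # [0, 2, 3, 3]

# Bucket index: how many times the threshold was reached beforehand.
regions = shifted // reset                               # [0, 0, 1, 1]

# Keep only the values in the same bucket as the last position.
mask = regions == regions[-1]
result = int((window * mask).sum())

print(result)  # 4
```

The first two values land in bucket 0 because the running total had not yet reached 3 before them; once it has, the remaining values [0, 4] form the bucket that is summed.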

Published: 2023-10-16 14:11:09
Link: https://www.elefans.com/category/jswz/34/1497796.html