Below is a sample of the dataset:
2020-01-01 01:35 | 50 |
2020-01-01 01:41 | 49 |
2020-01-01 01:46 | 50 |
I want to check whether 'Value' was equal to 50 for a continuous period of 15 minutes. If so, I want to extract the date on which that occurred. To illustrate what I mean by a continuous period, assume I want to check whether the value is equal to 50 for a continuous period of 5 minutes (instead of 15). Data that satisfies this condition would look as follows:
2020-01-01 01:35 | 50 |
2020-01-01 01:36 | 50 |
2020-01-01 01:37 | 50 |
2020-01-01 01:38 | 50 |
2020-01-01 01:39 | 50 |
Then I want to extract the date 2020-01-01 into a list, because in the data above the value was equal to 50 for a continuous period of 5 minutes (or more).
Recommended answer
I am posting code for 5 minutes so that the output matches your desired output. Change 300 to 900 for 15 minutes. Steps:
Convert df['Date'] to datetime so that two timestamps can be subtracted to get the time difference between them.
Group df by date and call f on each group.
In f: max_continuous_range gives the length of the longest segment where the value is 50. f returns True if that length is 5 minutes or more. Append the date to the list if f returns True.
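As a small illustration of the first two steps, here is a sketch on made-up rows (the rows and the name group_sizes are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical rows spanning two calendar days (not the asker's data).
df = pd.DataFrame({
    "Date": ["2020-01-01 01:35", "2020-01-01 01:36", "2020-01-02 09:00"],
    "Value": [50, 50, 49],
})

# Step 1: after conversion, subtracting two timestamps yields a Timedelta.
df["Date"] = pd.to_datetime(df["Date"])
gap = df["Date"].iloc[1] - df["Date"].iloc[0]
print(gap)

# Step 2: group by calendar date; each group holds that day's rows.
group_sizes = {str(day): len(g) for day, g in df.groupby(df["Date"].dt.date)}
print(group_sizes)
```

Before the conversion the column holds plain strings, which cannot be subtracted from each other, so step 1 has to come first.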
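Use:
import numpy as np
import pandas as pd
from datetime import timedelta

def f(g):
    # True on rows where the value equals 50
    mask = (g['Value'] == 50)
    # Sum the gaps between consecutive matching rows; add 1 minute for the boundary
    max_continuous_range = (np.max(np.cumsum(g['Date'].where(mask).diff())) + timedelta(minutes=1))
    return max_continuous_range.seconds >= 300

# df is assumed to already hold the 'Date' and 'Value' columns
df['Date'] = pd.to_datetime(df['Date'])
groups = df.groupby(df['Date'].dt.date, as_index=False)
final_list = [str(idx) for idx, g in groups if f(g)]
Input: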
               Date  Value
0  2020-01-01 01:35     40
1  2020-01-01 01:36     50
2  2020-01-01 01:37     50
3  2020-01-01 01:38     50
4  2020-01-01 01:39     50
5  2020-01-01 01:40     50
6  2020-01-01 01:41     40
7  2020-01-01 01:42     40
Output:
>>> final_list
['2020-01-01']
Inside f(g):
mask: True where the value is 50.
0    False
1     True
2     True
3     True
4     True
5     True
6    False
7    False
df['Date'].where(mask) puts NaT wherever mask is not True.
0                   NaT
1   2020-01-01 01:36:00
2   2020-01-01 01:37:00
3   2020-01-01 01:38:00
4   2020-01-01 01:39:00
5   2020-01-01 01:40:00
6                   NaT
7                   NaT
.diff gives the difference between two consecutive elements. It gives NaT if either value is NaT. Result after df['Date'].where(mask).diff():
0               NaT
1               NaT
2   0 days 00:01:00
3   0 days 00:01:00
4   0 days 00:01:00
5   0 days 00:01:00
6               NaT
7               NaT
The cumulative sum of the differences between consecutive times gives the total time elapsed. After np.cumsum(...):
0               NaT
1               NaT
2   0 days 00:01:00
3   0 days 00:02:00
4   0 days 00:03:00
5   0 days 00:04:00
6               NaT
7               NaT
np.max gives the longest length. 1 minute is added to take care of the boundary condition: five rows one minute apart produce only four one-minute differences, so the sum comes to 4 minutes for a 5-minute run.
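One caveat worth checking on your own data: as far as I can tell, the cumulative sum is not reset when a non-50 row interrupts the sequence, so two short runs on the same day may add up as if they were one. The following is a sketch of a variant that measures each run separately. It is not the answer's method; the function name has_long_run and its parameters are made up for illustration, and it assumes rows are sorted by time and sampled once per minute, as in the example above.

```python
import pandas as pd

def has_long_run(g, target=50, min_seconds=300):
    """Return True if any single unbroken run of `target` lasts min_seconds or more."""
    mask = g["Value"] == target
    if not mask.any():
        return False
    # A new run id starts every time the mask flips between True and False.
    run_id = (mask != mask.shift()).cumsum()
    # Keep only matching rows and measure each run from first to last timestamp.
    runs = g.loc[mask, "Date"].groupby(run_id[mask])
    # +1 minute for the boundary: n rows one minute apart span n - 1 minutes.
    spans = runs.max() - runs.min() + pd.Timedelta(minutes=1)
    return bool((spans.dt.total_seconds() >= min_seconds).any())

# Day 1 has one 5-minute run of 50; day 2 has two separate 3-minute runs.
times = (list(pd.date_range("2020-01-01 01:35", periods=7, freq="min"))
         + list(pd.date_range("2020-01-02 01:35", periods=7, freq="min")))
values = [40, 50, 50, 50, 50, 50, 40] + [50, 50, 50, 40, 50, 50, 50]
df = pd.DataFrame({"Date": times, "Value": values})

final_list = [str(day) for day, g in df.groupby(df["Date"].dt.date)
              if has_long_run(g)]
print(final_list)
```

For 15 minutes, pass min_seconds=900, mirroring the 300 to 900 change described in the answer.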