How to subset a pandas time series by time of day

Updated: 2024-10-24 10:24:09

Question

I am trying to subset a pandas time series that spans multiple days by time of day. For example, I only want the rows between 12:00 and 13:00.

I know how to do this for a specific date, e.g.:

    In [44]: type(test)
    Out[44]: pandas.core.frame.DataFrame

    In [23]: test
    Out[23]:
                               col1
    timestamp
    2012-01-14 11:59:56+00:00     3
    2012-01-14 11:59:57+00:00     3
    2012-01-14 11:59:58+00:00     3
    2012-01-14 11:59:59+00:00     3
    2012-01-14 12:00:00+00:00     3
    2012-01-14 12:00:01+00:00     3
    2012-01-14 12:00:02+00:00     3

    In [30]: test['2012-01-14 12:00:00' : '2012-01-14 13:00']
    Out[30]:
                               col1
    timestamp
    2012-01-14 12:00:00+00:00     3
    2012-01-14 12:00:01+00:00     3
    2012-01-14 12:00:02+00:00     3

But I have failed to do it across all dates using test.index.hour or test.index.indexer_between_time(), both of which were suggested as answers to similar questions. I tried the following:

    In [44]: type(test)
    Out[44]: pandas.core.frame.DataFrame

    In [34]: test[(test.index.hour >= 12) & (test.index.hour < 13)]
    Out[34]:
    Empty DataFrame
    Columns: [col1]
    Index: []

    In [36]: import datetime as dt

    In [37]: test.index.indexer_between_time(dt.time(12),dt.time(13))
    Out[37]: array([], dtype=int64)

For the first approach, I have no idea what test.index.hour or test.index.minute are actually returning:

    In [41]: test.index
    Out[41]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2012-01-14 11:59:56, ..., 2012-01-14 12:00:02]
    Length: 7, Freq: None, Timezone: tzlocal()

    In [42]: test.index.hour
    Out[42]: array([11, 23, 0, 0, 0, 0, 0], dtype=int32)

    In [43]: test.index.minute
    Out[43]: array([59, 50, 0, 0, 50, 50, 0], dtype=int32)

What are they returning? How can I do the desired subsetting? Ideally, how can I get both of the approaches above to work?

The problem turned out to be that the index was invalid, as evidenced by Timezone: tzlocal() above; tzlocal() should not be allowed as a timezone. When I changed my method of generating the index to pd.to_datetime(), following the final part of the accepted answer, everything worked as expected.

Recommended answer

Assuming the index holds valid pandas timestamps, the following will work:

test.index.hour returns an array containing the hour for each row in your DataFrame. For example:

import numpy as np
import pandas as pd

# one row per minute
df = pd.DataFrame(np.random.randn(100000, 1), columns=['A'],
                  index=pd.date_range('20130101', periods=100000, freq='min'))

df.index.year returns array([2013, 2013, 2013, ..., 2013, 2013, 2013])
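The hour-based filter from the question can be sketched as follows. This is only an illustration under assumptions: the frame `demo`, its column `A`, and the 30-minute spacing are made-up names and values, not part of the original post. On a valid DatetimeIndex, `index.hour` holds one integer per row, so a boolean mask can select the half-open window 12:00 <= time < 13:00 on every day.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two days of rows at 30-minute spacing.
idx = pd.date_range('2013-01-01', periods=96, freq='30min')
demo = pd.DataFrame({'A': np.arange(96)}, index=idx)

# One integer hour per row; the mask keeps 12:00 <= time < 13:00 on every day.
hours = demo.index.hour
noon_rows = demo[(hours >= 12) & (hours < 13)]
print(noon_rows)
```

Unlike between_time, this mask excludes the 13:00 row itself, which may or may not be what you want.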

To grab all rows where the time is between 12:00 and 13:00, use

df.between_time('12:00','13:00')
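As a rough sketch of how the two calls relate, the example below runs between_time and indexer_between_time on the same made-up frame (`demo` and column `A` are assumptions for illustration). between_time returns the matching rows directly, while indexer_between_time returns integer positions that need to be passed to iloc. By default both include the start and end times.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical data: two days of rows at 30-minute spacing.
idx = pd.date_range('2013-01-01', periods=96, freq='30min')
demo = pd.DataFrame({'A': np.arange(96)}, index=idx)

# between_time keeps matching rows from every day in the index.
by_label = demo.between_time('12:00', '13:00')

# indexer_between_time returns integer positions, so pair it with iloc.
positions = demo.index.indexer_between_time(dt.time(12), dt.time(13))
by_position = demo.iloc[positions]

print(by_label)
print(by_label.equals(by_position))
```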

This selects that time window on every day the index covers, across days, years, and so on. If the index does not hold valid timestamps, convert it with pd.to_datetime().
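A minimal sketch of that conversion, assuming the index arrived as plain strings (the frame `raw` and its values are invented for illustration and are not the asker's data):

```python
import pandas as pd

# Hypothetical frame whose index holds timestamp strings, not datetimes.
raw = pd.DataFrame(
    {'col1': [3, 3, 3, 3]},
    index=['2012-01-14 11:59:59', '2012-01-14 12:00:00',
           '2012-01-15 12:30:00', '2012-01-15 13:00:01'],
)

# Convert the string labels into a DatetimeIndex.
raw.index = pd.to_datetime(raw.index)

# Time-of-day selection now works across both dates.
selected = raw.between_time('12:00', '13:00')
print(selected)
```

Calling between_time before the conversion would fail, because the method requires a DatetimeIndex.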


Published: 2023-10-24 08:41:18

Article link: https://www.elefans.com/category/jswz/34/1523469.html