I have a large CSV file with timestamp data in ISO format, e.g. 2015-04-01 10:26:41. The data span multiple months, with entries anywhere from 30 seconds to several hours apart. Its columns are id, time, speed.
Ultimately I want to group the data into 15-minute intervals and then calculate the average speed over however many entries fall in each 15-minute slot.
I am trying to use Pandas because it seems to have solid time-series tools and this looks like it should be easy to do, but I am falling at the first hurdle.
So far I have imported the CSV as a dataframe, and all columns have a dtype of object. I have sorted the data by date and am now trying to group the entries by a time interval, which is where I'm struggling. Based on some Google searching, I tried to resample the data using df.resample('5min', how=sum). Here I get the error TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex. I was also thinking about trying the groupby method, perhaps with a lambda as in df.groupby(lambda x:x.minutes + 5), which produces the error AttributeError: 'str' object has no attribute 'minutes'.
Basically I'm a little confused as to a) whether pandas has the time-series data in a format it recognises, given that its dtype is object, and b) if it does recognise it, how to group by the time intervals, which I can't seem to get working.
Keen to learn if anyone could point me in the right direction.
The DataFrame looks like this:
   0       1        2                    3
0  id      boat_id  time                 speed
1  386226  32       2015-01-15 05:14:32  4.2343243
2  386285  32       2015-01-15 05:44:57  3.45234

Recommended answer
First, it looks like you read a blank row. You probably want to skip the first row in your file: pd.read_csv(filename, skiprows=1).
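As a rough illustration of what skiprows=1 does, here is a small sketch. The file contents are made up for the example (the unwanted first line in particular is an assumption, since the real file is not shown); only the column names and the two data rows come from the question.

```python
import io
import pandas as pd

# Hypothetical file contents: one unwanted first line, then the real header.
raw = (
    "exported rows\n"
    "id,boat_id,time,speed\n"
    "386226,32,2015-01-15 05:14:32,4.2343243\n"
    "386285,32,2015-01-15 05:44:57,3.45234\n"
)

# skiprows=1 drops the first line, so the second line is used as the header
# and the column names are no longer read in as a data row.
df = pd.read_csv(io.StringIO(raw), skiprows=1)

print(df.columns.tolist())
print(df.dtypes)
```

Once the header is picked up properly, speed is parsed as a number rather than as text; the time column still needs converting separately.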
You should convert the text representation of the time into a DatetimeIndex using pd.to_datetime():
df.set_index(pd.to_datetime(df['time']), inplace=True)
You should then be able to resample:
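df.resample('15min').mean()
(This was originally written as df.resample('15min', how=np.mean); the how argument has since been removed from resample, so newer pandas versions need the method-call form.)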
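Putting the steps together, a minimal end-to-end sketch might look like the following. It assumes the header has been read correctly and that only the speed column needs averaging. The first and last data rows come from the question; the middle row is invented so that one 15-minute slot holds more than one entry.

```python
import io
import pandas as pd

# Sample data: rows 1 and 3 are from the question, row 2 is made up.
raw = (
    "id,boat_id,time,speed\n"
    "386226,32,2015-01-15 05:14:32,4.2343243\n"
    "386250,32,2015-01-15 05:40:00,2.54766\n"
    "386285,32,2015-01-15 05:44:57,3.45234\n"
)
df = pd.read_csv(io.StringIO(raw))

# Convert the text columns to proper types. to_numeric matters when the
# column arrived with dtype object, as described in the question.
df["time"] = pd.to_datetime(df["time"])
df["speed"] = pd.to_numeric(df["speed"])

# resample needs a DatetimeIndex, so move the time column into the index.
df = df.set_index("time")

# Average speed per 15-minute slot; slots with no entries come out as NaN.
avg_speed = df["speed"].resample("15min").mean()
print(avg_speed)

# Drop the empty slots if only intervals with data are of interest.
print(avg_speed.dropna())
```

Selecting the speed column before resampling keeps the id columns out of the average, which is usually what is wanted here.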