This article explains how to take the first and last values of a rolling window in pandas.
Problem description
Using pandas, I would like to apply functions that are available for resample() but not for rolling().
This works:
    df1 = df.resample(to_freq, closed='left', kind='period').agg(
        OrderedDict([('Open', 'first'), ('Close', 'last')]))

This does not:
    df2 = df.rolling(my_indexer).agg(
        OrderedDict([('Open', 'first'), ('Close', 'last')]))
    >>> AttributeError: 'first' is not a valid function for 'Rolling' object

    df3 = df.rolling(my_indexer).agg(
        OrderedDict([('Close', 'last')]))
    >>> AttributeError: 'last' is not a valid function for 'Rolling' object

What would you advise for keeping the first and last values of a rolling window in two different columns?
    import pandas as pd
    from random import seed
    from random import randint
    from collections import OrderedDict

    # DataFrame
    ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
    seed(1)
    values = [randint(0, 10) for ts in ts_1h]
    df = pd.DataFrame({'Values': values}, index=ts_1h)

    # First & last work with resample
    resampled_first = df.resample('3H', closed='left', kind='period').agg(OrderedDict([('Values', 'first')]))
    resampled_last = df.resample('3H', closed='left', kind='period').agg(OrderedDict([('Values', 'last')]))

    # They don't with rolling
    rolling_first = df.rolling(3).agg(OrderedDict([('Values', 'first')]))
    rolling_last = df.rolling(3).agg(OrderedDict([('Values', 'last')]))

Thanks for your help! Best regards,
Recommended answer
You can use your own function to get the first or last element of each rolling window:
    # Each window arrives as a Series, so .iloc gives positional access
    rolling_first = df.rolling(3).agg(lambda rows: rows.iloc[0])
    rolling_last = df.rolling(3).agg(lambda rows: rows.iloc[-1])

Example
    import pandas as pd
    from random import seed, randint

    # DataFrame
    ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
    seed(1)
    values = [randint(0, 10) for ts in ts_1h]
    df = pd.DataFrame({'Values': values}, index=ts_1h)

    # Each window arrives as a Series, so .iloc gives positional access
    df['first'] = df['Values'].rolling(3).agg(lambda rows: rows.iloc[0])
    df['last'] = df['Values'].rolling(3).agg(lambda rows: rows.iloc[-1])

    print(df)

Result
                               Values  first  last
    2020-01-01 00:00:00+00:00       2    NaN   NaN
    2020-01-01 01:00:00+00:00       9    NaN   NaN
    2020-01-01 02:00:00+00:00       1    2.0   1.0
    2020-01-01 03:00:00+00:00       4    9.0   4.0
    2020-01-01 04:00:00+00:00       1    1.0   1.0
    2020-01-01 05:00:00+00:00       7    4.0   7.0
    2020-01-01 06:00:00+00:00       7    1.0   7.0
    2020-01-01 07:00:00+00:00       7    7.0   7.0
    2020-01-01 08:00:00+00:00      10    7.0  10.0
    2020-01-01 09:00:00+00:00       6    7.0   6.0
    2020-01-01 10:00:00+00:00       3   10.0   3.0
    2020-01-01 11:00:00+00:00       1    6.0   1.0
    2020-01-01 12:00:00+00:00       7    3.0   7.0
    2020-01-01 13:00:00+00:00       0    1.0   0.0
    2020-01-01 14:00:00+00:00       6    7.0   6.0
    2020-01-01 15:00:00+00:00       6    0.0   6.0
    2020-01-01 16:00:00+00:00       9    6.0   9.0
    2020-01-01 17:00:00+00:00       0    6.0   0.0
    2020-01-01 18:00:00+00:00       7    9.0   7.0
    2020-01-01 19:00:00+00:00       4    0.0   4.0
    2020-01-01 20:00:00+00:00       3    7.0   3.0
    2020-01-01 21:00:00+00:00       9    4.0   9.0
    2020-01-01 22:00:00+00:00       1    3.0   1.0
    2020-01-01 23:00:00+00:00       5    9.0   5.0
    2020-01-02 00:00:00+00:00       0    1.0   0.0
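As a cross-check on the result table, here is a minimal sketch of a variant that is not part of the original answer. It assumes a fixed integer window (the `window` variable is introduced here for illustration) and uses `rolling(...).apply(..., raw=True)`, which passes each window to the function as a NumPy array. With a fixed window, the first value of each window should simply be the value from `window - 1` rows earlier, so it can be compared against `shift`.

```python
import pandas as pd
from random import seed, randint

# Same data as in the example above
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00',
                      end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0, 10) for _ in ts_1h]
df = pd.DataFrame({'Values': values}, index=ts_1h)

window = 3  # assumption: a fixed integer window, as in the question

# raw=True hands each window over as a NumPy array,
# so x[0] and x[-1] are plain positional lookups.
first = df['Values'].rolling(window).apply(lambda x: x[0], raw=True)
last = df['Values'].rolling(window).apply(lambda x: x[-1], raw=True)

# For a fixed integer window, the first element of the window is the
# value from (window - 1) rows earlier.
first_by_shift = df['Values'].shift(window - 1)

print(first.equals(first_by_shift))
```

This shortcut only holds for fixed integer windows. For time-based offsets or custom indexers such as the `my_indexer` in the question, the window length can vary, so the function-based approach is the safer choice.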
When using a dictionary, you have to pass the lambda itself, not a string:
    result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
    print(result)

The same applies to your own named function: pass the function itself, not a string containing its name.
    def first(rows):
        # rows is a Series; .iloc gives positional access
        return rows.iloc[0]

    def last(rows):
        return rows.iloc[-1]

    result = df['Values'].rolling(3).agg({'first': first, 'last': last})
    print(result)

Example
    import pandas as pd
    from random import seed, randint

    # DataFrame
    ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
    seed(1)
    values = [randint(0, 10) for ts in ts_1h]
    df = pd.DataFrame({'Values': values}, index=ts_1h)

    # Variant 1: lambdas passed directly in the dictionary
    result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
    print(result)

    # Variant 2: named functions passed by reference, not as strings
    def first(rows):
        return rows.iloc[0]

    def last(rows):
        return rows.iloc[-1]

    result = df['Values'].rolling(3).agg({'first': first, 'last': last})
    print(result)
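To tie this back to the question, which wanted the first value of 'Open' and the last value of 'Close', here is a minimal sketch on made-up data. The sample values, the window size of 3, and the helper names `first_value` and `last_value` are assumptions for illustration. Each column is rolled separately, which avoids relying on how a dictionary is interpreted by `agg`.

```python
import pandas as pd

# Hypothetical price data; only the column names come from the question.
idx = pd.date_range('2020-01-01', periods=6, freq='1h')
df = pd.DataFrame({'Open': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   'Close': [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]}, index=idx)

def first_value(window):
    # window is a NumPy array because raw=True is passed below
    return window[0]

def last_value(window):
    return window[-1]

# Roll each column on its own and collect the results side by side.
out = pd.DataFrame({
    'Open': df['Open'].rolling(3).apply(first_value, raw=True),
    'Close': df['Close'].rolling(3).apply(last_value, raw=True),
})
print(out)
```

The first two rows are NaN because the window is not yet full, the same behaviour as in the result table earlier in the answer.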