I am trying to group monthly cash flows from a pandas dataframe on an irregular basis: for instance, at the end of months whose index is in [4, 9, 12]. I have been messing around with pd.groupby() and pd.resample() but can't figure out a syntax that works.
    import datetime as dt
    import pandas as pd
    import numpy as np

    # 'M' is the month-end frequency alias (newer pandas releases spell it 'ME')
    index = pd.date_range(dt.datetime(2015, 1, 1), periods=12, freq='M')
    data = (10 * np.random.rand(12)).round()
    df = pd.DataFrame(data, index=index, columns=['A'])
    print(df)

Basically, from a list [4, 9, 12], I am looking for a syntax that would sum together the months 1-4, 5-9 and 10-12. For instance, from the column A below, I would like to get either the column B or C, with a running sum ending on these given months:
                A   B   C
    2015-01-31  9   9   0
    2015-02-28  7  16   0
    2015-03-31  5  21   0
    2015-04-30  4  25  25
    2015-05-31  6   6   0
    2015-06-30  8  14   0
    2015-07-31  2  16   0
    2015-08-31  7  23   0
    2015-09-30  0  23  23
    2015-10-31  5   5   0
    2015-11-30  5  10   0
    2015-12-31  8  18  18

Thank you for your help!
Jean
Best answer
This uses some string formatting to build the index labels, but it works for any combination of months (as long as the first month is explicitly included).
It avoids looping over the full pandas DataFrame and loops over the required months only.
    year = 2015
    months = [1, 4, 9, 12]  # boundary months; the first month must be listed

    for num, month in enumerate(months[:-1]):
        # Sum A from this boundary month through the month before the next boundary
        csum = df['{0}-{1}'.format(year, month):
                  '{0}-{1}'.format(year, months[num + 1] - 1)].A.sum()
        # Store the total on the row of the next boundary month
        df.loc['{0}-{1}'.format(year, months[num + 1]), 'B'] = csum

    print(df)

                A    B
    2015-01-31  5  NaN
    2015-02-28  1  NaN
    2015-03-31  3  NaN
    2015-04-30  7    9
    2015-05-31  6  NaN
    2015-06-30  9  NaN
    2015-07-31  4  NaN
    2015-08-31  4  NaN
    2015-09-30  9   30
    2015-10-31  3  NaN
    2015-11-30  0  NaN
    2015-12-31  9   12
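Going by the printed output, the loop above totals months 1-3, 4-8 and 9-11 (the April row shows 9 = 5 + 1 + 3), which is shifted by one month from the 1-4, 5-9 and 10-12 groups the question asks for. As a rough alternative sketch, not the answer author's method, one could label each row with the group it falls in and let groupby do the rest. The names end_months, group_id and is_end are made up for this illustration, and the sketch assumes all dates fall within a single year, that end_months is sorted, and that its last entry covers the final row. Column A is hard-coded from the question's table so the result can be compared with it.

```python
import numpy as np
import pandas as pd

# Column A from the question's example table (fixed values, not random data).
index = pd.date_range("2015-01-31", periods=12, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"A": [9, 7, 5, 4, 6, 8, 2, 7, 0, 5, 5, 8]}, index=index)

# Hypothetical name: each group ends on one of these months (must be sorted).
end_months = [4, 9, 12]

# For every row, find the position of the first end month >= the row's month:
# months 1-4 -> 0, months 5-9 -> 1, months 10-12 -> 2.
# Assumption: a single year, so the month number alone identifies the group.
group_id = np.searchsorted(end_months, df.index.month.to_numpy(), side="left")

# B: running sum that restarts at the beginning of each group.
df["B"] = df.groupby(group_id)["A"].cumsum()

# C: the group total on the group's end month, 0 everywhere else.
is_end = df.index.month.isin(end_months)
df["C"] = df["B"].where(is_end, 0)

print(df)
```

With the question's column A, this should reproduce columns B and C from the question's table. For data spanning several years, the grouping key would also need to include the year.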