Custom grouping by months with pandas

I am trying to group monthly cash flows from a pandas dataframe on an irregular basis: for instance, at the end of months whose index is in [4, 9, 12]. I have been messing around with pd.groupby() and pd.resample() but can't figure out a syntax that works.

    import datetime as dt
    import pandas as pd
    import numpy as np

    # 12 month-end dates in 2015 (pandas 2.2+ spells this alias 'ME')
    index = pd.date_range(dt.datetime(2015, 1, 1), periods=12, freq='M')
    data = (10 * np.random.rand(12)).round()
    df = pd.DataFrame(data, index=index, columns=['A'])
    print(df)

Basically, from a list [4, 9, 12], I am looking for a syntax that would sum together the months 1-4, 5-9 and 10-12. For instance, from the column A below, I would like to get either the column B or C with a running-sum ending on these given months:

                A   B   C
    2015-01-31  9   9   0
    2015-02-28  7  16   0
    2015-03-31  5  21   0
    2015-04-30  4  25  25
    2015-05-31  6   6   0
    2015-06-30  8  14   0
    2015-07-31  2  16   0
    2015-08-31  7  23   0
    2015-09-30  0  23  23
    2015-10-31  5   5   0
    2015-11-30  5  10   0
    2015-12-31  8  18  18

Thank you for your help!

Jean

Best answer

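This uses some string formatting to build the index labels, and it works for any combination of months, as long as the first month is explicitly included in the list. Note that each sum covers the months from one listed month up to, but not including, the next listed month, and is stored in the row of that next month.

It avoids looping over the full pandas DataFrame and loops over the required months only.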

    year = 2015
    months = [1, 4, 9, 12]
    for num, month in enumerate(months[:-1]):
        # Sum A from this month up to the month before the next listed one
        start = '{0}-{1}'.format(year, month)
        end = '{0}-{1}'.format(year, months[num + 1] - 1)
        csum = df.loc[start:end, 'A'].sum()
        # Store the sum in the row of the next listed month
        df.loc['{0}-{1}'.format(year, months[num + 1]), 'B'] = csum
    print(df)

                A    B
    2015-01-31  5  NaN
    2015-02-28  1  NaN
    2015-03-31  3  NaN
    2015-04-30  7    9
    2015-05-31  6  NaN
    2015-06-30  9  NaN
    2015-07-31  4  NaN
    2015-08-31  4  NaN
    2015-09-30  9   30
    2015-10-31  3  NaN
    2015-11-30  0  NaN
    2015-12-31  9   12
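For comparison, here is a minimal sketch of a loop-free alternative that reproduces columns B and C exactly as laid out in the question, where each group includes its end month (1-4, 5-9, 10-12). It is not part of the answer above. It assumes the data covers a single calendar year, and the names `end_months`, `group_id` and `is_end` are only illustrative. Column A is copied from the question's example table so the result can be checked by eye.

```python
import numpy as np
import pandas as pd

# Column A copied from the example table in the question.
index = pd.date_range("2015-01-31", periods=12, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"A": [9, 7, 5, 4, 6, 8, 2, 7, 0, 5, 5, 8]}, index=index)

end_months = [4, 9, 12]  # last month of each group, as in the question

# side="left" maps months 1-4 to group 0, 5-9 to group 1, 10-12 to group 2.
# Assumption: a single year; for several years, group by year as well.
group_id = np.searchsorted(end_months, df.index.month.to_numpy(), side="left")

# B: running sum that restarts after each end month.
df["B"] = df.groupby(group_id)["A"].cumsum()

# C: the group total on the end month, 0 everywhere else.
is_end = df.index.month.isin(end_months)
df["C"] = df["B"].where(is_end, 0)

print(df)
```

If only the three totals are needed, `df.groupby(group_id)["A"].sum()` should give them directly, at the cost of losing the month-end dates as the index.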
