How to group by multiple columns and aggregate differences on different columns?
Question
I am looking for help on how to do this in Python / pandas:
I want to take the original data (below) and compute the daily difference of multiple columns (cnt_a and cnt_b), grouped by multiple columns (state, county, and date).
I've tried this several different ways, but I can't seem to get past the "check for duplicate" issue:
```python
dft_a = df.sort_values(['state','county','date']).groupby['state','county','date','cnt_a'].diff(-1)
```
I also tried splitting it up to fix one thing at a time:
```python
df1 = df.sort_values(['state','county','date'])
df2 = df1.groupby(['state','county'])['cnt_a'].diff()
```

Original data => df
```
date        county  state       cnt_a  cnt_b
2020-06-13  Bergen  New Jersey    308     11
2020-06-14  Bergen  New Jersey    308     11
2020-06-15  Bergen  New Jersey    320     15
2020-06-12  Union   New Jersey    100      3
2020-06-13  Union   New Jersey    130      4
2020-06-14  Union   New Jersey    150      5
2020-06-12  Bronx   New York      200    100
2020-06-13  Bronx   New York      210    200
```

Desired output
```
date        county  state       cnt_a  cnt_b  daydiff_a  daydiff_b
2020-06-13  Bergen  New Jersey    308     11          0          0
2020-06-14  Bergen  New Jersey    308     11          0          0
2020-06-15  Bergen  New Jersey    320     15         12          4
2020-06-12  Union   New Jersey    100      3          0          0
2020-06-13  Union   New Jersey    130      4         30          1
2020-06-14  Union   New Jersey    150      5         20          1
2020-06-12  Bronx   New York      200    100          0          0
2020-06-13  Bronx   New York      210    200         10        100
```

Recommended answer
- It's important to sort df first: within each group, .diff() subtracts the previous row in the DataFrame's current order, so if df isn't sorted by date, the differences will be taken between the wrong days.
- Be certain to sort df, in order, by 'state', 'county', and 'date'; however, the 'date' column is left out of .groupby.
- Specify rsuffix (a parameter of .join), and/or use .rename, to change the column headers.
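A minimal sketch of the approach the bullets describe, assuming pandas and using the sample data from the question: sort, take the per-group .diff() of both count columns (grouping by state and county only), fill each group's first row with 0, join the result back with an rsuffix, and rename. The variable names diffs and result, the "_diff" suffix, and the final .astype(int) are illustrative choices, not necessarily the answerer's exact code.

```python
import pandas as pd

# Sample data from the question ("Original data => df").
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-06-13", "2020-06-14", "2020-06-15",
        "2020-06-12", "2020-06-13", "2020-06-14",
        "2020-06-12", "2020-06-13",
    ]),
    "county": ["Bergen"] * 3 + ["Union"] * 3 + ["Bronx"] * 2,
    "state": ["New Jersey"] * 6 + ["New York"] * 2,
    "cnt_a": [308, 308, 320, 100, 130, 150, 200, 210],
    "cnt_b": [11, 11, 15, 3, 4, 5, 100, 200],
})

# Sort so that, within each (state, county) group, rows run in date order.
df = df.sort_values(["state", "county", "date"])

# Day-over-day difference per group; 'date' is only a sort key, not a group key.
# Each group's first row has no previous day, so its NaN is filled with 0.
# astype(int) is an illustrative choice to match the integer desired output.
diffs = (
    df.groupby(["state", "county"])[["cnt_a", "cnt_b"]]
    .diff()
    .fillna(0)
    .astype(int)
)

# Join back on the index; the (assumed) "_diff" rsuffix avoids a name clash
# with the original cnt_a / cnt_b columns.
result = df.join(diffs, rsuffix="_diff")

# Rename to the headers requested in the question.
result = result.rename(columns={"cnt_a_diff": "daydiff_a", "cnt_b_diff": "daydiff_b"})
print(result)
```

If integer output is not needed, the .astype(int) step can be dropped; the differences then stay as floats, because .diff() produces NaN for each group's first row before fillna replaces it.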