How do I group by multiple columns and compute differences on several other columns?

Updated: 2024-10-25 10:21:30

Problem description

I am looking for help on how to do this in Python / pandas:

I want to take the original data (below) and find the daily difference of multiple columns (cnt_a and cnt_b) by a group with multiple columns (state, county and date).

I've been trying it different ways, and I can't seem to get past the "check for duplicate" issue:

dft_a = df.sort_values(['state','county','date']).groupby['state','county','date','cnt_a'].diff(-1)

I tried splitting it out to fix one thing at a time:

df1 = df.sort_values(['state','county','date'])
df2 = df1.groupby(['state','county'])['cnt_a'].diff()

Original data => df

date        county  state       cnt_a  cnt_b
2020-06-13  Bergen  New Jersey    308     11
2020-06-14  Bergen  New Jersey    308     11
2020-06-15  Bergen  New Jersey    320     15
2020-06-12  Union   New Jersey    100      3
2020-06-13  Union   New Jersey    130      4
2020-06-14  Union   New Jersey    150      5
2020-06-12  Bronx   New York      200    100
2020-06-13  Bronx   New York      210    200

Desired output

date        county  state       cnt_a  cnt_b  daydiff_a  daydiff_b
2020-06-13  Bergen  New Jersey    308     11          0          0
2020-06-14  Bergen  New Jersey    308     11          0          0
2020-06-15  Bergen  New Jersey    320     15         12          4
2020-06-12  Union   New Jersey    100      3          0          0
2020-06-13  Union   New Jersey    130      4         30          1
2020-06-14  Union   New Jersey    150      5         20          1
2020-06-12  Bronx   New York      200    100          0          0
2020-06-13  Bronx   New York      210    200         10        100

Recommended answer
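
The answer text is not included on this page, so what follows is only a sketch of one approach that appears to reproduce the desired output, not necessarily the original answer. It assumes that the grouping keys should be state and county only, with date used for sorting rather than grouping (grouping by date as well would leave one row per group and nothing to difference). It also assumes that the first row of each group should show 0 instead of NaN, as in the desired table. The sample frame is rebuilt inline from the question's data so the snippet runs on its own.

import io
import pandas as pd

# Sample data copied from the question; loaded from a CSV string for convenience.
raw = """date,county,state,cnt_a,cnt_b
2020-06-13,Bergen,New Jersey,308,11
2020-06-14,Bergen,New Jersey,308,11
2020-06-15,Bergen,New Jersey,320,15
2020-06-12,Union,New Jersey,100,3
2020-06-13,Union,New Jersey,130,4
2020-06-14,Union,New Jersey,150,5
2020-06-12,Bronx,New York,200,100
2020-06-13,Bronx,New York,210,200
"""
df = pd.read_csv(io.StringIO(raw), parse_dates=["date"])

# Sort so rows inside each (state, county) group are in date order.
df = df.sort_values(["state", "county", "date"]).reset_index(drop=True)

# groupby is a method, so it takes parentheses; selecting both count columns
# lets diff() run on them together. diff() subtracts the previous row in the
# same group, so the first row of each group is NaN, filled with 0 here.
diffs = (
    df.groupby(["state", "county"])[["cnt_a", "cnt_b"]]
    .diff()
    .fillna(0)
    .astype(int)
)

# Attach the per-group differences under the column names from the question.
df = df.assign(daydiff_a=diffs["cnt_a"], daydiff_b=diffs["cnt_b"])
print(df)

Two things in the first attempt from the question seem worth pointing out: groupby[...] uses square brackets on a method, which raises a TypeError, and diff(-1) compares each row with the next one rather than the previous one, so it would give the negated forward difference instead of the day-over-day change.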